Wuensche 1986b: An approach is presented that combines dynamical models of 3D motion with geometric models of the scene and the laws of perspective projection to estimate all motion parameters necessary to control a mobile robot vehicle. The approach is demonstrated by autonomous control of a jet-propelled air-cushion vehicle, navigating through a technical environment with three degrees of motion freedom and performing a rendezvous maneuver with a passive partner. Features of the partner and of other objects in the scene, the 3D shapes of which are known, are searched for and then tracked by the processors of a multi-microprocessor system. A sequential Kalman filter formulation is used to detect and to cope with variable feature visibility due to occlusion and motion while determining the complete relative motion state without inversion of the projection equations. A scheme is developed for always selecting those features for tracking that yield the best state estimate, the quality of which is demonstrated by physical docking with a static partner. The system operates at a cycle time of 0.13 seconds, half of which is spent on I/O operations. Experimental results are given.

Visual detection of motion and control of locomotion are two apparently different tasks that are nevertheless tightly connected in biological evolution. Most living beings capable of controlled locomotion have vision systems as their main source of motion information, although many of them do not have pictorial vision. Apparently, the latter is not necessary for motion detection. This is confirmed by psychological research such as the work of Johansson [1,2], which shows that humans are able to detect and understand complex motions like those of a walking person even if only a few features of the person are tracked: watching bright spots attached to the main joints of persons walking in the dark is sufficient. This suggests that good models of typical motion behavior exist in the human knowledge base, which allow image sequences to be interpreted successfully even if very little information is contained in the single images of the sequence.
   Numerous authors have investigated robot vision systems in the past, most of which try to make use of the pictorial information contained in vision. This leads to the computationally expensive effort of solving the generally nonlinear projection equations, at least approximately, for the sought relative position values [3], or of recovering 3D motion directly from 2D optical flow [4,5]. Unless the task is simplified by considering only special markings in the scene [6,7], the overall performance is slow, which in turn poses additional correspondence problems not present in continuous dynamic scenes.
   This “inversion problem” is avoided in the presented approach. It combines dynamical models, as used in modern control theory for describing the motions of the vehicle and/or other objects in the scene, with geometric object models and the laws of perspective projection to determine all relative motion parameters necessary to control robot motion. This approach is outlined in the next section, together with the hardware concept used. Then the air-cushion vehicle is introduced, which allows some interesting aspects of vision to be examined because it is capable of independent motion in three degrees of freedom. The rendezvous task is outlined thereafter. It was chosen for two reasons: the air-cushion vehicle may be viewed as a satellite model, and the development of vision-based satellite rendezvous maneuvers is the objective of this research; moreover, rendezvous maneuvers constitute a high-precision navigation task requiring exact motion estimation and are thus of general interest. After the motion estimation is detailed in the main body of the paper, a scheme is presented for always selecting those features for tracking that provide the best possible motion estimate while features appear and disappear due to motion and occlusions in the scene. Finally, results are presented showing that the approach yields reasonable motion estimates even if only one feature is visible for many cycles.
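
   To make concrete the idea of estimating the motion state without inverting the projection equations, the following is a minimal sketch of a sequential (one scalar measurement at a time) Kalman measurement update. It is not the implementation used in this work. It assumes a simplified planar camera with state (x, y, psi), a one-dimensional image line, known 2D feature positions, and a numerically differentiated projection; the function names `project`, `jacobian` and `sequential_update`, the noise variance `r`, and all numeric values are hypothetical. The prediction step based on the dynamical model is omitted.

```python
import numpy as np


def project(state, feature, focal=1.0):
    """Perspective projection of a known 2D world point onto a 1D image line.

    state = (x, y, psi): planar camera position and heading (assumed model).
    """
    x, y, psi = state
    dx, dy = feature[0] - x, feature[1] - y
    # Rotate the world offset into the camera frame (optical axis = camera X).
    xc = np.cos(psi) * dx + np.sin(psi) * dy
    yc = -np.sin(psi) * dx + np.cos(psi) * dy
    return focal * yc / xc


def jacobian(state, feature, eps=1e-6):
    """Central-difference Jacobian (3 entries) of project() w.r.t. the state."""
    h = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        h[i] = (project(state + d, feature) - project(state - d, feature)) / (2 * eps)
    return h


def sequential_update(state, cov, features, measurements, visible, r=1e-4):
    """Process each visible feature as one scalar measurement.

    Features flagged as not visible (e.g. occluded) are simply skipped, so the
    number of measurements may change from cycle to cycle.
    """
    state = np.array(state, dtype=float)
    cov = np.array(cov, dtype=float)
    for feat, z, vis in zip(features, measurements, visible):
        if not vis:
            continue
        h = jacobian(state, feat)              # linearize at current estimate
        innovation = z - project(state, feat)  # forward projection only
        s = h @ cov @ h + r                    # scalar innovation variance
        k = cov @ h / s                        # gain: a division, no matrix inverse
        state = state + k * innovation
        cov = cov - np.outer(k, h @ cov)
    return state, cov
```

   The projection is only ever evaluated in the forward direction, from the estimated state to the expected image position, and each scalar update requires a division rather than a matrix inversion. A call such as `sequential_update(x0, P0, features, z, [True, False, True])` would use the first and third features and ignore the second.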