To obtain a real-time vision system realizable in the early 1980s on existing computer hardware, special emphasis had to be laid on the methods used in visual perception and in vision-based control. Control systems, in general, are geared toward real-time operation, since they are intended to immediately correct errors arising from perturbations in the process being controlled.
Aerospace technology around 1960 had given rise to a recursive reformulation of Gauss's least-squares method ('least sum of errors squared') for dealing with noisy and incomplete measurement data. Instead of using a known generic solution (curve) and batch-processing complete sets of measurement data to find the best-fitting set of parameters for that curve, one tried to find a generic dynamic model, with corresponding state variables and known (or estimated) noise statistics, that would lead to the same solution curve as batch processing.
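To make the contrast concrete, the following is a minimal numerical sketch (illustrative only; the data, gains, and variable names are assumptions for this example, not taken from the original work) of batch versus recursive least squares for fitting a straight line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy measurements of a line y = 2.0 + 0.5 * t
t = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * t + 0.1 * rng.standard_normal(t.size)

# --- Batch least squares: process the complete data set at once ---
A = np.column_stack([np.ones_like(t), t])
theta_batch, *_ = np.linalg.lstsq(A, y, rcond=None)

# --- Recursive least squares: fold in one measurement at a time ---
theta = np.zeros(2)        # parameter estimate [offset, slope]
P = 1e3 * np.eye(2)        # estimate covariance (large = little prior knowledge)
for a_k, y_k in zip(A, y):
    K = P @ a_k / (1.0 + a_k @ P @ a_k)    # gain for this measurement
    theta = theta + K * (y_k - a_k @ theta)
    P = P - np.outer(K, a_k @ P)

print(theta_batch, theta)  # both estimates approach [2.0, 0.5]
```

The recursive form never stores the full data set; each new measurement only updates the current estimate and its covariance, which is what makes the scheme suitable for real-time operation.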
The decisive step taken in our approach (in contrast to Kalman filtering in the image plane for smoothing results, as was common in the vision community) was to use valid dynamic models of the physical process being observed visually. This constitutes knowledge representation for the motion process to be perceived. Perspective mapping then serves as a nonlinear measurement model, for which a local linear approximation has to be found and continuously updated. If the dynamic model is of second order (as Newton's law requires for massive bodies), speed components quite naturally become state variables that are reconstructed in the recursive estimation process.
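The following sketch illustrates this structure for a deliberately simple case: a second-order (constant-velocity) range model combined with a pinhole-camera measurement of an object of known width, re-linearized in every cycle. All numerical values and the specific measurement model are assumptions chosen for illustration, not the original system:

```python
import numpy as np

dt, f = 0.1, 700.0             # cycle time [s], focal length [pixels] -- assumed values
W = 2.0                        # known object width [m] -- assumed value

# Second-order dynamic model: state x = [range z, range rate zdot];
# constant-velocity prediction per Newtonian motion.
F = np.array([[1.0, dt],
              [0.0, 1.0]])
Q = np.diag([1e-4, 1e-2])      # process noise (unmodeled accelerations)
R = np.array([[1.0]])          # measurement noise [pixels^2]

def h(x):
    """Nonlinear perspective measurement: image width of the object."""
    return np.array([f * W / x[0]])

def H_jac(x):
    """Local linear approximation (Jacobian) of h, re-evaluated each cycle."""
    return np.array([[-f * W / x[0]**2, 0.0]])

x = np.array([30.0, 0.0])      # initial guess: 30 m away, at rest
P = np.diag([25.0, 4.0])

def vision_cycle(x, P, y):
    # Prediction with the dynamic model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the linearized perspective measurement
    H = H_jac(x)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (y - h(x))
    P = (np.eye(2) - K @ H) @ P
    return x, P

# One cycle with a synthetic measurement (object actually at 25 m):
y = np.array([f * W / 25.0])
x, P = vision_cycle(x, P, y)
```

Note that the range rate is never measured directly; it is reconstructed by the recursive estimator through the dynamic model, which is precisely the point made above.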
Under these conditions, the recursive vision process yields all the variables needed for optimal linear (state-) feedback. Two methods were available for recursive estimation at that time: the Kalman filter [Kalman 1960] and the Luenberger observer [Luenberger 1964]. Since the latter is simpler and sufficient for well-behaved processes, we started out with the Luenberger observer [Meissner 1982].
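As a minimal sketch of this simpler alternative, consider a discrete-time Luenberger observer for a double integrator; the gain L below is hand-picked for illustration and is not the design of [Meissner 1982]:

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # double integrator (position, speed)
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])              # only the position is measured

# Observer gain L chosen so the error dynamics (A - L C) are stable;
# values are hand-picked for this sketch.
L = np.array([[0.6], [1.0]])
assert np.all(np.abs(np.linalg.eigvals(A - L @ C)) < 1.0)

x = np.array([[1.0], [0.5]])            # true state (unknown to the observer)
x_hat = np.zeros((2, 1))                # observer starts with no knowledge
for _ in range(50):
    u = np.array([[0.0]])               # control input (zero here)
    y = C @ x                           # measurement of position only
    # Luenberger observer: model prediction + correction by output error
    x_hat = A @ x_hat + B @ u + L @ (y - C @ x_hat)
    x = A @ x + B @ u

print(np.ravel(x - x_hat))              # estimation error -> 0
```

Only the position is measured, yet the speed estimate converges through the model. Unlike the Kalman filter, the gain L is fixed at design time rather than computed from noise covariances each cycle, which is what makes the observer the simpler of the two.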
In this section, a survey is given of the methods developed under the side constraint that one vision cycle should last no longer than ~0.1 second. With computing power per microprocessor increasing by a factor of 10 about every 4 to 5 years, each generation of Ph.D. students could start anew on the same topic and increase the complexity of the shape and dynamic models used; full video rate was well within reach. [Note that this approach emphasizing real-time operation is in contrast to the approach selected almost everywhere else at that time: there, image processing was allowed to take as much time as the chosen methods needed, and real-time performance was expected to come from future processor generations. These differing side constraints have led to different preferred methods and solutions.]
The methods discussed here are grouped according to visual perception and control of action in a mission context. Special emphasis is laid on functional system integration; the overall-systems point of view is considered essential for achieving high performance with moderate investment in components. With a background in rather complex aerospace control systems, the author had relatively easy access to a way of thinking about cognitive systems that may not be available to a newcomer in the field of machine vision (note: not 'computer vision'!). Consequently, the terms and methods used are taken from long-established fields of engineering; international (ISO) standards are available for creating a homogeneous terminology around the globe. Adoption of this established terminology by computer science and AI would be appreciated.