Dickmanns and Wuensche 1999: Digital image processing started almost four decades ago with rectification and preprocessing of images from remote sensing and from medical applications for human interpretation. Processing time was usually on the order of one hour per image. For a long time, image processing was confined to snapshot images because the data volume of 1 000 to 1 000 000 pixels per image and the relatively low processing speed of computers up to the 1980s did not allow digital real-time video data processing.
   Change detection in images attracted interest as images of the same scene taken at widely separated times became available; this was of interest both for medical diagnosis and for military intelligence. Starting from this background, the idea of motion recognition and vehicle control by computer vision came about in the context of planetary rover vehicles. Initial test vehicles carried only the camera on board, while the computers required remained in the laboratory; they typically needed a quarter of an hour for image analysis. This was the state of the art in the late 1970s and early 1980s. The vehicles moved a short distance and then had to wait, standing still, while the image was analyzed. Each image was interpreted from scratch.
   At that time, Dickmanns, coming from the field of control engineering, had the idea of developing machine vision from a different paradigm. Instead of choosing a visually complex environment and working with very low image frequencies (0.01 to 0.1 Hz), as was usual at that time in “Artificial Intelligence,” he decided to keep image frequencies rather high (above 10 Hz) and the visual complexity of the scene to be handled correspondingly low. This short cycle time of 100 milliseconds (ms) or less would allow exploiting temporal and visual continuity in the scene because only small changes had to be accommodated between images. The change of location of the camera observing the scene was to be modeled so that predictions would allow the search space for the same features to be reduced drastically.
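   The effect of image frequency on search effort can be illustrated with a little arithmetic. The following Python sketch is illustrative only: the vehicle speed of 30 m/s, the feature rate in the image, the uncertainty margin, and the function names are assumptions introduced here, not values or methods taken from the original work.

```python
def travel_per_frame(speed_m_s: float, frame_rate_hz: float) -> float:
    """Distance (m) the camera moves between two consecutive images."""
    return speed_m_s / frame_rate_hz


def search_window(last_px: float, rate_px_s: float,
                  cycle_s: float, margin_px: float) -> tuple:
    """Predict a feature's image position one cycle ahead.

    Returns the (low, high) pixel bounds to search, centered on the
    prediction. The constant-rate prediction is an assumed stand-in
    for a proper model of camera motion.
    """
    predicted = last_px + rate_px_s * cycle_s
    return predicted - margin_px, predicted + margin_px


if __name__ == "__main__":
    speed = 30.0  # m/s, assumed highway speed
    for rate in (0.1, 10.0):  # Hz: snapshot-style vs. video-rate processing
        print(f"{rate:5.1f} Hz -> {travel_per_frame(speed, rate):6.1f} m per image")
    # Hypothetical feature: last seen at pixel 200, drifting 50 px/s, 8 px margin
    print(search_window(last_px=200.0, rate_px_s=50.0, cycle_s=0.1, margin_px=8.0))
```

   Under these assumptions the camera advances 3 m between images at 10 Hz, against 300 m at 0.1 Hz, which is why a prediction plus a small margin can replace a search over the whole image at the higher rate.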
   Road scenes, especially roads built for high-speed driving, seemed visually simple enough to provide the first field of application. The surfaces are smooth and well kept, and the lanes are marked with bright colors for easy recognition by humans; in addition, to keep the accelerations acting on human passengers at high speeds low, both horizontal and vertical curvatures are limited in magnitude. This type of road can therefore be described easily by a mathematical model. Because vehicles are massive bodies that do not jump around but move steadily, their motion can be described by differential equation constraints including the effects of control and perturbation inputs.
   All of this has led to an approach to machine vision completely different from those studied by researchers from the field of artificial intelligence. Recursive estimation based on incomplete measurement data, developed in the 1960s for trajectory determination in aerospace applications, was well established in the engineering disciplines at that time. These methods have been extended to image-sequence processing for full state reconstruction in 3-D space and time at the University of the Bundeswehr Munich (UBM). The resulting method was dubbed the “4-D approach,” in contrast to the 2-D, 2 1/2-D, and 3-D methods under discussion then, which disregarded time as the fourth independent variable in the problem domain. The numerical efficiency and compactness in state representation of recursive estimation, which directly allowed control applications for generating behavioral capabilities, finally led to its widespread acceptance in the community dealing with autonomous vehicles (mobile robots) [3, 4, 5, 6, 7, 8].
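   To make the notion of recursive estimation from incomplete measurements concrete, the following Python sketch uses a Kalman filter, the best-known recursive estimator of that era, on a deliberately small problem: only position is measured, and velocity is reconstructed through a constant-velocity motion model. It is not the 4-D approach as implemented at UBM; the two-state model, the diagonal process noise, the noise variances, the initial covariance, and all names are assumptions chosen for illustration.

```python
def kalman_step(state, cov, z, dt, q=1e-4, r=1e-2):
    """One predict/update cycle for a constant-velocity model.

    state: (position, velocity); cov: 2x2 covariance as nested tuples;
    z: position measurement; q, r: assumed process and measurement
    noise variances (q is added to the diagonal only, a simplification).
    """
    p, v = state
    (a, b), (c, d) = cov

    # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
    p_pred = p + dt * v
    v_pred = v
    p00 = a + dt * (b + c) + dt * dt * d + q
    p01 = b + dt * d
    p10 = c + dt * d
    p11 = d + q

    # Update: only position is observed, H = [1, 0].
    s = p00 + r                   # innovation variance
    k0, k1 = p00 / s, p10 / s     # Kalman gain
    y = z - p_pred                # innovation (measurement minus prediction)
    new_state = (p_pred + k0 * y, v_pred + k1 * y)
    new_cov = (((1 - k0) * p00, (1 - k0) * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_state, new_cov


def track(measurements, dt):
    """Run the filter over a sequence of position measurements."""
    state = (0.0, 0.0)                    # initial guess: at rest at the origin
    cov = ((10.0, 0.0), (0.0, 10.0))      # large initial uncertainty
    for z in measurements:
        state, cov = kalman_step(state, cov, z, dt)
    return state


if __name__ == "__main__":
    dt = 0.1  # s, i.e. a 10 Hz measurement rate
    # Noise-free positions of a target moving at 2 m/s (hypothetical data)
    zs = [2.0 * dt * k for k in range(1, 51)]
    print(track(zs, dt))
```

   Each cycle touches only the current state and covariance rather than the full measurement history, which reflects the compactness and numerical efficiency the text attributes to recursive estimation.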