Goal-oriented perception and behavior control have to be realized by the autonomous system (a given vehicle or the body of some other agent) in a certain task domain. To do this in an intelligent way, the system should have the following capabilities and knowledge components:
- Knowledge about its own sensory capabilities and their limitations.
- Knowledge about its own perceptual capabilities and their limitations; with respect to dynamic vision this includes:
A) Assuming statistical properties of the observed object motion and of the visual mapping process, as far as required by the recursive estimation process [usually extended Kalman filtering (EKF); a minimal sketch follows after this list], for
B) hypothesizing 3-D objects from sets of characteristic features, including
- object shape (generic spatial models),
- aspect conditions (relative state),
- motion characteristics (generic dynamic models including speed components).
- Understanding of object motion in 3-D space in the task context (goals of the own mission, object motion likely to be expected); situation assessment taking into account all objects relevant to its own decision making.
- Actual behavior decision: either continue the behavioral mode currently running (feed-forward or feedback control), or switch to some other behavioral mode (at the end of a ‘mission element’ or when an event requiring another behavioral mode has been encountered); a mode-switching sketch follows after this list.
- Behavior implementation: In most practically applied systems, control output is generated by special processor hardware specific to the actuators implemented; direct state or output feedback allows minimizing time delays. This means that continuing a previously selected behavioral mode with direct feedback does not require any communication between the cognitive part of the system (for decision making) and the controller for the actuators; for external monitoring, some information may be exchanged.
- Two subsystems have to be controlled in parallel:
- The platform pointing the cameras for visual perception, and
- The locomotion of the autonomous vehicle (system) itself.
Independent of closed-loop functioning, specific sets of data may be logged or displayed to an operator for system monitoring.
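To make item A) above concrete, the following is a minimal sketch of one recursive estimation cycle. It assumes a planar constant-velocity dynamic model and a simple nonlinear visual mapping (bearing and inverse range); all state, noise, and measurement models here are illustrative stand-ins, not the vehicle-specific models of the actual system:

```python
# Minimal EKF sketch for one object-tracking loop (hypothetical models).
import numpy as np

dt = 0.04                      # video cycle time (40 ms)
F = np.array([[1, 0, dt, 0],   # constant-velocity dynamic model:
              [0, 1, 0, dt],   # state x = [X, Y, Vx, Vy] in the plane
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
Q = 1e-3 * np.eye(4)           # assumed process-noise covariance
R = np.diag([1e-4, 1e-2])      # assumed feature-measurement noise

def h(x):
    """Nonlinear 'visual mapping': bearing and inverse range to the object."""
    X, Y = x[0], x[1]
    return np.array([np.arctan2(Y, X), 1.0 / np.hypot(X, Y)])

def H_jac(x):
    """Jacobian of h, evaluated at the predicted state."""
    X, Y = x[0], x[1]
    r2 = X**2 + Y**2
    r3 = r2**1.5
    return np.array([[-Y / r2,  X / r2, 0, 0],
                     [-X / r3, -Y / r3, 0, 0]])

def ekf_step(x, P, z):
    # 1) prediction with the generic dynamic model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # 2) update from the prediction error ("innovation") of measured features
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

x = np.array([20.0, 5.0, -1.0, 0.0])   # initial object hypothesis
P = np.eye(4)
z = h(np.array([19.9, 5.1, 0, 0]))     # one simulated feature measurement
x, P = ekf_step(x, P, z)
```

The cycle structure (model-based prediction, then correction by the feature prediction error) is the point here; the real system replaces the toy models with generic spatial and dynamic models of the object class being tracked.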
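The split between behavior decision and behavior implementation can likewise be sketched in a few lines. The mode names, gains, and event strings below are illustrative assumptions; the point is that the actuator-level feedback law runs every cycle on its own, while the cognitive layer only intervenes at the end of a mission element or when an event occurs:

```python
# Sketch of the two-level behavior scheme (illustrative names and values).
from dataclasses import dataclass

@dataclass
class Mode:
    name: str
    gain: float            # simple state-feedback gain for this mode
    setpoint: float        # commanded value, e.g., lateral offset in m

MODES = {"lane_keeping": Mode("lane_keeping", gain=0.8, setpoint=0.0),
         "lane_change":  Mode("lane_change",  gain=0.5, setpoint=3.5)}

def decide(mode, event, mission_element_done):
    """Cognitive layer: keep the running mode unless the end of a mission
    element or an event requires switching to another behavioral mode."""
    if mission_element_done or event == "obstacle_ahead":
        return MODES["lane_change"]
    return mode

def control(mode, lateral_offset):
    """Actuator-level feedback: runs every cycle without communication to
    the cognitive layer; direct output feedback keeps time delays small."""
    return mode.gain * (mode.setpoint - lateral_offset)

mode = MODES["lane_keeping"]
offset = 0.4
for k in range(3):                       # three control cycles
    event = "obstacle_ahead" if k == 2 else None
    mode = decide(mode, event, mission_element_done=False)
    u = control(mode, offset)
    print(k, mode.name, round(u, 3))
```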
It is difficult to explain to a newcomer the detailed functioning of such a complex system for perception and control, based on several knowledge-base components as well as on hypothesis generation and hypothesis adaptation. Here, four different perspective views on the system are given to outline the major considerations that have led to the system design:
- Scales in space and time: From ‘here and now’, spatial and temporal ranges have to be spanned from micrometers (pixel size) and the visual range (~ 100 m) via video cycles (40 or 33⅓ ms) to maneuvers (in the seconds range) and to mission duration (up to several hours and hundreds of km). (M.3.1 Basic aspects for structuring in space and time)
- Visual perception proceeds on three levels (a single-object interpretation-loop sketch is given at the end of this section):
- Isolated image feature extraction without semantic (and temporal) context (level 1, possibly on special hardware).
- Objects and motion processes in the real world (level 2). Models for both shape and motion of single objects from generic classes are used as background knowledge for interpreting sets of features over time. Many of these single-object interpretation loops (with feedback of prediction errors) may run in parallel.
- Understanding situations in the mission context, based on background knowledge about object classes and on time histories of object states. Semantic relations between real-world objects and behavioral capabilities in the context of the goals of the own mission are used (level 3, no direct use of image data any more). (M.3.2 Structure of visual perception)
- Behavioral capabilities of the autonomous system and their activation in connection with the list of mission elements, the events encountered, and the situation assessed. (M.3.3 Structure of behavioral decision using skills)
- A coarse (engineering) block diagram of the information flow between the major subsystems involved. Both the gaze control and the locomotion control subsystems have to be well tuned for good system performance. Fusing conventional measurement signals with state variables perceived by vision, which arrive with different delay times, requires careful synchronization and tuning for appropriate control outputs; a delay-compensation sketch is given at the end of this section. (M.3.4 Coarse block diagram of system integration)
Two further views complete the picture: M.3.5 Visualization of feedback loops, and M.3.6 Abstract feedback loops around ‘Here and Now’.
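The level-2 interpretation loops mentioned above can be sketched as follows. The class, gains, and feature values are assumed for illustration, not taken from the book's models: each loop predicts the next feature position from its generic dynamic model and feeds the prediction error back into its object hypothesis, and several such loops run in parallel, one per tracked object.

```python
# Sketch of parallel single-object interpretation loops (illustrative values).
import numpy as np

DT = 0.04                                   # one video cycle (40 ms)

class ObjectLoop:
    def __init__(self, pos, vel):
        self.x = np.array([pos, vel])       # hypothesis: position, velocity

    def predict(self):
        """Generic dynamic model: extrapolate the hypothesis one cycle."""
        self.x = np.array([self.x[0] + DT * self.x[1], self.x[1]])
        return self.x[0]                    # expected feature position

    def correct(self, measured):
        """Feedback of the prediction error into the state hypothesis."""
        err = measured - self.x[0]
        self.x += np.array([0.5 * err, 2.0 * err])  # fixed gains, a
                                                    # stand-in for the EKF gain

loops = [ObjectLoop(10.0, -1.0), ObjectLoop(25.0, 0.5)]
features = [9.97, 25.03]                    # features matched to each object
for loop, z in zip(loops, features):        # conceptually in parallel
    loop.predict()
    loop.correct(z)
```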
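The synchronization issue raised in the fourth view can also be illustrated briefly. Cycle time, latency, dynamic model, and fusion weight below are all assumed values: the vision-based state estimate refers to an image taken several cycles in the past, so it is propagated forward with the dynamic model until it refers to the same instant as the low-latency conventional measurement, and only then are the two combined for control.

```python
# Sketch of delay compensation before fusing vision with conventional
# measurements (illustrative numbers).
import numpy as np

dt, d = 0.04, 3                 # cycle time; assumed vision latency = 3 cycles
F = np.array([[1, dt], [0, 1]]) # simple dynamic model: [position, velocity]

def advance(x, cycles):
    """Propagate a (delayed) state estimate forward to the current cycle."""
    for _ in range(cycles):
        x = F @ x
    return x

x_vision_old = np.array([4.80, 1.0])       # valid d cycles in the past
x_vision_now = advance(x_vision_old, d)    # synchronized to 'now'
x_odometry   = np.array([4.93, 1.0])       # low-latency conventional signal

w = 0.6                                    # assumed static fusion weight
x_fused = w * x_vision_now + (1 - w) * x_odometry
```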