Soatto 1996: This thesis explores the problem of inferring information about the three-dimensional world from its projections onto the image plane of a camera, that is, from images. Among the many visual cues, we do not address "pictorial" ones, such as texture or shading. Instead, we concentrate on "dynamic" cues, which are associated with variations of the image over time. To eliminate pictorial cues, one may represent the world as a collection of geometric primitives, such as points, curves, or surfaces in three-dimensional space. Then, from the two-dimensional motion of the projections of these primitives onto the image, one can infer the three-dimensional structure of the world and its motion relative to the viewer. "Three-dimensional structure from two-dimensional images" has been a central theme in Computer Vision for over two decades, and tools from Linear Algebra and Projective Geometry have been widely employed to attack it as a "static" problem. Only in recent years has the role of time started to be recognized, following the influential work of Dickmanns and his coworkers on vehicle guidance on freeways. We impose no restrictions on the structure of the environment, and we cast the general problem of three-dimensional structure and motion estimation within the framework of Dynamical Systems. We show how different algebraic constraints on the image projections can be interpreted as nonlinear, implicit dynamical models whose (unknown) parameters live on specific differentiable manifolds that encode three-dimensional information. Recovering this three-dimensional information then amounts to identifying the dynamical models while taking into account the geometry of the parameter manifolds.
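As one standard illustration of an "algebraic constraint on the image projections", consider two calibrated views of a point that undergoes a rigid motion $X_2 = R X_1 + T$. With perspective projections $x_i = X_i / Z_i$, the depths can be eliminated, leaving the epipolar (essential) constraint $x_2^\top [T]_\times R\, x_1 = 0$, an implicit equation in the motion parameters $(R, T)$; matrices of the form $[T]_\times R$ (with $T$ normalized) make up the so-called essential manifold. The sketch below is a minimal pure-Python check of this constraint on synthetic data. It assumes an ideal pinhole camera with unit focal length, and the particular points, rotation, and translations are hypothetical values chosen for illustration. It is not the estimation algorithm developed in the thesis.

```python
import math

def rot_z(theta):
    # Rotation by angle theta (radians) about the optical (Z) axis.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def hat(t):
    # Skew-symmetric matrix [t]_x, so that hat(t) v equals the cross product t x v.
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def project(X):
    # Ideal pinhole (perspective) projection onto the plane Z = 1.
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(x1, x2, R, T):
    # Evaluates x2^T [T]_x R x1, which is zero for the true motion (R, T).
    E = matmul(hat(T), R)  # essential matrix
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))

# Hypothetical scene: points in front of the first camera, and a rigid motion.
points = [[0.5, -0.3, 4.0], [-1.0, 0.8, 6.0], [0.2, 0.1, 3.0], [1.5, 1.0, 8.0]]
R = rot_z(0.1)
T = [0.2, 0.0, 0.05]

true_residuals = []
wrong_residuals = []
T_wrong = [0.0, 0.2, 0.0]  # an incorrect translation hypothesis
for X1 in points:
    RX1 = matvec(R, X1)
    X2 = [RX1[i] + T[i] for i in range(3)]  # X2 = R X1 + T
    x1, x2 = project(X1), project(X2)
    true_residuals.append(epipolar_residual(x1, x2, R, T))
    wrong_residuals.append(epipolar_residual(x1, x2, R, T_wrong))

print("max |residual|, true motion: ", max(abs(r) for r in true_residuals))
print("max |residual|, wrong motion:", max(abs(r) for r in wrong_residuals))
```

Only image coordinates enter the constraint, so the unknown depths never have to be computed; the residual vanishes (up to rounding) for the true motion and not for the wrong one. Estimating $(R, T)$ then means searching over the essential manifold, which is where its geometry has to be taken into account.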