Markus Solbach


Bachelor Thesis "Automatic Initialization of Model-Based 3D Tracking"

Abstract

Augmented Reality (AR) is becoming increasingly popular. Its main use is to overlay information onto the user's field of view through displays such as HMDs, glasses, car windshields, smartphones or tablets. To augment correctly, the position and orientation (pose) of the camera in space must be known. Today this is usually solved with artificial markers: known markers are placed in the room and the system is trained on this setup. The next step is to get rid of these artificial markers. Calculating the pose without them is called marker-less tracking. Instead of artificial markers, natural objects in the real world serve as reference points for calculating the pose, so the approach can be used flexibly and dynamically. We are no longer dependent on artificial markers, but we need much more knowledge about the scene to find the pose. This is compensated by technical aids and/or by help from the user. Neither solution is comfortable or efficient in practice, which is why marker-less 3D tracking is still a large field of research.

This is the starting point of the bachelor thesis. It proposes an approach that needs only a set of 2D features from a given camera image and a set of 3D features of an object to find the initial pose, which removes the need for technical and user assistance. The 2D and 3D features can be detected with any method. The main idea is to build six correspondences between the two sets, from which the pose can be estimated. To evaluate the estimated pose, each 3D feature is projected onto image coordinates using that pose, and the distance between the projected 3D feature and its associated 2D feature is measured. Each correspondence is evaluated in this way, and the results are summed to rate the whole pose.
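
The evaluation step described above can be sketched as follows. This is a minimal sketch, assuming a pinhole camera whose pose and intrinsics are combined into a 3x4 projection matrix `P`. The function names, the matrix and the sample points are hypothetical and are not taken from the thesis.

```python
import math

def project(P, X):
    """Project a 3D point X = (x, y, z) to pixel coordinates.

    P is a 3x4 projection matrix (assumed pinhole model, given as nested lists).
    """
    h = (X[0], X[1], X[2], 1.0)  # homogeneous 3D point
    u, v, w = (sum(P[r][c] * h[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)

def pose_score(P, correspondences):
    """Sum of pixel distances between projected 3D features and their 2D features.

    correspondences is a list of (point_3d, point_2d) pairs.
    Lower is better; 0 means every projection hits its 2D feature exactly.
    """
    total = 0.0
    for point_3d, point_2d in correspondences:
        u, v = project(P, point_3d)
        total += math.hypot(u - point_2d[0], v - point_2d[1])
    return total

# Hypothetical camera: focal length 500 px, principal point (320, 240),
# camera placed at the origin and looking along +z.
P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

matches = [
    ((0.0, 0.0, 2.0), (320.0, 240.0)),  # projects exactly onto its 2D feature
    ((1.0, 0.0, 2.0), (573.0, 244.0)),  # projects to (570, 240), off by (3, 4)
]
print(pose_score(P, matches))  # 5.0
```

A score of this kind lets candidate poses be compared directly: a lower total means the projected 3D features land closer to their associated 2D features.
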
The lower this sum, the better the pose. A correct pose was found to have a value of around ten pixels. Because there are a great many ways to build six correspondences between the two sets, the building process has to be optimized, and a genetic algorithm is used for this. In the test case the system worked quite reliably: the hit rate was around 90% with a runtime of approximately twelve minutes. Without optimization, the search can easily take several years.
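
To illustrate why the search needs optimization and how a genetic algorithm can drive it, here is a minimal sketch. The abstract does not state the encoding, the genetic operators or the parameters used in the thesis, so everything below is an assumption: an individual is a list of six one-to-one (3D index, 2D index) pairs, the population uses elitism with one-point crossover and single-pair mutation, and the fitness function is passed in by the caller. In the setting of the thesis the fitness would be the summed pixel distance of the pose estimated from the six pairs; the toy fitness here only counts wrong pairs.

```python
import math
import random

def search_space_size(n3d, n2d, k=6):
    """Number of distinct sets of k one-to-one pairs (assumed counting model)."""
    return math.comb(n3d, k) * math.perm(n2d, k)

def random_individual(n3d, n2d, k, rng):
    """k one-to-one (3D index, 2D index) pairs chosen at random."""
    return list(zip(rng.sample(range(n3d), k), rng.sample(range(n2d), k)))

def is_valid(ind):
    """No 3D feature and no 2D feature may be used twice."""
    idx3d = [p[0] for p in ind]
    idx2d = [p[1] for p in ind]
    return len(set(idx3d)) == len(ind) and len(set(idx2d)) == len(ind)

def crossover(p1, p2, rng):
    """One-point crossover; falls back to the first parent if the child is invalid."""
    cut = rng.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]
    return child if is_valid(child) else list(p1)

def mutate(ind, n3d, n2d, rng, rate=0.2):
    """With probability `rate`, replace one pair; keep the original if invalid."""
    child = list(ind)
    if rng.random() < rate:
        child[rng.randrange(len(child))] = (rng.randrange(n3d), rng.randrange(n2d))
    return child if is_valid(child) else list(ind)

def evolve(fitness, n3d, n2d, k=6, pop_size=40, generations=50, seed=0):
    """Minimize `fitness` over sets of k correspondences (lower is better)."""
    rng = random.Random(seed)
    pop = [random_individual(n3d, n2d, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 4]  # best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), n3d, n2d, rng))
        pop = elite + children
    return min(pop, key=fitness)

# Toy fitness (hypothetical): the true match of 3D feature i is 2D feature i,
# so the fitness is the number of wrong pairs.
def toy_fitness(ind):
    return sum(1 for i, j in ind if i != j)

print(search_space_size(20, 20))  # candidate sets for 20 features on each side
best = evolve(toy_fitness, n3d=10, n2d=10)
print(best, toy_fitness(best))
```

Because the best quarter of each generation is carried over unchanged, the best fitness never gets worse from one generation to the next. The search still gives no guarantee of finding the correct correspondences, which is consistent with a hit rate below 100%.
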

The thesis and the slides can be downloaded from the following links (in German):


For further information please contact me: