Self-driving vehicles hold great promise and have therefore become a very active area of research in recent years. Although tremendous progress has been made, in particular through powerful new computer vision techniques involving convolutional neural networks (CNNs), widespread adoption of fully autonomous vehicles is likely still at least a few years away. A more immediate application of related methods is advanced assistance systems that work in conjunction with human drivers, for example by warning them about difficult-to-spot obstacles. Especially under challenging driving conditions, such as at night or in bad weather, such support technologies could make a valuable contribution towards safer traffic.

This thesis explores, implements, and evaluates techniques that could form the basis of a night vision system for an intelligent car. More concretely, as part of the CarVisionLight (CVL) project, we study the problem of real-time two-dimensional multiple object tracking in videos recorded by an in-vehicle camera while driving in rural areas at night. We assess how well state-of-the-art daytime models generalize to a nighttime setting. We study how to optimally utilize large existing driving datasets for training a CNN-based object detection model that is particularly effective on the considerably different CVL data. We then extend this detection model to a full tracking method, applying several enhancements aimed at improving performance in our specific setting. Finally, we put all of our insights together and develop a prototype for an end-to-end tracking tool.

Among the most significant findings of our investigations are that (1) established general-purpose daytime models are not suited for CVL data in our experiments, (2) training with only 25% of the images in the Berkeley Deep Drive dataset already yields 0.48 validation mAP (compared with 0.50 when training with the full dataset), (3) a model trained primarily on night data can also perform well during the day, and (4) fine-tuning on higher-resolution images improves mAP by 0.07 at inference time. Furthermore, our domain-specific tracker adaptations provide a noticeable increase in tracking consistency (three times fewer ID switches) and recall (14% higher). Lastly, an extensive quantitative and qualitative evaluation shows that our tracking tool offers an effective solution to the problem at hand, both in terms of tracking/detection quality and execution speed, and has the potential to stimulate future work.