Obstacle Detection on Road Using Deep Learning

By Sudip Mahajan, Associate Data Scientist at AlgoAnalytics

Introduction

Object detection is one of the most important problems in Computer Vision. It involves localizing the objects in an image that are meaningful for a given use case. Object detection comprises two sub-problems, object recognition and localization, and thus differs from allied problems such as semantic segmentation, instance segmentation, and classification. Objects can be localized in various ways, including bounding-box-based methods. Object detection has been an active area throughout the history of Computer Vision, and continues to be so with modern approaches based on Deep Learning. It is useful for problems such as image annotation, pedestrian detection, face detection and various surveillance objectives.

A wide range of activities on the roads can disrupt traffic and can be considered “obstacles”. These include road construction, bridge construction, building and repair work on monuments, and work involving earth-moving machinery. Such work should be constantly supervised to avoid accidents and to limit its effect on surrounding traffic. These activities are likely to leave unwanted and unexpected obstacles of varying dimensions on the road. For example, Figures 1 and 2 show traffic cones and signs respectively as obstacles.

Obstacle detection on roads is thus a very critical and useful task from the point of view of road safety and also for helping civic bodies responsible for road safety.

The future belongs to unmanned, autonomous vehicles such as self-driving cars and drones. Powered by technological advancements in the field of AI, such systems are equipped with various sensors for better decision making and for maneuvering through dense traffic and unfamiliar scenarios. Such sensors, including cameras and other optical recording devices, can aid in detecting objects related to construction activity on the road. They act as eyes for the governing bodies concerned and give insights that support decision making and governance. Autonomous vehicles help in executing projects on massive scales and make large-scale sensing possible. With the growth of processing power and the availability of sensor data, there is an increasing need for better detection and processing algorithms to enable the execution of such large projects.

Object detection as a computer vision problem can be based upon images as well as videos. A video is a sequence of image frames recorded and played back at a certain frame rate to give the impression of continuity. Processing videos is harder than processing single images because of the frames per second (fps) requirement: video-based object detection must process each frame faster than the frame rate of the video delivers new ones. There is a growing requirement to make detection work in real time so as to extract the maximum benefit from its decisions. Classical image-based detection algorithms do not perform at the required speed on videos.
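
The frame rate requirement above can be turned into a per-frame time budget. The following is a minimal sketch, not part of the original project: the function names and the latency figures in the usage note are hypothetical and chosen only for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def keeps_up(latency_ms: float, fps: float) -> bool:
    """True if a detector with this per-frame latency can process every frame."""
    return latency_ms <= frame_budget_ms(fps)


def max_fps(latency_ms: float) -> float:
    """Highest frame rate a detector with this per-frame latency can sustain."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms
```

For a 25 fps video the budget is 40 ms per frame, so a detector that needs (hypothetically) 30 ms per frame keeps up, while one that needs 50 ms falls behind and would have to skip frames.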

Techniques Used for Obstacle Detection on Road

Object detection architectures have improved continuously to meet real-time processing needs. The backbone networks used for object detection are generally standard networks such as ResNet [1], ResNeXt [2] and Xception [3] with one crucial modification: the last fully connected layer is removed to suit detection purposes. Object detection performance is typically measured using mAP (mean Average Precision). The aim of this line of research is to make detection faster and as close to real time as possible, so that it keeps up with modern video frame rates.
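
To make the mAP metric concrete, here is a minimal sketch of how Average Precision can be computed for one class, assuming boxes are given as (x_min, y_min, x_max, y_max), a detection counts as correct when its IoU with an unmatched ground-truth box is at least 0.5, and the area under the precision-recall curve is taken with all-point interpolation. Evaluation protocols differ between benchmarks, so treat the function names and the threshold as illustrative assumptions rather than the protocol used in this project.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class: area under the precision-recall curve."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # A ground-truth box can be claimed by only one detection.
        if best_iou >= iou_threshold and not matched[best_idx]:
            matched[best_idx] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))
    # Interpolate: precision at each recall is the best precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

In an obstacle detection setting, one would compute `average_precision` separately for each class (for example cones and signs) and pass the results to `mean_average_precision`.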

Some landmark systems for object detection are R-CNN [4], Fast R-CNN [5], Faster R-CNN [6] and YOLOv3 [7]. R-CNN used the selective search (SS) algorithm to generate region proposals (areas in the image where the object of interest is likely to be present). It created close to 2000 region proposals per image and had to run the network on every one of them, which made it slow. Fast R-CNN ran the convolutional network once over the whole image and pooled the features for each proposal from the shared feature map, but it still relied on SS to generate the proposals, which kept it slow. Faster R-CNN replaced SS with a separate network that generates the region proposals. R-CNN, Fast R-CNN and Faster R-CNN are called two-stage systems because they perform region proposal and object detection in separate stages. YOLO-type architectures are one-stage architectures that perform both functions in a single stage, which makes them faster than the earlier attempts; of the systems above, YOLOv3 gives the best frame rate on videos. YOLOv3 uses Darknet as the backbone architecture, which can be trained on modern GPU machines and cloud platforms. The SSD (Single Shot MultiBox Detector) architecture is another such single-shot detection system.
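
The cost difference between running the network on every proposal and running it once per image can be illustrated with a toy sketch. Everything here is a simplification made for illustration: `CountingBackbone` is a hypothetical stand-in that returns its input unchanged and only counts how often it is called, the "image" is a small grid of numbers, and the regions are made up. A real feature map also has a different resolution from the image, which this sketch ignores.

```python
from typing import List, Tuple

Grid = List[List[float]]
# (row_start, col_start, row_end, col_end), end indices exclusive
Region = Tuple[int, int, int, int]


class CountingBackbone:
    """Stand-in for a CNN: returns its input and counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, grid: Grid) -> Grid:
        self.calls += 1
        return grid


def crop(grid: Grid, region: Region) -> Grid:
    """Cut a rectangular region out of a 2D grid."""
    r0, c0, r1, c1 = region
    return [row[c0:c1] for row in grid[r0:r1]]


def rcnn_style(image: Grid, proposals: List[Region],
               backbone: CountingBackbone) -> List[Grid]:
    """One backbone pass per proposal: crop the image, then extract features."""
    return [backbone(crop(image, p)) for p in proposals]


def fast_rcnn_style(image: Grid, proposals: List[Region],
                    backbone: CountingBackbone) -> List[Grid]:
    """One backbone pass per image: extract features once, then crop them."""
    features = backbone(image)
    return [crop(features, p) for p in proposals]
```

With three proposals the first function calls the backbone three times and the second calls it once; with close to 2000 proposals per image the gap grows accordingly. A one-stage detector goes a step further and needs no proposal list at all.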

Recent advances in object detection involve more capable backbone networks such as AlexNet [8], VGG, GoogLeNet [9], ResNet, DenseNet [10] and SENet. Because the object detection problem consists of both object recognition and localization, it is desirable to have both invariance and equivariance in image representations; these can be achieved using feature fusion (integration of shallow and deep features from a CNN model). Some advances treat detection as a sub-region search, while others treat it as key-point localization. Recent work also suggests that learning semantic segmentation improves detection performance. An important line of work is making detection robust to changes in scale and rotation. Adversarial training using networks such as GANs has also received attention in recent years due to promising results.
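
Feature fusion can take several forms. The sketch below shows one common variant under stated assumptions: the deep, low-resolution feature map is upsampled by nearest-neighbour repetition and added element-wise to the shallow, high-resolution one. The function names are hypothetical, feature maps are plain nested lists with a single channel, and other fusion schemes (for example concatenating channels instead of adding) are equally possible.

```python
from typing import List

FeatureMap = List[List[float]]


def upsample_nearest(feature_map: FeatureMap, factor: int = 2) -> FeatureMap:
    """Repeat every value `factor` times along both axes."""
    out: FeatureMap = []
    for row in feature_map:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(shallow: FeatureMap, deep: FeatureMap) -> FeatureMap:
    """Add an upsampled deep map to a shallow map of higher resolution."""
    if not shallow or not deep or len(shallow) % len(deep) != 0:
        raise ValueError("shallow height must be a multiple of deep height")
    factor = len(shallow) // len(deep)
    if len(shallow[0]) != len(deep[0]) * factor:
        raise ValueError("shallow and deep maps must scale equally in width")
    upsampled = upsample_nearest(deep, factor)
    return [[s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shallow, upsampled)]
```

The fused map keeps the spatial detail of the shallow layer, which helps localization, while every position also carries the coarser, more semantic signal of the deep layer, which helps recognition.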

Obstacle detection does not come without challenges, one of them being the unavailability of data sets. Standard object detection data sets such as ILSVRC, Pascal VOC and MS COCO are not suitable for this problem. Traffic management objects may themselves become obstacles on the road, but detecting traffic signs and lights comes with its own challenges, such as illumination changes, motion blur, bad weather and the need for real-time detection. In general, this problem presents some important directions for exploration:

  1. Lightweight object detection methods are required for speedup and use with mobile devices with memory constraints.
  2. AutoML could reduce the human effort needed to build detection logic through neural architecture search.
  3. Weakly Supervised Object Detection methods are required because manual annotation is costly and inefficient.
  4. Detecting small objects in large scenes has long been a challenge.

Because of such challenges, the problem is a formidable one! We at AlgoAnalytics are using a variety of techniques for solving object detection problems with real-time performance needs. A simple obstacle detection project was built using traffic cone and traffic sign data sets freely available online. YOLOv3 with DenseNet as the backbone architecture was used for this purpose. Figures 1 and 2 above show sample results on the data set used for the task, which also ran into the challenges mentioned above.

References

[1] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[2] https://towardsdatascience.com/review-resnext-1st-runner-up-of-ilsvrc-2016-image-classification-15d7f17b42ac

[3] Chollet, François. “Xception: Deep learning with depthwise separable convolutions.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

[4] Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.

[5] Girshick, Ross. “Fast r-cnn.” Proceedings of the IEEE international conference on computer vision. 2015.

[6] Ren, Shaoqing, et al. “Faster r-cnn: Towards real-time object detection with region proposal networks.” IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137–1149.

[7] Redmon, Joseph, and Ali Farhadi. “Yolov3: An incremental improvement.” arXiv preprint arXiv:1804.02767 (2018).

[8] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Communications of the ACM 60.6 (2017): 84–90.

[9] Szegedy, Christian, et al. “Going deeper with convolutions.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

[10] Huang, Gao, et al. “Densely connected convolutional networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

This post first appeared on Medium.