Bounding box annotations are known to be prone to noise. Depending on the raw data, a bounding box may enclose more than the object it is labeled as. This noise influences what the algorithm learns.
In this article, we examine the impact of noisy bounding box annotations on the quality of object detection and classification with deep neural networks. Studies already exist on the effect of label noise on classification ("Deep Learning is Robust to Massive Label Noise"), but little research has been done on localization noise.
First of all, let us define what ‘ground-truth’ means
‘Ground-truth’ can be easily explained through its roots in geography. When humans started taking satellite images, they needed to know whether a picture matched the reality on the ground. To verify this, they sought the ‘ground-truth’ by visiting certain places and comparing them to the satellite pictures taken.
In machine learning, ‘ground-truth’ is often seen as the gold standard and refers to the quality that training data should meet. Initially, the industry made progress through frequent trial and error to achieve the desired results. Today, neural networks are able to learn and improve on their own, which is why the quality of training data is paramount: it is the only input to the algorithm.
There is no standardization of quality levels for data annotation, and in many cases datasets do not accurately portray ground-truth. Common errors include missed or phantom objects, inconsistent labels (e.g., pedestrians annotated with and without carried objects), unclear class boundaries, systematic labeling errors, imprecise boxes (tightness and shifting errors), and badly defined requirements (composite classes).
But how much does it affect machine learning algorithms?
For reproducible results, we set up our experiment with the faster_rcnn_resnet101_coco model from the TensorFlow Object Detection API, pretrained on COCO.
We trained it on three different datasets to show the effects of low-quality training data:
- The original Pascal VOC
- The Pascal VOC with distorted boxes: center shift in [-0.08; 0.08] and scale in [0.92; 1.08] of the initial box size; henceforth called VOC0.08
- The Pascal VOC with distorted boxes: center shift in [-0.13; 0.13] and scale in [0.87; 1.13] of the initial box size; henceforth called VOC0.13
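The box distortion above can be sketched as follows. This is a minimal illustration, not the original experiment code: `distort_box` and the parameter `r` are our own names, boxes are assumed to be in `(xmin, ymin, xmax, ymax)` format, and we assume the center shift is expressed as a fraction of the box size.

```python
import random

def distort_box(xmin, ymin, xmax, ymax, r=0.08):
    """Shift the box center and rescale the box size by factors drawn
    uniformly from [-r, r] and [1 - r, 1 + r] (r=0.08 mimics VOC0.08)."""
    w, h = xmax - xmin, ymax - ymin
    cx, cy = xmin + w / 2.0, ymin + h / 2.0
    # Center shift, expressed as a fraction of the box size (our assumption).
    cx += random.uniform(-r, r) * w
    cy += random.uniform(-r, r) * h
    # Rescale each dimension within [1 - r, 1 + r] of its initial size.
    w *= random.uniform(1.0 - r, 1.0 + r)
    h *= random.uniform(1.0 - r, 1.0 + r)
    return cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0
```

Applying this to every ground-truth box in the dataset yields the distorted training sets while leaving the images and class labels untouched.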
The shifting and scaling factors are sampled from a uniform distribution within the given ranges. Intersection over Union is abbreviated as IoU, and the results are as follows:
Keeping the IoU threshold at 0.5, we see a loss of 3% in precision with only a slight perturbation of ±0.08, and 8% when allowing the higher noise range. With a more selective IoU threshold of 0.8, the mAP decreased drastically, rendering the results unusable.
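For reference, the IoU used in these thresholds is computed as follows (a minimal sketch; boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as a true positive only when its IoU with a ground-truth box exceeds the threshold, which is why raising the threshold from 0.5 to 0.8 punishes localization noise so severely.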
This clearly shows that improving box localization goes a long way toward improving performance for otherwise identical networks. It is another important aspect of machine learning on which not enough research is conducted. The experiment is, however, limited by the IoU and mAP measures: an interesting follow-up would be to determine whether the decrease in mAP is caused by bad classification, bad localization, or both.
Delivering High Quality of Annotated Data
We at understand.ai ensure high standards of data quality and have therefore implemented an extensive pipeline to eliminate possible errors and inconsistencies in annotations. By combining expert labelers with machine learning algorithms, we offer an accelerated annotation process (algorithm proposals refined by human experts) and quality assurance (every image is validated by a human expert). We are already able to deliver annotations several times faster than traditional methods and can thus scale to millions of images.
Marc Mengler, co-founder, understand.ai