Custom Object Detector Using Small Training Data

Object detection models take an image as input and output coordinates that mark the locations of objects in the image, along with their detected classes. This adds a level of complexity over image classification, as there may be multiple objects of the same or different classes in the input image. Other issues include objects appearing at different orientations and objects being partially covered. Training models from scratch can take a considerable amount of compute power and training data. Fortunately, researchers publish their models with pre-trained weights, so we only need to adapt them to our dataset.

For this project, I used pre-trained object detection models available from torchvision, the PyTorch vision library. They are trained and evaluated on the COCO 2017 dataset, which covers 80 object classes. I changed the number of classes to 4 (three objects + background) to match my dataset. We were only given about 1,000 training images, but that may be plenty since we are just fine-tuning.

Another step I took to “expand” my training data was to use Albumentations for data augmentation. Albumentations offers a very easy-to-use API for composing transforms that are randomly applied to your images during training. I added random brightness and contrast changes, as well as spatial changes such as scaling, shifting, and rotating the images. Since this is object detection, Albumentations also returns the adjusted bounding boxes whenever spatial transformations are applied.

I used Weights & Biases to track the losses during training. The training script I adapted uses a combination of four losses covering the classification of objects and the regression of the bounding boxes. After each epoch, the model is validated on the test set using precision, recall, and intersection over union (IoU), computed with pycocotools.

During live testing, the model was able to detect objects at different orientations as well as partially covered objects, but there was a large delay in processing. In the future I might retrain using a smaller model or experiment with quantizing the model weights.

The trained weights are released along with the code on GitHub.
