The annotation tool is built and supports both the YOLO and VOC formats. Future annotations can use either format, but the choice should stay consistent across the dataset.
The much bigger challenge is how to scale image annotation. There simply is not enough time for both research and manual annotation. It is good to have this tool available, but the research can realistically be accomplished only if outside resources support image annotation. Whole teams, such as the Microsoft one behind the COCO dataset, have devoted themselves to building such datasets. It is an unfortunate reality that a single researcher does not have the time or capacity to do the same.
Fortunately, there are researchers willing to share their data. Two datasets have been identified: PKLot and the one shared by Hsieh. Links to both are in the references section below.
For future work, leverage existing parking lot data that already comes with bounding box annotations. Other researchers have done this hard work, and their efforts are credited in the reference section of the master's project.
Annotation of an image requires accurately identifying which objects are in each image as well as drawing bounding boxes around each one. The ground truth is saved to a text or markup file. Careful attention must be paid to maintaining the relationship between the output ground truth and the image annotated.
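One way to guard the relationship between images and their ground truth is to check that file names pair up by stem. The sketch below is only an illustration: the function name `find_unpaired`, the example file names, and the `ignore` default (which assumes a YOLO-style `classes.txt` may sit next to the label files) are assumptions, not part of the tool.

```python
from pathlib import PurePath


def find_unpaired(image_names, label_names, ignore=("classes",)):
    """Match images and annotation files by file stem.

    Returns (images without a label, labels without an image).
    `ignore` lists label stems that are not per-image annotations
    (assumed here: a YOLO class list named classes.txt).
    """
    image_stems = {PurePath(name).stem for name in image_names}
    label_stems = {PurePath(name).stem for name in label_names} - set(ignore)
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Hypothetical file names, for illustration only.
missing_labels, orphan_labels = find_unpaired(
    ["lot_001.jpg", "lot_002.jpg"],
    ["lot_001.xml", "classes.txt"],
)
```

In practice the two lists could come from `os.listdir` on the image and annotation folders. A non-empty result means an image or label was renamed, moved, or never annotated.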
The tool is forked from great work by TzuTa Lin. The binaries were not altogether difficult to build. The Makefile has been updated with custom commands to construct the executable on macOS, since there was originally support only for Windows and Linux.
The following dependencies are required to build the annotation tool executable.
Below are the Python-specific package requirements.
The output is an executable that runs a desktop GUI. It is straightforward to use: load an image, draw bounding boxes around objects, label the objects, and save the output in VOC or YOLO format. VOC format saves to an XML file, while YOLO saves to a text file.
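To make the difference between the two outputs concrete, here is a minimal sketch that converts one VOC-style XML annotation into YOLO text lines. It assumes the usual VOC layout (`size/width`, `size/height`, and `object/bndbox` with pixel corners) and the usual YOLO line layout (class index, then box center and size normalized by the image dimensions). The function name `voc_to_yolo`, the sample XML, and the class list are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text, class_names):
    """Convert one VOC-style annotation (XML string) to YOLO label lines."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        # The class index is the position of the label in class_names.
        class_id = class_names.index(obj.findtext("name"))
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO stores the box center and size as fractions of the image size.
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


# Made-up annotation: one car in a 1000x500 image.
SAMPLE = """<annotation>
  <size><width>1000</width><height>500</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>100</xmin><ymin>100</ymin><xmax>300</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""

yolo_lines = voc_to_yolo(SAMPLE, ["car"])
# yolo_lines == ["0 0.200000 0.300000 0.200000 0.200000"]
```

The same arithmetic applies when porting a third-party dataset, provided its boxes can first be expressed as axis-aligned pixel corners.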
Despite the tool's simplicity, annotation requires too much time. In the VOC dataset, most of the images contain only a few objects, whereas a parking lot image can contain dozens. At six to seven minutes of annotation per image (a tight estimate), annotating a hundred images can take up to twelve hours. Spreading the effort across multiple individuals makes this more scalable, but additional upkeep is necessary to preserve ground truth accuracy and data quality.
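The time estimate above follows from simple arithmetic, sketched here with a hypothetical helper `annotation_hours`. The per-image rates are the six and seven minutes quoted in the text.

```python
def annotation_hours(num_images, minutes_per_image):
    """Total annotation time in hours for a single annotator."""
    return num_images * minutes_per_image / 60


low = annotation_hours(100, 6)   # 10.0 hours
high = annotation_hours(100, 7)  # about 11.7 hours
```

At the same rates, a few thousand images would take hundreds of hours, which is why single-person annotation does not scale.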
Hsieh was contacted about obtaining annotated parking lot images. There are a few thousand parking lot frames to train on. They are annotated, but the annotations will need to be converted into whatever format YOLO takes. His work focused on detection from drone imagery. It is a promising stepping stone toward a fixed-camera solution to object detection.
The PKLot dataset can also provide more training data. It will require some work to port its annotation format over to VOC or YOLO. Setting up training on it is apparently non-trivial, but the effort may pay off through the additional training examples.
The annotation tool is handy to have around, but it is best used in the context of a team of researchers and "annotationists" dedicated to generating ground truth parking lot images. For a single researcher to annotate even a hundred images is arduous and not scalable.
There are thousands of annotated parking lot images to leverage. Once a neural network is set up for training, validation, and testing, efforts must be made to incorporate these images into training.
Perhaps the final 100 or so images manually annotated by the researcher can be used for testing.
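The split suggested above could look like the following sketch: externally annotated images are shuffled into training and validation sets, while the manually annotated images are held out for testing. The function name `build_splits`, the 20% validation fraction, the fixed seed, and the file names are all assumptions made for illustration.

```python
import random


def build_splits(external_images, manual_images, val_fraction=0.2, seed=0):
    """Shuffle external images into train/val; keep manual images as the test set."""
    shuffled = list(external_images)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(len(shuffled) * val_fraction)
    return {
        "train": shuffled[n_val:],
        "val": shuffled[:n_val],
        "test": list(manual_images),
    }


# Hypothetical file names standing in for the shared and hand-annotated data.
external = [f"shared_{i:04d}.jpg" for i in range(10)]
manual = [f"manual_{i:03d}.jpg" for i in range(3)]
splits = build_splits(external, manual)
```

Keeping the hand-annotated images out of training means the test set reflects the researcher's own annotation standard rather than that of the shared datasets.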