Deliverable 2 - Image Processing and Deep Learning Environment

Summary

Set up an environment with image processing and deep learning functionality. It should be sufficiently isolated from the host machine while still supporting easy integration with the GPU and the CUDA framework. It should not install anything directly on the host machine, especially not as root.

This environment contains the deep learning framework PyTorch and the image processing and manipulation library OpenCV. PyTorch was simple to set up in a virtual environment, but OpenCV wreaked havoc on the host machine's environment by overriding install paths and installing folders in unexpected places.

A solution was needed to set up these object detection dependencies in a way that supports all the functionality of both frameworks but leaves the host machine untouched.

The solution is to use Docker to contain the dependencies.

Reasons for Docker

Docker containerizes applications: it provides OS-level virtualization in which isolated applications and services run on a single host and share the host's kernel. This is the main difference between a Docker container and a virtual machine, which runs its own guest OS and kernel on top of a hypervisor.

Because the application is contained, it does not hijack the host machine's global configuration, install paths, or other settings. Applications and services are free to do what they need inside their containers, and the host is unaffected.
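As a rough illustration of this isolation, the sketch below assembles a `docker run` command that exposes the GPU, runs as the calling user instead of root, and mounts only the project folder into the container. The image tag `vision-env:latest`, the `/workspace` mount point, and the function name `build_docker_run` are assumptions made for this example, not part of the original setup; `--gpus all` also assumes the NVIDIA Container Toolkit is installed on the host.

```python
import os
import shlex
from pathlib import Path


def build_docker_run(image, project_dir, use_gpu=True, command=("python3",)):
    """Assemble a `docker run` command as an argument list (nothing is executed)."""
    project_dir = Path(project_dir).resolve()
    args = ["docker", "run", "--rm", "-it"]
    if use_gpu:
        # Assumes the NVIDIA Container Toolkit is available on the host.
        args += ["--gpus", "all"]
    if hasattr(os, "getuid"):
        # POSIX only: run as the calling user rather than root, so files
        # written to the mounted folder keep the host user's ownership.
        args += ["--user", f"{os.getuid()}:{os.getgid()}"]
    # Only the project folder is visible inside the container.
    # "/workspace" is a hypothetical mount point chosen for this sketch.
    args += ["-v", f"{project_dir}:/workspace", "-w", "/workspace"]
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # "vision-env:latest" is a hypothetical image tag.
    print(shlex.join(build_docker_run("vision-env:latest", ".")))
```

Building the command as a list keeps it easy to inspect before passing it to `subprocess.run`, and `--rm` removes the container on exit so no state accumulates on the host.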

The following are some key advantages of Docker.

Setup

Here are instructions on how to set up PyTorch.

Dependencies

Inside the Environment - PyTorch and OpenCV

PyTorch

PyTorch was covered in a previous deliverable that classified the digits 0-4 and 5-9.
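Inside the container, a quick sanity check along the following lines can confirm that PyTorch is importable and can reach the GPU through CUDA. This is a minimal sketch: the helper name `describe_torch_device` and the shape of the returned dictionary are assumptions for illustration, while `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls.

```python
def describe_torch_device():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    info = {"torch_installed": False, "cuda_available": False, "device": "cpu"}
    try:
        import torch  # imported lazily so the check also runs where PyTorch is absent
    except ImportError:
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    if torch.cuda.is_available():
        info["cuda_available"] = True
        info["device"] = "cuda"
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back false inside the container but true on the host, the GPU was most likely not passed through when the container was started.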

OpenCV

Some OpenCV functionality, such as the following image transformations, is covered in this collection of OpenCV slides.
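To give a flavor of what such transformations look like in code, here is a small sketch that resizes an image, converts it to grayscale, and rotates it. The choice of these three transformations, the helper names `fit_within` and `basic_transforms`, and the default parameter values are assumptions for illustration and are not taken from the slides; the OpenCV calls used (`cv2.resize`, `cv2.cvtColor`, `cv2.getRotationMatrix2D`, `cv2.warpAffine`) are part of the standard Python bindings.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height) scaled so the longer side equals max_side."""
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def basic_transforms(image, max_side=256, angle_degrees=15.0):
    """Resize, convert to grayscale, and rotate a BGR image (a NumPy array)."""
    import cv2  # imported here so fit_within works without OpenCV installed

    height, width = image.shape[:2]
    # cv2.resize expects the target size as (width, height).
    resized = cv2.resize(image, fit_within(width, height, max_side),
                         interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    # Rotate about the image center without changing the canvas size.
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle_degrees, 1.0)
    rotated = cv2.warpAffine(gray, matrix, (w, h))
    return resized, gray, rotated
```

An image loaded with `cv2.imread` can be passed straight to `basic_transforms`, since OpenCV reads color images in BGR channel order.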

OpenCV also supports converting a video into a series of frame-by-frame images. Camera recordings can therefore be converted to a series of images and then fed into the neural net. Video is the next level of training input for ML models, and Facebook has been a forerunner in that respect with its massive stores of video content. It makes sense to have this functionality handy, especially for testing.
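The video-to-frames conversion described above could be sketched roughly as follows. The function names `iter_frames` and `video_to_images`, the `every_n` sampling parameter, and the `frame_000000.png` naming scheme are assumptions for this example; `cv2.VideoCapture`, its `read()`, `isOpened()`, and `release()` methods, and `cv2.imwrite` are standard OpenCV calls.

```python
from pathlib import Path


def iter_frames(capture, every_n=1):
    """Yield (index, frame) for every `every_n`-th frame read from `capture`.

    `capture` is any object whose `read()` returns (ok, frame), which is
    the interface of `cv2.VideoCapture`.
    """
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of stream or read error
            break
        if index % every_n == 0:
            yield index, frame
        index += 1


def video_to_images(video_path, out_dir, every_n=1):
    """Write the selected frames of a video to `out_dir` as PNG files."""
    import cv2  # imported here so iter_frames can be used without OpenCV

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(str(video_path))
    if not capture.isOpened():
        raise IOError(f"could not open video: {video_path}")
    written = []
    try:
        for index, frame in iter_frames(capture, every_n):
            target = out_dir / f"frame_{index:06d}.png"
            cv2.imwrite(str(target), frame)
            written.append(target)
    finally:
        capture.release()
    return written
```

Because `iter_frames` only relies on `read()`, the same loop should work for a live camera opened with `cv2.VideoCapture(0)`, and the yielded frames can be handed to a model directly instead of being written to disk.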

References