In-depth look at designing an image classification pipeline

Link to GitHub Repo

Image classification, i.e. teaching a model to predict the category of an image, is one of the most fundamental problems in computer vision. Many of the AI products you come across these days have image classification built in. For example, that's how Facebook can suggest that you "tag" a friend in a picture.

Frameworks like PyTorch and TensorFlow make it incredibly easy to train an image classification model. But an image classifier is much more than the network you train: several steps before and after training affect how your model behaves. To give you a more concrete example, say you're training a model to classify dogs vs. cats, and two representative images from your dataset are:

Now, you want to use this trained model to predict whether this image is a cat or dog.

Contrary to what you might expect, chances are high that the model will get confused. The reason is simple: it has never seen a rotated image. We run into such problems very often in slightly more complicated tasks like text recognition, i.e. classifying a word letter by letter. Your model is NOT invariant to rotation because it never saw rotated images during training, so it gets confused at run time. Such problems are rampant, and online tutorials that teach you to fine-tune an ImageNet-pretrained model such as Inception v3 rarely go over how to build a classifier that addresses issues specific to your purpose.
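One common way to address this is to show the model rotated copies of the training images. The sketch below is only an illustration under simplifying assumptions: the "image" is a small nested list rather than a real picture, and the helper names `rotate90` and `all_rotations` are hypothetical. In a PyTorch pipeline you would more likely use a ready-made transform such as `torchvision.transforms.RandomRotation`.

```python
def rotate90(image):
    """Rotate a 2D image (a list of rows) 90 degrees clockwise."""
    # Reverse the row order, then transpose: the first column of the
    # reversed image becomes the first row of the result.
    return [list(row) for row in zip(*image[::-1])]


def all_rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    rotations = [image]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1]))
    return rotations


# A tiny 2x3 "image" whose pixel values make the orientation easy to follow.
image = [[1, 2, 3],
         [4, 5, 6]]

augmented = all_rotations(image)
for angle, rotated in zip((0, 90, 180, 270), augmented):
    print(angle, rotated)
```

Each rotated copy keeps the label of the original image, so the model sees the same category in several orientations during training.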

The goal of this tutorial is to break down everything that happens when training these models and to give you complete control over the pipeline, so that you can build not just a model, but a meaningful and useful pipeline for your specific purpose.

OVERVIEW:

BEFORE TRAINING

  • Making a dataset. Organizing it so that it can be loaded easily.
  • Loading the dataset. Trying different ways of loading it to see how they change the images that go into the classifier. If you want to build invariance to something (like rotation, as above), this is where you need to get things right.
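As a rough illustration of the first step, the sketch below indexes a dataset stored as one folder per class (`root/<class_name>/<image>`). That layout, the helper name `index_dataset`, and the extension list are assumptions made for this example, not something the tutorial prescribes. `torchvision.datasets.ImageFolder` expects a similar folder-per-class arrangement.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your own data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_dataset(root):
    """Return (samples, class_to_idx) for a root/<class_name>/<image> layout."""
    root = Path(root)
    # Sort class names so the label assigned to each class is reproducible.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            # Skip anything that does not look like an image file.
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, class_to_idx[name]))
    return samples, class_to_idx


# Build a throwaway dataset of empty placeholder files to exercise the indexer.
layout = {"cat": ["a.jpg", "b.png"], "dog": ["c.jpg", "notes.txt"]}
with tempfile.TemporaryDirectory() as tmp:
    for class_name, file_names in layout.items():
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for file_name in file_names:
            (class_dir / file_name).touch()

    samples, class_to_idx = index_dataset(tmp)
    sample_names = [(path.name, label) for path, label in samples]

print(class_to_idx)
print(sample_names)
```

Keeping the index as plain (path, label) pairs makes it easy to swap in different loading strategies later and compare what actually reaches the classifier.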

DURING TRAINING

  • Visualizing your input data: It's very important to see what you're feeding to your network. Stare at the input tensor going into your classifier and be sure it makes sense.
  • Visualizing the gradients being calculated.
  • Visualizing your losses using TensorBoard.
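Before any plotting, a few summary numbers already catch many input and gradient problems. The following is a minimal, framework-free sketch: the batch and the gradients are plain nested lists, and the helpers `flatten`, `summarize_batch` and `global_grad_norm` are hypothetical names introduced here. With PyTorch tensors you would compute the same quantities directly on the tensor and on each parameter's `.grad`.

```python
import math


def flatten(values):
    """Yield every scalar from an arbitrarily nested list or tuple."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield value


def summarize_batch(batch):
    """Return basic statistics for a batch of images given as nested lists."""
    pixels = list(flatten(batch))
    return {
        "count": len(pixels),
        "min": min(pixels),
        "max": max(pixels),
        "mean": sum(pixels) / len(pixels),
        # NaNs in the input usually point to a broken preprocessing step.
        "has_nan": any(math.isnan(p) for p in pixels),
    }


def global_grad_norm(gradients):
    """L2 norm over all gradient values; very large or zero norms are suspicious."""
    return math.sqrt(sum(g * g for g in flatten(gradients)))


# Two 2x2 single-channel "images" with pixel values already scaled to [0, 1].
batch = [[[0.0, 0.5], [1.0, 0.5]],
         [[0.25, 0.25], [0.75, 0.25]]]
# Made-up gradient values, used only to exercise the norm computation.
gradients = [[3.0, 0.0], [0.0, 4.0]]

stats = summarize_batch(batch)
grad_norm = global_grad_norm(gradients)
print(stats)
print(grad_norm)
```

If the minimum and maximum are not in the range you expect (for example 0 to 255 instead of 0 to 1), the normalization step is a good first place to look.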

AFTER TRAINING

  • Setting up a pipeline for visualizing how your network is doing once trained, i.e. visualizing predictions. This is important for spotting biases. For example, say you're trying to teach your system the classes plants, dogs, humans, cats and so on. It's possible that it just learns that everything green is a plant, because none of the other classes show up in green. Now, if you show it this image of a person taken on St. Patrick's Day, it would think it's a plant!

EVALUATION

  • Evaluating: This goes beyond just classification accuracy.
    • Cat to path
    • Cat to Correct preds
    • Cat to Preds
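One possible reading of the three items above is a set of dictionaries keyed by true category: category to image paths, category to predictions, and category to whether each prediction was correct. The sketch below follows that interpretation, which is an assumption. The function names, file paths and predictions are made up for illustration.

```python
from collections import defaultdict


def build_category_maps(records):
    """Group (path, true_label, predicted_label) records by true category."""
    cat_to_paths = defaultdict(list)
    cat_to_preds = defaultdict(list)
    cat_to_correct = defaultdict(list)
    for path, true_label, predicted_label in records:
        cat_to_paths[true_label].append(path)
        cat_to_preds[true_label].append(predicted_label)
        cat_to_correct[true_label].append(predicted_label == true_label)
    return dict(cat_to_paths), dict(cat_to_preds), dict(cat_to_correct)


def per_category_accuracy(cat_to_correct):
    """Fraction of correct predictions for each true category."""
    return {cat: sum(flags) / len(flags) for cat, flags in cat_to_correct.items()}


# Hypothetical evaluation records: (image path, true label, predicted label).
records = [
    ("img/001.jpg", "cat", "cat"),
    ("img/002.jpg", "cat", "dog"),
    ("img/003.jpg", "dog", "dog"),
    ("img/004.jpg", "plant", "plant"),
    ("img/005.jpg", "human", "plant"),
]

cat_to_paths, cat_to_preds, cat_to_correct = build_category_maps(records)
accuracy = per_category_accuracy(cat_to_correct)
print(accuracy)
print(cat_to_preds["human"], cat_to_paths["human"])
```

Looking at predictions per category, together with the paths of the misclassified images, makes biases such as "everything green is a plant" much easier to spot than a single overall accuracy number.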