Last week, I had to solve a machine learning problem that may also happen to you. While doing a consultancy job for a company, I had to build an algorithm that could identify a specific product in a database of pictures.
In theory, that’s not a big issue. There are several computer vision approaches we could use, including ready-made APIs, so I thought the problem would be easy to tackle. However, in this case, the client wanted to build their own computer vision solution. And they wanted a solution based on deep learning.
‘As you wish, master’. I reached an agreement with the client and asked for access to their database, in order to build the simplest deep learning model I could think of. As you know, your first priority in a data science project is to build an end-to-end pipeline that you can use as a baseline.
When I got access to the database, I was in for a surprise: there were no more than a few hundred images. ‘I’m screwed’. You may have already read that deep learning models are as data-hungry as they are cool. With fewer than tens of thousands of images, your model will probably overfit and fail to generalize.
What should we do? Well, one option is to request more data. However, that’s something your client probably doesn’t want, because it means extra cost. Remember, every data collection process has a cost. Accordingly, I would advise you to keep this card in your hand until no other option remains.
Ok, so what else can we do? Since we are dealing with a computer vision problem, one thing we can do is reduce overfitting through data augmentation.
Data augmentation as a way to generate more data
Data augmentation allows you to increase your dataset using data that you already have. In the context of our example, it means that we will use the images we already have to generate new ones. Yes, free data, my friend. That’s what we are talking about.
How do we do that? Well, in theory, all we need to do is make minor alterations to the pictures we already have: flips, translations, rotations, scaling, and so on. ‘That’s cool, Pedro, but how do we do that in practice?’. Good question! Let me answer you with Keras.
Doing data augmentation with Keras
Keras is a high-level deep learning library written in Python. In this example, I’ll show you how to do data augmentation with Keras. Generally speaking, what we need to do is:
- Create an object with the set of alterations we want to apply to the original picture.
- Load the picture.
- Preprocess the picture.
- Apply the transformations defined in the object.
Let’s see how to apply this in Keras, assuming that our dataset contains only one picture (the most beautiful picture in the world) and that we want to generate three new pictures.
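A minimal sketch of this workflow, assuming TensorFlow’s bundled Keras (`tensorflow.keras`). To keep the snippet self-contained, the loaded picture is replaced by a random array; in practice you would load your own file with `load_img` and `img_to_array`.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in for a real picture: a random 150x150 RGB array.
# In practice: img = img_to_array(load_img('your_picture.jpg'))
img = np.random.randint(0, 256, size=(150, 150, 3)).astype('float32')

# 1. Define the set of alterations we want to apply.
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',
)

# 2. flow expects a batch dimension: (1, height, width, channels).
x = img.reshape((1,) + img.shape)

# 3. Generate three augmented pictures. flow loops forever,
#    so we break out once we have enough.
augmented = []
for batch in datagen.flow(x, batch_size=1):
    augmented.append(batch[0])
    if len(augmented) == 3:
        break
```

If you want to inspect the results, `flow` also accepts `save_to_dir` and `save_prefix` arguments that write the generated pictures to disk.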
Do you understand the code? Let’s walk through its main elements.
- load_img simply loads the image from disk.
- ImageDataGenerator defines the transformations we want to apply to the picture. In this case, rotations, shifts, flips, and so on.
- img_to_array converts the image to a NumPy array.
- reshape adjusts the shape of the array so that we can feed it into flow.
- datagen.flow(…) takes data arrays and generates augmented data according to our specifications. Here we just have one picture, but flow is prepared to work with batches.
As we can see, this code produces a set of modified images that we can use to train our deep learning model. These images enlarge our dataset and help reduce overfitting.
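To make that last point concrete, here is a sketch of feeding augmented pictures into training. The tiny random dataset and the two-layer model are placeholders for illustration, not the client’s actual data or architecture:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Toy stand-in for a real labelled dataset of pictures.
x_train = np.random.rand(8, 32, 32, 3).astype('float32')
y_train = np.random.randint(0, 2, size=(8,))

# The same kind of generator we defined before.
datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=20)

# A deliberately tiny binary classifier.
model = models.Sequential([
    layers.Conv2D(4, 3, activation='relu', input_shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Each epoch, flow feeds randomly altered copies of the pictures,
# so the model rarely sees the exact same image twice.
history = model.fit(datagen.flow(x_train, y_train, batch_size=4),
                    epochs=1, verbose=0)
```

The key design choice is that augmentation happens on the fly inside `flow`, so the altered images never need to be stored on disk.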
Now that you’ve got a flavor of deep learning models for computer vision, you can start digging deeper. Take a look at Keras’ GitHub repository and practice your skills with the Kaggle competition Dogs vs. Cats. Soon you’ll master the noble art of deep learning.