Fighting overfitting with dropout

Deep learning aims to build models that perform well on unseen inputs. By unseen inputs, we mean inputs that were not available when the model was trained. When our model performs well on unseen inputs, we say that it is able to generalize.

As you know, I’m a huge fan of the practical methodology proposed by Goodfellow et al. (2016) for developing deep learning models. If you’re not familiar with this methodology yet, take a look at Figure 1.

Figure 1. Practical methodology to build deep learning models (based on Goodfellow et al. 2016)

If you follow this methodology, you’ll have to instrument your system to determine its performance. Being able to understand if your model is underperforming due to underfitting or overfitting is key in deep learning.

As we discussed in the last post, you can instrument the training and validation set performance using learning curves. The gap between the training and validation set performance will tell you if your model is underfitting or overfitting, as shown in Figure 2.

Figure 2. Underfitting and overfitting concepts (source).
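If you train with Keras, one simple way to get these curves is to plot the history returned by model.fit. The sketch below assumes you passed validation data to fit; the function name plot_learning_curves is just illustrative.

```python
import matplotlib.pyplot as plt

def plot_learning_curves(history):
    """Plot training vs. validation loss from a Keras History object.

    'history' is what model.fit(..., validation_data=...) returns;
    'loss' and 'val_loss' are the default Keras history keys.
    """
    epochs = range(1, len(history.history["loss"]) + 1)
    plt.plot(epochs, history.history["loss"], label="training loss")
    plt.plot(epochs, history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()
```

A large and growing gap between the two curves points to overfitting; two high, similar curves point to underfitting.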

Deep learning is based on deep neural networks, and these networks are prone to overfitting (Srivastava et al. 2014). Accordingly, this blog post will focus on a technique for reducing overfitting called dropout.

Dropout

Dropout is a popular regularization technique proposed by Srivastava et al. (2014). The idea behind dropout is to build an ensemble of sub-models generated by randomly removing neurons from the base model during training. Predictions then come from this ensemble rather than from the base model alone.

Figure 3 illustrates the dropout concept.

Figure 3. Dropout concept (adapted from Goodfellow et al. 2016).

On the left side, we have the base model, a typical neural network. In contrast, on the right side, we have a set of sub-models that are variations of the base model. You get these sub-models by randomly removing neurons from the original network. The model’s prediction results from the ensemble of these sub-models.
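To make the mechanism concrete, here is a minimal NumPy sketch of the masking that dropout applies to a layer’s activations during training (the “inverted dropout” formulation most frameworks use). The names and the toy activations are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: randomly zero units at training time and rescale
    the survivors, so nothing needs to change at prediction time."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # which units survive
    return activations * mask / keep_prob             # rescale the survivors

h = np.ones((1, 8))          # toy layer activations
print(dropout(h, rate=0.5))  # roughly half of the units are zeroed out
```

Each training step samples a different mask, which is what produces the many “thinned” sub-networks shown in Figure 3.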

Applying dropout with Keras

Dropout is easy to implement and compatible with many models and training algorithms. If you’re using Keras, you just need to add a line of code to apply dropout. Yes, it’s that easy.
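Here is a minimal sketch of what that looks like: a small fully connected classifier where the Dropout layer is the only addition needed to regularize the hidden layer. The architecture and the rate of 0.5 (the value suggested by Srivastava et al. 2014 for hidden units) are illustrative choices, not a recommendation for your specific problem.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),   # randomly drops 50% of the units, during training only
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Keras disables the Dropout layer automatically at prediction time, so you only pay the regularization cost while training.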

In the following example, you’ll see how to use dropout and what impact it has in a practical setting. We will use the dataset provided by Kaggle for the Dogs vs. Cats competition.
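As a starting point, the sketch below shows one way to set up a small convolutional network with dropout for this task. It is only an outline under stated assumptions: the directory layout (data/train and data/validation, each with cats/ and dogs/ subfolders), the image size, and the hyperparameters are illustrative, not the exact setup used in the competition.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumes the Kaggle images have been arranged as
#   data/train/{cats,dogs}/ and data/validation/{cats,dogs}/  (illustrative paths)
train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=(150, 150), batch_size=32)
val_ds = keras.utils.image_dataset_from_directory(
    "data/validation", image_size=(150, 150), batch_size=32)

model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                    # dropout before the dense classifier
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary output: cat vs. dog
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=30)
```

Training the same model with and without the Dropout layer, and comparing the resulting learning curves, is a simple way to see the regularization effect for yourself.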

References

  • Goodfellow, I., Bengio, Y. and Courville, A., 2016. Deep Learning. Cambridge, MA: MIT Press.
  • Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), pp. 1929-1958.
