Blog

7 Steps to Learning

In this blog post, I'll share with you my learning process so that you can learn anything faster. Today, the ability to constantly learn and reinvent ourselves is essential. For many years, we followed a life model that had two moments: a period of learning and a period of working. We lived these moments sequentially. We would spend some time learning a skill and then, in the second moment of life, we would use that skill to earn a living. Well, those days are over. Social acceleration is killing this two-moment model, and now we need to learn how to learn if we want to stay relevant. In this blog post, I’ll share with you how I learn. It’s a seven-step process that I’ve been improving over…
Fighting overfitting with dropout

Deep learning aims to build models that perform well on unseen inputs. By unseen inputs, we mean inputs that were not available when the model was trained. When our model performs well on unseen inputs, we say that it is able to generalize. As you know, I’m a huge fan of the practical methodology proposed by Goodfellow et al. (2016) for the development of deep learning models. For those who still don’t know this methodology, take a look at Figure 1.

Figure 1. Practical methodology to build deep learning models (based on Goodfellow et al. 2016)

If you follow this methodology, you’ll have to instrument your system to determine its performance. Being able to understand if your model is underperforming due to underfitting or overfitting is key…
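Since this post is about dropout, here is a minimal sketch of the idea in plain Python. It assumes the common "inverted dropout" formulation: during training each activation is zeroed with probability `rate` and the survivors are rescaled, and at inference the layer does nothing. The function name, the `rate` value, and the toy activations are illustrative assumptions, not code from the post.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations (requires 0 <= rate < 1)."""
    if not training or rate == 0.0:
        # At inference time the layer is the identity.
        return list(activations)
    keep = 1.0 - rate
    # Keep each unit with probability `keep` and rescale it by 1 / keep,
    # so the expected value of every activation stays unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)             # seeded only to make the sketch repeatable
hidden = [0.5, 1.0, 1.5, 2.0]      # hypothetical hidden-layer activations
train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
eval_out = dropout(hidden, rate=0.5, training=False, rng=rng)
```

In a real model you would rely on the dropout layer of your deep learning framework; the sketch only shows what happens to the activations.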
How To Read Learning Curves And Why Do We Need Them In Deep Learning

Imagine that you’re developing a deep learning model. You’re in the prototyping stage and, as a good data scientist, you’re following the practical design process proposed by Goodfellow. You’re totally focused on having a working end-to-end pipeline as soon as you can. In your dungeon, the sound of the keyboard breaks the silence. You keep hitting it violently, key after key, looking for the secret code. It’s dark outside. The night goes on and the smell of coffee betrays the sin of sleepiness. But you did it. You run the code using Shift-Enter and your first deep learning model comes to life. You’re now a creator. As you lean back in your chair, another coffee comes to you. It’s now time to see how well your creation is performing. Following…
How To Do Data Augmentation With Keras

Last week, I had to solve a machine learning problem that may happen to you. While doing a consultancy job for a company, I had to build an algorithm that could identify a specific product in an image database. In theory, that’s not a big issue. There are several computer vision approaches that we can use, even APIs, so I thought that the problem was easy to approach. However, in this case, the client wanted to build their own computer vision solution. And they wanted a solution based on deep learning. ‘As you wish, master’. I reached an agreement with the client and asked for access to their database, in order to build the simplest deep learning model I could think of. As you know, your first priority in…
Missing Data: How To Handle It – Properly!

This post is about missing data and it's part of the Data Cleaning series that we started here. To teach you some good practices on how to deal with missing data, I solved a practical problem. In the solution, you'll find the correct procedure to deal with missing data, a warning about possible data leakage situations, and details on how to build pipelines and why you need them. You can also access this example on GitHub: https://gist.github.com/pmarcelino/4b54bc02ee3f9b01036339f508b22fbb
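To make the data leakage warning concrete, here is a minimal sketch in plain Python. It assumes mean imputation and a tiny hand-made train/test split; both are illustrative choices, not taken from the linked example. The point is that the fill values are learned from the training rows only and then reused on the test rows, which is the discipline a pipeline enforces for you.

```python
from statistics import mean


def fit_mean_imputer(train_rows):
    """Learn one fill value per column, using the training rows only."""
    n_cols = len(train_rows[0])
    fill = []
    for j in range(n_cols):
        observed = [row[j] for row in train_rows if row[j] is not None]
        # Assumes every column has at least one observed training value.
        fill.append(mean(observed))
    return fill


def apply_imputer(rows, fill):
    """Replace missing values (None) with the previously learned fill values."""
    return [[fill[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Hypothetical data: None marks a missing value.
train = [[1.0, 10.0], [None, 20.0], [3.0, None]]
test = [[None, 40.0], [5.0, None]]

fill_values = fit_mean_imputer(train)        # statistics come from train only
train_clean = apply_imputer(train, fill_values)
test_clean = apply_imputer(test, fill_values)  # test rows never influence the fit
```

Computing the means on train and test together would let information from the test set leak into the model, which makes the evaluation look better than it really is.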
Doing Data Cleaning Like The Pros

In a previous post, we introduced the data cleaning topic. We defined data cleaning as the ‘process of transforming raw data and making it sufficiently standard to be analyzed’. We also said that there are three types of problems in data cleaning: general, missing data, and outliers. This post focuses on general data cleaning problems. These problems can be: data types, data standardization, constant features, duplicated rows, duplicated features, values out of range, and dataset shuffling. In the following sections, we will see examples of each of these problems, as well as ways to work around them. All the examples are accessible on GitHub. If you want to know more about data cleaning, you can also study one of these books: Python Data Analysis or Machine Learning with Python Cookbook. Data types…
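Two of the general problems mentioned in this excerpt, duplicated rows and constant features, can be sketched in a few lines of plain Python. The rows, column names, and helper functions below are hypothetical and only illustrate the idea; the examples on GitHub may approach it differently.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_constant_features(rows):
    """Remove columns that hold a single value across all rows."""
    constant = {name for name in rows[0]
                if len({row[name] for row in rows}) == 1}
    return [{k: v for k, v in row.items() if k not in constant}
            for row in rows]


# Hypothetical raw data: the third row repeats the first one,
# and "country" never changes.
raw = [
    {"id": 1, "country": "PT", "price": 10.0},
    {"id": 2, "country": "PT", "price": 12.5},
    {"id": 1, "country": "PT", "price": 10.0},
]

cleaned = drop_constant_features(drop_duplicate_rows(raw))
```

Duplicates are removed first on purpose: a column can look non-constant only because of how often some rows repeat, so the order of the two steps is a design choice worth checking on your own data.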
What is Data Cleaning and How Does it Work?

Another morning seated in front of the computer. The clean desk and the minimalist style that decorates my dining room hide a secret: the messiness of my dataset. After several hours of unfair struggle against an emotionless piece of code that refuses to obey me, I finally manage to break it and bend it to my will. The data cleaning process is finished and the dataset is ready for analysis. The sun starts peeking through the window, reminding me that it’s time to go. Alexa’s sexy voice confirms it. It’s 6:30 AM and I have to leave my la-la land and go to work.

Data cleaning is a process

Data cleaning is the process of transforming raw data and making it sufficiently standard to be analyzed. This modification process aims…
Learn How To Apply Sequential Feature Selection Using Mlxtend

In recent posts, we have been talking about feature selection. We started by exploring univariate feature selection and then we moved to model-based approaches for feature selection. Today we explore a different type of feature selection technique: sequential feature selection. Sequential feature selection learns a subset of relevant features by sequentially adding (or removing) features according to the performance of the prediction model. The application of sequential feature selection is almost like wearing warm clothes in Canada. When you go out into the street, you add all the layers of clothing you can; when you enter a building, you remove all the layers of clothing you can.

Figure. Going out in Canada.

The most common types of sequential feature selection are forward feature selection and backward sequential…
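The "sequentially adding" variant can be sketched as a greedy loop in plain Python. This is a minimal illustration, not how Mlxtend implements it: the `toy_score` function and its `GAIN` table are made-up stand-ins for what would normally be the cross-validated performance of the prediction model on a candidate feature subset.

```python
def forward_selection(features, score, k):
    """Greedily add the feature that improves the score the most, k times."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try each remaining feature on top of the current subset.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature usefulness; "age_in_months" duplicates "age".
GAIN = {"age": 0.5, "income": 0.3, "age_in_months": 0.5, "noise": 0.0}


def toy_score(subset):
    """Stand-in for model performance on a feature subset (an assumption)."""
    chosen = set(subset)
    total = sum(GAIN[f] for f in chosen)
    if {"age", "age_in_months"} <= chosen:
        total -= 0.5  # the redundant copy adds no new information
    return total


chosen = forward_selection(
    ["age", "income", "age_in_months", "noise"], toy_score, k=2
)
```

Because every candidate is scored together with the features already chosen, the redundant `age_in_months` column loses to `income` in the second round, even though it looks just as useful as `age` on its own.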
How to do feature selection via a model-based approach

As we saw in a previous post, feature selection attempts to remove unnecessary features from the prediction model. When we reduce the number of irrelevant or redundant features, we hope to improve the accuracy of the model and to reduce its computational cost. In this post, we learn how to select features through a model-based approach. This type of approach uses machine learning to model the data, judging the usefulness of a feature according to its relative importance for predicting the target variable. To perform model-based selection in scikit-learn, you can use the meta-transformer SelectFromModel in conjunction with different models (e.g. L1-penalized regression models or tree-based estimators). The following notebook gives you a practical example of the application of feature selection through a model-based approach. You can access all the data and the…
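To show the idea without any dependencies, here is a rough plain-Python stand-in for model-based selection. It is not how SelectFromModel is implemented: it assumes a linear model fitted by gradient descent, uses the absolute coefficients as importances, and keeps features whose importance is at least the mean importance. The synthetic data, the learning rate, and the threshold rule are all assumptions made for this sketch, and the features are assumed to be on comparable scales.

```python
from itertools import product


def fit_linear_model(X, y, lr=0.1, epochs=500):
    """Fit y ~ X.w + b by plain gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - t
                  for row, t in zip(X, y)]
        grads = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
                 for j in range(d)]
        w = [wj - lr * g for wj, g in zip(w, grads)]
        b -= lr * 2.0 / n * sum(errors)
    return w, b


def select_from_importances(importances, threshold=None):
    """Return the indices of features whose importance reaches the threshold."""
    if threshold is None:
        threshold = sum(importances) / len(importances)  # mean importance
    return [j for j, imp in enumerate(importances) if imp >= threshold]


# Synthetic data: feature 0 drives the target, feature 1 is irrelevant,
# feature 2 has only a small effect.
X = [list(row) for row in product([1.0, -1.0], repeat=3)]
y = [2.0 * row[0] + 0.1 * row[2] + 1.0 for row in X]

weights, bias = fit_linear_model(X, y)
importances = [abs(wj) for wj in weights]
selected = select_from_importances(importances)
```

Here only feature 0 survives. With a real dataset you would let scikit-learn handle both the model and the selection step.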
The Secrets of Univariate Feature Selection

Univariate feature selection is a powerful technique to improve the performance of your models and to reduce their computational cost. This feature selection technique uses statistical tests to assess the relationship between each input feature and the output feature. Input features with a strong statistical relationship with the output feature are kept. The remaining features are excluded.

Image source: https://www.pinterest.pt/pin/569846159077398163/

In the following notebook, you'll learn how to use univariate feature selection. Explore this notebook, do your own variations, and get better at feature selection. You can access all the data and the GitHub version here. Notebook on Univariate Feature Selection: https://gist.github.com/pmarcelino/2e4ccd0da0941950cca00bf06b75b396
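As a quick taste before you open the notebook, here is a minimal plain-Python sketch of the idea. It scores each input feature on its own against the output and keeps the top `k`. The absolute Pearson correlation is used as a simple stand-in for the statistical test, and the column names and values are made up; the notebook may use different scores and data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither sequence is constant


def select_k_best(columns, target, k):
    """Score every feature independently and keep the k strongest ones."""
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical dataset: exam grade explained by three candidate features.
grade = [1, 2, 3, 4, 5]
columns = {
    "hours_studied": [2, 4, 6, 8, 10],
    "shoe_size": [40, 38, 41, 39, 40],
    "mistakes": [9, 7, 6, 3, 1],
}

kept, scores = select_k_best(columns, grade, k=2)
```

Note that each feature is judged in isolation, so two redundant features can both be kept. That is the main limitation of the univariate approach.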