3 Skills to Become a Successful Data Scientist (and How to Learn Them!)

In 2012, the Harvard Business Review declared Data Scientist the sexiest job of the 21st century. Although more and more people are joining the crowd and becoming sexy, demand keeps rising, as shown by a recent LinkedIn Economic Graph report (Figure 1).

Figure 1. Top emerging jobs in the US in 2017. (source: LinkedIn)

At this point, you’re saying to yourself: ‘I also want to be sexy’. But you probably don’t know how, because Data Science is such a multidisciplinary field that it’s hard to identify which skills you should develop. So the question arises: ‘How can I become sexy?’

Easy: just smile. You should smile because science has already shown that smiling increases your attractiveness. But you should also smile because this is the right post for you. Here, I’ll tell you which skills you need to develop and how you can learn them.

In general, there are three types of skills that you need to master:

  1. Programming
  2. Maths
  3. Communication

Programming

Although more and more solutions are being released to democratize Data Science and liberate people from the need to know how to program, I still see programming as an essential skill for any Data Scientist.

There are many reasons for this. First, today it’s easy to learn how to program. There are a lot of great learning resources out there, most of them free. Moreover, we now know that old dogs can learn new tricks, because the brain keeps changing its structure and connectivity throughout life. So, no excuses!

Second, if you know how to program, you’ll be able to tune your data analysis. Having the flexibility to manipulate the data as you wish, tweak the algorithms, and explore your assumptions in a fully personalized way is priceless.

Finally, by learning how to program, you’re learning how to think. That’s what Steve Jobs said, and I couldn’t agree more. If Steve’s and Pedro’s opinions aren’t enough to convince you, read Seymour Papert’s article ‘Teaching Children Thinking’.

So, these are the two basic programming skills that I’d recommend you develop:

Python

  • Automate the Boring Stuff with Python [website] – Practical programming for total beginners. This book will teach you the basics of programming, using Python to write programs that will automate boring tasks. Awesome free learning resource.
  • The Hard Way is Easier [webpage] – General tips on how to learn programming languages. I found them motivating, especially the one about ‘Do Not Copy-Paste’.
  • HackerRank [website] – One way to improve your programming skills is to solve programming challenges. On HackerRank, you’ll find specific challenges for Python and Computer Science that will help you develop your programming skills.
  • Code Review [website] – Here you can post your code and ask other programmers to review it. Feedback is always a great way to learn and improve your skills.
  • Elements of Programming Interviews in Python [book] – Programming interviews are designed to test your skills. By practicing some of the most common questions, you’re developing the basic skills of a good programmer.

SQL

  • OpenClassrooms MySQL (French) [online course] – This is a very complete course on MySQL, and it taught me almost everything I know about MySQL. The course is in French, which is a handicap for the 7,168,000,000 people who don’t speak the language of René Descartes.
  • Mode SQL Tutorial [online course] – A viable alternative (or complement) to the course provided by OpenClassrooms. This course teaches you how to use SQL for data analysis. It has different difficulty levels, providing a structured and progressive learning experience.
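
To give a flavour of what using SQL for data analysis looks like, here is a minimal sketch that runs a typical aggregate query from Python through the standard-library `sqlite3` module. The `sales` table, its columns, and its rows are hypothetical, invented only for illustration. SQLite is used because it needs no server; its dialect differs from MySQL’s in places, but `GROUP BY` with `COUNT`, `SUM`, and `AVG` behaves the same way in both.

```python
import sqlite3

# Hypothetical example data: (region, amount) rows for a made-up sales table.
ROWS = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 200.0),
    ("south", 50.0),
    ("west", 75.0),
]


def revenue_by_region(rows):
    """Load rows into an in-memory SQLite table and summarize them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # A typical analysis query: count, total and average per group,
        # sorted from the highest to the lowest total.
        cursor = conn.execute(
            """
            SELECT region,
                   COUNT(*)    AS n_sales,
                   SUM(amount) AS total,
                   AVG(amount) AS average
            FROM sales
            GROUP BY region
            ORDER BY total DESC
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, n_sales, total, average in revenue_by_region(ROWS):
        print(f"{region:<6} n={n_sales} total={total:.1f} avg={average:.1f}")
```

Once a query like this feels natural, try adding `WHERE` and `HAVING` clauses, or joining a second table.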

Maths

‘To Know or Not To Know: A Moral Reflection on Mathematics for Machine Learning’. This could be the title of a book, but it isn’t. It’s just the story of our life as aspiring Data Scientists, in a world dominated by Machine Learning.

Nowadays, Machine Learning plays a central role in the life of a Data Scientist. It is one of the most powerful tools a Data Scientist can have in their toolbox. Once we acknowledge that, a common question comes to mind: ‘How much maths, and at what level, do I need to understand Machine Learning?’

This question has a fuzzy answer:

  1. No, you don’t need much to start doing Machine Learning and applying it to Data Science.
  2. Yes, you need a lot to master Machine Learning and apply it to Data Science.

To a certain extent, it’s like driving. Do you need to know Mechanics to drive a car? No. But do you need to know Mechanics to fix the car? Absolutely. The main difference is that cars are equipped with warning systems that alert you if something goes wrong. Homemade Machine Learning applications aren’t. That’s a problem.

So, what should you do? If you’re a beginner, I think the best thing you can do is to start driving your Machine Learning vehicle without a driving license. But do it only in your backyard. Take some data, play around with a Machine Learning library, and do some tricks with it.

Later, as your interest in the subject grows, you can dive into the mathematics. By then, you’ll probably have a different perspective on the problems you’re solving and what you need to know to understand them better. This will guide your studies and make them more effective.

The main topics you need to know to master Machine Learning, as well as all the other tricks that a Data Science Jedi should know how to do, are:

  • Multivariable Calculus
  • Linear Algebra
  • Statistics and Probability
  • Machine Learning
  • Optimization

Since it’s not easy to enter this world, I designed two learning paths for you:

‘Sweet’ learning path

There are two types of people in the world: those who start learning about any topic on Wikipedia, and those who start on Khan Academy. I belong to the second type.

This learning path is based on Khan Academy. Although it oversimplifies some topics, Khan Academy is a great resource for learning the basics. This learning path will not make you a master of the craft, but it will certainly give you a broad idea of the mathematical foundations of Data Science and Machine Learning.

Multivariable Calculus

  • Multivariable Calculus [online course] – It will help you to think in terms of multivariable functions. You’ll explore important topics, such as gradients, transformations, and the multivariable chain rule.
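
Gradients are everywhere in Machine Learning, so a handy habit while taking this course is to check a hand-derived gradient numerically. The sketch below compares the analytic partial derivatives of an arbitrarily chosen function, f(x, y) = x²y + sin(y), with central finite-difference estimates. The function, the evaluation point, and the step size `h` are all illustrative assumptions.

```python
import math


def f(x, y):
    # Illustrative multivariable function: f(x, y) = x^2 * y + sin(y)
    return x ** 2 * y + math.sin(y)


def analytic_gradient(x, y):
    # Partial derivatives worked out by hand:
    #   df/dx = 2xy
    #   df/dy = x^2 + cos(y)
    return (2 * x * y, x ** 2 + math.cos(y))


def numerical_gradient(func, x, y, h=1e-6):
    # Central differences: nudge one variable in both directions while
    # holding the other fixed, then divide the change by the step width.
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)


if __name__ == "__main__":
    point = (1.5, -0.7)
    print("analytic: ", analytic_gradient(*point))
    print("numerical:", numerical_gradient(f, *point))
```

If the two results disagree beyond a few decimal places, the hand-derived gradient is probably wrong; the same check is a common way to debug gradient code.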

Linear Algebra

  • Linear Algebra [online course] – Here you will learn about vectors, spaces, matrices and other basic Linear Algebra subjects.

Statistics and Probability

  • Statistics and Probability [online course] – You’ll go over data distributions, data summarizing techniques, and hypothesis testing.
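
As a small taste of summarizing data and testing hypotheses, here is a sketch of a two-sample permutation test written with Python’s standard library only. The two samples are made-up numbers, and the permutation test is just one way to test a difference in means; classical tests such as the t-test are another. The function name `permutation_test` and its parameters are hypothetical choices for this example.

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of a minus mean of b)
    and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle them at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical measurements for two groups.
    group_a = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
    group_b = [11.2, 11.5, 10.9, 11.7, 11.1, 11.4]
    for name, data in (("A", group_a), ("B", group_b)):
        print(name, "mean", round(statistics.fmean(data), 2),
              "stdev", round(statistics.stdev(data), 2))
    diff, p = permutation_test(group_a, group_b)
    print(f"difference in means = {diff:.2f}, p-value ~ {p:.4f}")
```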

Machine Learning

Optimization

  • Optimization Algorithms [online course] – Although it focuses on Deep Learning, this chapter of Andrew Ng’s course will help you understand some of the most common optimization techniques.

‘Spicy’ learning path

Ok, you don’t like sweet stuff. So please, be my guest and take this learning path. It is based on the resources available on MIT OpenCourseWare. Before you start suffering, I recommend watching the MIT Challenge. It will give you inspiration, motivation, and a learning framework. In particular, watch the video about learning Calculus and see how you can apply Scott Young’s approach to your learning path.

Multivariable Calculus

  • Single Variable Calculus [online course] – This course covers the basics of Calculus, like limits, differentiation rules, and techniques of integration.
  • Multivariable Calculus [online course] – The set of topics studied in this course includes vectors, matrices, partial derivatives, double and triple integrals, and vector calculus.

Linear Algebra

  • Linear Algebra [online course] – This course focuses on matrix theory and linear algebra. It will help you understand essential topics such as linear models, least-squares problems, and singular value decomposition. Professor Gilbert Strang’s book is also recommended.
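
Least-squares problems are where Linear Algebra meets data. As a minimal sketch, the snippet below fits a straight line y ≈ a + b·x by solving the 2×2 normal equations (XᵀX)β = Xᵀy by hand, in plain Python, so every step is visible. In practice you would reach for a library routine such as NumPy’s `numpy.linalg.lstsq`. The function name and the data points are hypothetical.

```python
def fit_line_least_squares(xs, ys):
    """Fit y ~ a + b*x by solving the normal equations (X^T X) beta = X^T y.

    X has a column of ones (for the intercept) and a column of x values,
    so X^T X and X^T y reduce to a handful of sums.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]]   and   X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the line is not determined")
    # Solve the 2x2 system with Cramer's rule.
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


if __name__ == "__main__":
    # Hypothetical noisy observations of a roughly linear trend.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.1, 2.9, 5.2, 6.8]
    a, b = fit_line_least_squares(xs, ys)
    print(f"intercept = {a:.3f}, slope = {b:.3f}")
```

A nice property to verify yourself: at the least-squares solution the residuals are orthogonal to every column of X, which for this model means they sum to zero and their x-weighted sum is zero too.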

Statistics and Probability

  • Introduction to Probability and Statistics [online course] – Complete this course to understand the basic principles of statistical inference, significance tests, Bayesian updating, and more. Since the course uses R, challenge yourself to replicate the code in Python and publish it.
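
Bayesian updating is easiest to grasp with a conjugate example. The sketch below assumes a yes/no problem (say, whether a visitor clicks a button) and places a Beta prior on the unknown success probability p. After observing some successes and failures, the posterior is again a Beta distribution whose parameters are simply incremented by the counts. The `BetaBelief` class, the uniform Beta(1, 1) prior, and the observed counts are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown success probability p."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Beta prior + binomial likelihood -> Beta posterior (conjugacy):
        # the update just adds the observed counts to the parameters.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    def mean(self) -> float:
        # Expected value of p under the current belief.
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


if __name__ == "__main__":
    belief = BetaBelief(1, 1)  # uniform prior: every p in [0, 1] equally plausible
    # Hypothetical data arriving in batches of (successes, failures).
    for successes, failures in [(3, 7), (5, 5), (2, 8)]:
        belief = belief.update(successes, failures)
        print(f"Beta({belief.alpha}, {belief.beta}): "
              f"mean {belief.mean():.3f}, sd {belief.variance() ** 0.5:.3f}")
```

Updating batch by batch gives exactly the same posterior as updating once with all the data (here Beta(11, 21)), one of the appealing properties of Bayesian updating.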

Machine Learning

  • Introduction to Analysis [online course] – Here you’ll learn the fundamentals of mathematical analysis, such as continuity, differentiability, sequences and series of numbers and functions, and uniform convergence.
  • Mathematics of Machine Learning [online course] – This course provides a mathematical introduction to Machine Learning, covering statistical theory, algorithms, and online learning.
  • Andrew Ng’s online course [online course] – Definitely the best way to start learning Machine Learning.
  • Python Machine Learning [book] – It will teach you, step by step, how to use Machine Learning in real-world applications.
  • Introduction to Statistical Learning with R [book] – A nice bridge between theory and practice. It was written for R practitioners, but you can find all the solutions to the theoretical and practical exercises in Python here.
  • Pattern Recognition and Machine Learning [book] – Of all the books with a more theoretical approach to Machine Learning, this is the easiest to understand.

Optimization

  • Optimization Algorithms [online course] – Although it focuses on Deep Learning, this chapter of Andrew Ng’s course will help you understand some of the most common optimization techniques.
  • Optimization Methods for Supervised Learning [research paper] – In this paper, you’ll find a comprehensive tutorial on optimization methods for supervised learning. It is written for people familiar with the basics of optimization algorithms, so it assumes some mathematical background.
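
Most optimization techniques used in supervised learning build on plain gradient descent. As a minimal sketch, the snippet below uses batch gradient descent to fit a straight line y ≈ w·x + b by minimizing the mean squared error. The function names, learning rate, iteration count, and data are arbitrary assumptions; real libraries use more refined variants (mini-batches, momentum, adaptive step sizes).

```python
def gradient_descent_linear_regression(xs, ys, learning_rate=0.05, n_iters=5000):
    """Fit y ~ w*x + b by batch gradient descent on the mean squared error.

    MSE(w, b) = (1/n) * sum((w*x + b - y)^2)
    dMSE/dw   = (2/n) * sum((w*x + b - y) * x)
    dMSE/db   = (2/n) * sum(w*x + b - y)
    """
    n = len(xs)
    w, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(n_iters):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, the direction of steepest descent.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        e = w * x + b - y
        total += e * e
    return total / len(xs)


if __name__ == "__main__":
    # Hypothetical noisy data around the line y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.2, 2.8, 5.1, 7.3, 8.9]
    w, b = gradient_descent_linear_regression(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}, MSE = {mse(xs, ys, w, b):.4f}")
```

Try a much larger learning rate, such as 0.5, on this data and the parameters blow up instead of converging; choosing and adapting the step size is exactly what many of the more advanced methods address.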

Communication

Data Scientists need to understand the data, and they also have to help others understand it. Data Scientists will often have to engage with senior management and support decision making. Accordingly, they should know how to translate data into useful insights for decisions and actions.

To communicate better, you should master the following:

Public Speaking

  • Toastmasters [course] – I was a member of Toastmasters for three years, and it was one of the best learning investments of my life. At Toastmasters, I gained the confidence to speak in front of an audience, learned how to structure my messages, and met a lot of interesting people. Find a Toastmasters club in your city; you won’t regret the time you spend there.

Writing

  • The Minto Pyramid Principle [book] – The best book I’ve ever found on how to structure information for business purposes. It will help you organize your thinking, your writing, and your problem-solving approach. It was written by Barbara Minto, a former McKinsey consultant, and McKinsey consultants are the masters of the craft.

Data visualization tools

  • Data Visualization on the Browser with Python and Bokeh [online course] – It is important to know at least one data visualization tool. Bokeh is a nice option because it uses Python. In this Udemy course, you will learn to create plots and data dashboards that will impress everyone. As you take the course, give it your own twist by working on a project where you can apply what you learn.

Do you feel ready to be sexy? I hope so. Now it’s your turn. Start today and become too sexy for your job!