Building A Career in Data Science in 2022

Kathleen Lara
5 min read · Feb 13, 2022


Data Science seems to be the hot new field that everyone wants to jump into because of the job opportunities and high pay.

Image from Pexels

Those flashy, expensive online certificates make it seem so easy for anyone to hop on the data science bandwagon, only to find out in the end that it’s still extremely difficult to get into.

We do need more data scientists in this world, especially with all the useful data available. But truth be told, even working in data science is exhausting because there are just so many things you have to keep up with.

In this article, I’ll share some of the key foundations that I think one should have and continuously build on in order to get into or thrive in this field. Hopefully this will help you find better, more helpful online courses if you’re trying to break into this field.

Learn Statistics and Probability

Deriving insights rests mainly on the mathematical foundations of statistics and probability. Some people jump straight to learning a new programming language or tool that Data Scientists or Engineers use, presuming it is the key to getting into this field, but without mathematical knowledge and theory, it’s nearly impossible to thrive.

It’s a common misconception that you have to be great at programming to become a good data scientist. In fact, some of the best data scientists don’t even know how to code, but they have a very advanced command of statistics, which is the mainstay of data science.
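As a small illustration of statistics doing the heavy lifting, here is a sketch of an exact permutation test in plain Python. It asks: if the group labels meant nothing, how often would shuffling them produce a difference in means at least as large as the one we observed? The `permutation_test` function and the sample numbers are hypothetical, made up for illustration; for real work you would more likely reach for a tested library such as SciPy.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every way to split the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabelings at least as extreme as what we actually observed
        # (a small tolerance guards against floating-point round-off).
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical example: daily conversion rates (%) for a control and a variant.
control = [12.1, 11.8, 12.4, 12.0]
variant = [13.0, 13.4, 12.9, 13.2]
p_value = permutation_test(control, variant)
print(f"p = {p_value:.4f}")  # 2 of the 70 possible splits are this extreme
```

The code itself is short; deciding whether this is the right test for your data, and how to interpret the resulting p-value, is where the statistics knowledge comes in.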

Learn Linear Algebra for Algorithm Development

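Linear algebra powers many data science algorithms and models; linear regression, principal component analysis (PCA), and neural networks, for example, are all built on vector and matrix operations. Algorithms are the procedures, implemented in code or another tool, that run on data. Models are the output of these algorithms.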

You can’t build a house without a strong foundation. A good grip on linear algebra helps you solve many kinds of optimization problems, and it underlies many of the most powerful machine learning algorithms.
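To make the link between linear algebra and optimization concrete, here is a sketch of ordinary least squares, the optimization behind linear regression, solved through the normal equations XᵀXβ = Xᵀy in plain Python. The helper names are hypothetical; in practice you would call a library routine such as NumPy's `numpy.linalg.lstsq`, which avoids forming XᵀX directly and is more numerically stable.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to reduce round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def least_squares(X, y):
    """Coefficients beta minimizing ||X beta - y||^2 via the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    return solve(XtX, Xty)


X = [[1, 0], [1, 1], [1, 2], [1, 3]]  # first column of 1s gives the intercept
y = [1, 3, 5, 7]                       # generated from y = 1 + 2x
beta = least_squares(X, y)
print(beta)  # approximately [1.0, 2.0]
```

Fitting a line, minimizing a squared error, and solving a linear system turn out to be the same problem here, which is exactly why linear algebra sits underneath so many models.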

Learn the basics of SQL: Data preparation, wrangling and ETL

While data engineering is a field of its own, before you can produce insightful analysis you need core knowledge of how the data is processed. Lacking the skills to clean, manipulate, and organize data is the biggest roadblock to reaping the benefits of data science.

You can, of course, do this in Python or R, but in the real world most data lives in a warehouse, and before you can do a proper analysis you need to know where the data comes from and what has been done to it.
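Here is a small sketch of what that preparation step can look like in SQL. It uses Python's built-in `sqlite3` module so it runs anywhere; the table and column names are hypothetical, and a real warehouse will have its own SQL dialect. The transform trims and lowercases names, drops exact duplicate rows, casts text amounts to numbers, and filters out rows with missing amounts before aggregating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT, order_date TEXT);
INSERT INTO raw_orders VALUES
  (1, ' Alice ', '120.50', '2022-01-03'),
  (1, ' Alice ', '120.50', '2022-01-03'),  -- duplicate row from a re-run load
  (2, 'BOB',     '80',     '2022-01-04'),
  (3, 'bob',     NULL,     '2022-01-05'),  -- missing amount
  (4, 'Carol',   '200.00', '2022-01-05');

-- Transform step: normalize text, fix types, drop duplicates and bad rows.
CREATE TABLE clean_orders AS
SELECT DISTINCT
  order_id,
  LOWER(TRIM(customer)) AS customer,
  CAST(amount AS REAL)  AS amount,
  DATE(order_date)      AS order_date
FROM raw_orders
WHERE amount IS NOT NULL;
""")

rows = conn.execute("""
SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
FROM clean_orders
GROUP BY customer
ORDER BY customer
""").fetchall()
print(rows)  # [('alice', 1, 120.5), ('bob', 1, 80.0), ('carol', 1, 200.0)]
```

Notice that 'BOB' and 'bob' collapse into one customer only after normalization; skip that step and one customer silently becomes two in every downstream analysis.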

Learn Data Visualization and Exploratory Data Analysis

Data Visualization is both an art and a science. Aesthetics matter, but it’s also important to know which types of charts make your data more meaningful (e.g., using a box plot instead of a bar plot, or knowing when to include trend lines or the median in your charts).
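A quick way to see why a box plot can beat a bar plot: a bar plot of means shows one number per group, while a basic box plot is drawn from the five-number summary. In this made-up example both groups have the same mean, so their bars would look identical, yet the summaries a box plot would draw are very different. The `five_number_summary` helper is hypothetical and uses the standard library's `statistics.quantiles`; plotting libraries may compute quartiles with a slightly different convention.

```python
from statistics import mean, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the numbers a basic box plot is drawn from."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Hypothetical scores for two teams with identical averages.
team_a = [50, 50, 50, 50, 50]
team_b = [10, 20, 50, 80, 90]

for name, scores in [("team_a", team_a), ("team_b", team_b)]:
    print(name, "mean:", mean(scores), "summary:", five_number_summary(scores))
```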

In terms of data analysis, it’s extremely important to know how to look at your data. Usually you are presented with large datasets, and a bar graph or line chart that points out the obvious isn’t all that useful. You want to know whether your data is usable, whether it’s normalized, and whether there are outliers, and then dig deep into patterns that are almost impossible to see at first glance.
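Before plotting anything, a few quick checks answer the "is this usable?" questions: how many values are missing, whether the mean and median disagree (a hint of skew or outliers), and which points fall outside the 1.5 × IQR fences that box plots commonly use to flag outliers. This is a minimal sketch on a hypothetical list of numbers; with real tables you would typically run the same checks with pandas (for example `DataFrame.describe()` and `isna()`).

```python
from statistics import mean, median, quantiles


def quick_profile(values):
    """Summarize missing values, center, and IQR-based outliers of a numeric column."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's outlier fences
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical column with one missing value and one suspicious reading.
profile = quick_profile([10, 12, 11, None, 13, 12, 11, 95, 12, 10, 13])
print(profile)
```

Here a mean well above the median, plus a single flagged value, tells you to investigate that reading before any chart or model treats it as real.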

Sometimes you even have to engineer your data, or in fancy terms, do feature engineering: creating new variables from raw data that are more helpful for building an analysis or a model.
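As a sketch of feature engineering, the snippet below derives new variables from hypothetical raw customer records: account tenure, days since the last order, average order value, and whether the last order fell on a weekend. The field names, the `engineer_features` function, and the "as of" date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical raw records, e.g. pulled from an orders table.
raw = [
    {"customer": "alice", "signup": "2021-06-01", "last_order": "2022-02-05", "orders": 12, "spend": 480.0},
    {"customer": "bob", "signup": "2022-01-10", "last_order": "2022-01-12", "orders": 1, "spend": 35.0},
]


def engineer_features(row, as_of):
    """Return a copy of `row` with derived features added."""
    signup = date.fromisoformat(row["signup"])
    last_order = date.fromisoformat(row["last_order"])
    return {
        **row,
        "tenure_days": (as_of - signup).days,
        "days_since_last_order": (as_of - last_order).days,
        # Guard against division by zero for customers with no orders.
        "avg_order_value": row["spend"] / row["orders"] if row["orders"] else 0.0,
        # date.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        "last_order_on_weekend": last_order.weekday() >= 5,
    }


features = [engineer_features(r, as_of=date(2022, 2, 13)) for r in raw]
for f in features:
    print(f["customer"], f["tenure_days"], f["days_since_last_order"],
          f["avg_order_value"], f["last_order_on_weekend"])
```

Recency, frequency, and spend features like these are common starting points for churn or segmentation models, but whether a given feature actually helps is something you verify against your data.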

Learn Python / R and its packages

You don’t have to be an advanced programmer to become a Data Scientist. In fact, you probably just need to know the basic syntax, your data model, and the right statistical packages to use. Value your time and focus on algorithm development and analysis. As long as you know what you need to do, you can Google and Stack Overflow your way through the less important syntax. (This all goes back to knowing statistics and your data well.)

Both Python and R are useful for advanced analysis. Neither language is better than the other; it all depends on which syntax you’re comfortable with and which packages you will be using. Python is known for syntax that’s easier to learn, and its data science packages have improved considerably over the past few years. R is known for its great data visualization and was built for statistical and numerical analysis.

But either way, they are both good languages to know and it really depends on your comfort level and what type of analysis you need to do. (In reality it probably depends on what language your team uses since you have to work collaboratively 😅).

Resources / Tools:

Get an Internship / Work for a Startup: (Yep, I included this as a tool, because getting experience is the best way to learn!) Network your way into an internship. If you have strong knowledge of the key areas I’ve pointed out in this article, not having previous work experience won’t be an issue.

Statistics for Business Book: This is the book I used to polish my statistics knowledge, but there are plenty of statistics books on the market to choose from.

DataCamp: They have several courses available and it’s a great interactive tool for learning programming.

Other platforms you can also learn from: edX, Coursera, MITx, Towards Data Science, KDnuggets, Spotify (listen to podcasts)

Get a Master’s / PhD Degree: This is actually not necessary, but if you’re obsessed with learning and you have the money and the resources, do it!

End notes:

Like everyone else, I’m continuously trying to learn and expand my skills. My focus is on building a good scientific foundation to help me derive better insights and build better models.

Do I have a flashy, expensive Data Science certificate? I don’t (not even one). If there’s a certificate program or online course that offers a great curriculum you can’t find anywhere else, by all means, go for it! Just make sure it also comes with free 1:1 mentorship.

I have a Master’s degree with a major in Business Statistics, but I learned most of what I now know the hard way: through several paid and unpaid internships (and crying myself to sleep at night watching free inferential statistics YouTube videos).

Let’s connect: https://twitter.com/itskathleenlara

I’m a Boston-based Data Scientist with a background in Data Engineering and Statistics.🤓
