If you have any questions, you can leave them in the comments, or you can hit me up on Twitter: https://twitter.com/BowTied_Raptor
Data Science Competition
The contest is finished. I will open it to the public in the next few days, so feel free to play with the dataset. Here are the top 3 winners:
I’ll reach out to the winners and double-check with them where they would like their earnings sent.
Here are some of the interesting questions I was asked in October.
Q1 - Business vs Scientific Analyst
Q: If you are a data analyst working with scientific data, should you hire a data analyst who works with business data?
Yes, but remember that the two industries have different priorities. When hiring a data analyst from the business world, remember their specialty. In the business world, you spend a lot of time on data wrangling and building data pipelines, all while keeping the company's bottom line in mind. These analysts prefer to work on projects that focus on creating MVPs. MVP stands for minimum viable product. They want to get something out, and ship it.
When working with scientific data, you are more interested in the statistics. This means you’ll be concerned about sensitivity vs specificity. You’ll be paying attention to ANOVA tables, p-values, t-tests, etc.
It’s not impossible, as both of you will already have the core foundational knowledge to be a data analyst. It’s just that your priorities, the things you care about, are different.
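If sensitivity vs specificity is new to you, here is a minimal sketch of how the two numbers are computed. The labels and predictions are made up for illustration, and the function name `sensitivity_specificity` is my own, not something from a library. It assumes both classes show up at least once in the actual labels.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels coded as 0/1.

    Assumes at least one actual positive and one actual negative,
    otherwise a division by zero would occur.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives

    sensitivity = tp / (tp + fn)  # share of real positives we caught
    specificity = tn / (tn + fp)  # share of real negatives we left alone
    return sensitivity, specificity


# Made-up example: 4 actual positives, 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A scientific analyst will care a lot about which of these two numbers to trade off. A business analyst will more likely ask what each kind of mistake costs the company.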
Q2 - Still Become a Data Analyst?
Q: I haven’t done much analytics, but I got a lot of exposure to pandas and working with dataframes. Should I try to become a data analyst?
Yes, of course. You’ve got exposure to pandas and working with dataframes. This means you’ve got experience with data wrangling and data munging. You’ve already done the harder parts of the job. Just go learn SQL, because I noticed you didn’t put SQL in your background.
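If you already know a pandas groupby, basic SQL will feel familiar. Here is a rough sketch using Python's built-in `sqlite3` module, so there is nothing to install. The `orders` table, its columns, and the rows are all made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.5), ("bob", 7.0), ("alice", 4.5), ("carol", 3.0)],
)

# Roughly the SQL version of df.groupby("customer")["amount"].sum()
query = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

SELECT, WHERE, GROUP BY, and JOIN cover a big chunk of what you will be asked to write day to day.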
The technical skills are far harder to get, and far more in demand, than the math skills. The mathematics in data science & machine learning engineering is simple linear algebra. And, most of the modelling has already been solved courtesy of Kaggle competitions. Companies value the technical & data skills more than the mathematical aptitude.
Go become a data analyst, and get that experience under your belt. Once you've got your first position, you've got your entry into the industry. With that taken care of, you can then apply to harder and more senior positions.
I promise you, a data analyst position doesn’t look anything like this:
Note: This is a recurring problem I see with my readers. A lot of you guys overestimate the amount of technical skill you need. Start off easy with some SQL, Python, and R. Get your foot in the door as a data analyst. You can worry about the senior Data Scientist & Machine Learning Engineer roles afterwards.
And, regardless of your confidence level, you should apply for the roles anyways. Most of the job requirements and responsibilities listed there are wish lists. Realistically speaking, if they can get a guy who can match 60% of that, that’s awesome. Fking apply anyways. What’s the worst that can happen? Your resume goes in the trash. You get the exact same outcome if you didn’t apply…
Q3 - I Want Extra Practice
Q: I’m not comfortable with my skills in SQL, Python, and R. What can I do to get some extra practice?
The best way to get practice is to force yourself to write the actual code. Doing generic multiple-choice questions will not help much. The best way to get experience with pandas and data.table is to grab some data from Kaggle and try to work on it on your own. The more messed up and ugly the data, the better your skills will become.
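To give you an idea of what working on ugly data looks like, here is a small sketch in pandas. The dataframe is made up for illustration (it is not a real Kaggle dataset), and `clean_orders` is just a name I picked. Treat it as one possible cleanup routine, not the only way to do it.

```python
import pandas as pd

# Made-up messy data: inconsistent names, numbers stored as text, missing values.
raw = pd.DataFrame({
    "Customer Name": ["  alice ", "BOB", "bob", None],
    "order_total": ["10.50", "n/a", "7", "3.25"],
})


def clean_orders(df):
    out = df.copy()
    # Normalize column names to snake_case.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Trim whitespace and standardize casing in the name column.
    out["customer_name"] = out["customer_name"].str.strip().str.title()
    # Convert text to numbers; anything that cannot be parsed becomes NaN.
    out["order_total"] = pd.to_numeric(out["order_total"], errors="coerce")
    # Drop rows that are missing a name or a usable total.
    out = out.dropna(subset=["customer_name", "order_total"])
    return out.reset_index(drop=True)


cleaned = clean_orders(raw)
totals = cleaned.groupby("customer_name")["order_total"].sum()
print(totals)
```

Real datasets will throw far weirder problems at you than this, which is exactly why they make good practice.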
I use w3 to practice for my technical interviews. You can think of these as our version of practicing LeetCode. Relevant links below:
SQL/NoSQL Practice
Super Easy SQL - This is SQL for dummies (excellent first step)
More Easy SQL - Slightly harder SQL than above
Standard SQL - Harder SQL questions
NoSQL - MongoDB questions
PostgreSQL - Postgres questions
MySQL - MySQL specific questions
Python Practice
Generic Python - For all your Python specific practice
Pandas - Pandas inside out (dataframes)
NumPy - Everything about NumPy arrays
Matplotlib - Plotting in Python
R Practice
Generic R - For all your R specific practice
data.table - data.table inside out (dataframes)
dplyr - Lesser used data wrangling in R
Enjoy the above free resources.
Note: You don’t need to practice every single one of these links. Just focus on what’s relevant for you. This page isn’t going anywhere, so you can always come back to it later on.