Becoming "Bonjwa" In The Data Industry
For those who are already employed in a data role, and wish to become elite.
I’ll be offering a year’s worth of paid subscription to either Cloud Engineering (Data Engineering), or Human Language Technology (NLP), or Pivot to Product (Business Insight) to 1 lucky winner. Just do the following:
Subscribe to this Substack (I need your email)
Hit the like button
Share this post
In the comments, tell me which one you want.
Wait till the end of the month
This post is not meant for everyone. Not everyone wants to be elite. This is only for those who wish to become elite, and want a simple path on how to achieve it.
Bonjwa: a Korean-language gaming term referring to elite or professional players of StarCraft.
In StarCraft: Brood War, there was a player named Flash. He was considered unbeatable for his 13-year career. If you had a match against Flash, you pretty much knew you had lost.
To become Bonjwa in the data industry, you will need elite-level skills. Those who can operate and run an entire end-to-end machine learning pipeline are Bonjwa. This means you have the data skills of the Data Engineer, the wrangling skills of the Data Scientist and, when requested, the math skills of an AI Researcher. This stuff is like Captain Planet.
Becoming a person who has all these skills is truly legendary. So, let’s dive in.
Table of Contents:
Why You Should Become Bonjwa
Always Be Learning
Becoming Bonjwa
1 - Why You Should Become Bonjwa
1.1 Pareto Principle
The Pareto Principle says that the top 20% get 80% of the rewards, while the bottom 80% of people split the remaining 20%. Real life does not follow a Normal Distribution; it follows a heavy-tailed, power-law (Pareto) distribution. Here’s the data on OkCupid swipes if you never saw this:
The name of the game is to get to the top 20% at all costs. In your career as a Data Practitioner, this means if you are not in the top 20%, you are at the mercy of the job market. If you are in the top 20%, this means you get to dictate the terms of where, and how you work.
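If you want to sanity-check the 80/20 split yourself, here is a minimal sketch. It assumes rewards follow a textbook Pareto distribution, which is a modelling assumption on my part and not something real careers are guaranteed to obey. The function name top_share and the constant ALPHA_80_20 are made up for this example.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p of people,
    assuming a Pareto distribution with shape alpha > 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must be > 1 for the total to be finite")
    return p ** (1 - 1 / alpha)


# Shape parameter at which the top 20% hold exactly 80% (assumed model).
ALPHA_80_20 = math.log(5) / math.log(4)

share = top_share(0.20, ALPHA_80_20)
print(f"Top 20% hold {share:.0%} of the rewards")
```

Try plugging in a smaller top fraction to see how quickly the rewards concentrate as you move up.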
1.2 Fat Cash
As Inflation runs rampant, companies will figure out ways to reduce their expenses. An easy way to reduce expenses is to use automation to replace several workers, and then fire them.
Those who spent countless dollars on Data Modelling courses alone are now effectively unskilled, because automation has taken over that part of the Data Science role.
Consider this scenario. A company has to reduce its expenses, or face an uncomfortable call with its shareholders. They currently have both a Data Scientist, and a Data Engineer. What should they do?
Scenario 1: Keep both the Data Scientist, and the Data Engineer, and pay $200k/yr to keep them
Scenario 2: Find a guy who has both Data Science, and Data Engineering skills, and pay this guy $150k/yr to keep him.
Most companies will pick Scenario 2, as we saw with the layoffs at Twitter.
2 - Always Be Learning
Stay Humble
If you are reading this post, it’s because you’ve accepted the fact that you don’t know everything & neither do I. There are people on this planet from all walks of life. Many of them come from very different backgrounds, and have very different skills.
When you are working in the real world, you may be phenomenal as a data scientist. But when you speak with a data engineer, they will talk about all the ways to store and pull data, the pros and cons of certain storage options, how you can cache data to pull it faster, and so on. If you are bored, ask one of the data engineers about Apache Airflow.
Prepare to get schooled.
Accept that you don’t know everything. Life has a funny way of knocking people down a peg, and bringing them to reality. For some bizarre reason, people who are unsuccessful in Data tend to be the most cocky, and unskilled group of people on the planet:
If you don’t think this group of people exists, go on Quora for about 2 mins.
Budget For Learning
I keep a budget of $200/month for learning new skills. I end up buying books from Amazon on whatever topic I’m after. If I’m working on a project that skews more towards the data engineering side of things, I’ll try to nab a book on that.
If I’m working more on the research, and model building side of things, then I’ll try to grab a book focused on that.
I try to avoid courses because I’ve been burned far too many times. There are way too many crappy courses that have some robot lady/guy read a bunch of nonsense jargon at me. Or, I've dropped $1000 in the past for a course, only for the guy to go off track. Then at the end I find out the course didn't even solve the problem I was after.
But, I know some people here are visual learners, so whether you wish to learn via courses or books, it’s up to you. Make sure you look at the reviews, and see if you can get some previews/recommendations on it. The last thing you want is to try to learn data engineering in Python, only to see the guy go off topic and start talking about the different kinds of models for machine learning.
3 - Becoming Bonjwa
3.1 Data Scientist/Machine Learning Engineer
Data Scientists/MLEs are responsible for looking at research papers to come up with new ideas. So, you will want to learn how to gather data, and then wrangle it like them.
Data Gathering
This process involves trying to figure out where to get the relevant data to feed your model. This here is why data scientists, and machine learning engineers make serious cash. Read the post on what Real Data Science Looks like to see this process in more depth. But, you will use 3 main categories of data:
SQL/CSVs
JSON
NLP
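To make those three categories concrete, here is a rough sketch of loading each one with nothing but the Python standard library. The sample data, table name, and helper names are all made up for illustration, and an in-memory SQLite database stands in for whatever SQL source you actually have.

```python
import csv
import io
import json
import re
import sqlite3

# Hypothetical sample inputs, one per category.
CSV_TEXT = "customer_id,amount\n1,19.99\n2,5.00\n"
JSON_TEXT = '{"customer_id": 1, "tags": ["vip", "newsletter"]}'
REVIEW_TEXT = "Great sandwich, but the line was too long."


def load_csv(text):
    """Parse CSV text into a list of dicts (values stay as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_sql_rows():
    """Build a throwaway SQLite table and query it back out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, 5.0)])
    rows = conn.execute(
        "SELECT customer_id, amount FROM orders ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


def load_json(text):
    """Parse a JSON document into Python dicts and lists."""
    return json.loads(text)


def tokenize(text):
    """Very naive tokenizer for free text: lowercase words only."""
    return re.findall(r"[a-z']+", text.lower())


print(load_csv(CSV_TEXT))
print(load_sql_rows())
print(load_json(JSON_TEXT))
print(tokenize(REVIEW_TEXT))
```

In a real project the hard part is finding the sources, not parsing them, but it helps to be comfortable with all three shapes.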
Data Wrangling
Data Wrangling is the process of taking all your datasets and putting them into a useful state for your model. In this case, you’ll need to get good at 3 main components:
Outlier Cleaning
Imputation
Feature Engineering
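Here is a small sketch of those three components on a toy column of numbers. The specific choices are my assumptions, not the only way to do it: outliers are clipped with the common 1.5 x IQR rule, missing values get the median, and the engineered feature is a simple ratio. All names and numbers are invented for the example.

```python
import statistics


def clip_outliers(values, k=1.5):
    """Clip non-missing values to [Q1 - k*IQR, Q3 + k*IQR]; None stays None."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [None if v is None else min(max(v, low), high) for v in values]


def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def spend_per_visit(spend, visits):
    """Engineered feature: spend divided by visits (0.0 when visits is 0)."""
    return [s / v if v else 0.0 for s, v in zip(spend, visits)]


# Hypothetical raw column with one missing value and one obvious outlier.
daily_spend = [10, 12, None, 11, 13, 12, 11, 10, 14, 500]
visits = [1, 2, 1, 1, 2, 2, 1, 1, 2, 3]

clipped = clip_outliers(daily_spend)
imputed = impute_median(clipped)
feature = spend_per_visit(imputed, visits)
print(imputed)
print(feature)
```

Note the order: clipping first keeps the 500 from dragging the imputed value around, which matters more if you impute with the mean instead of the median.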
3.2 Data Engineer
Data Engineers are responsible for building and maintaining the infrastructure used. This includes creating, and optimizing the systems used to collect, and store data.
They are also responsible for ensuring that data is accessible to the right people at the right time, so that it can be used to make better decisions. They must be able to work with Python/R, and SQL. They will also need to be good at cloud technologies such as AWS. Data Engineers also need to know how to design high quality data pipelines.
- If you want to learn more about this profession.
Data Pipelines
Data pipelines are a sequence of processes that you put your data through. The goal is to take the raw data through a series of steps and turn it into something that you can actually use. The pipeline will include the following:
Steps for cleaning the data
Transforming the data
Reducing the data to a smaller size
Data pipelines are essential for data science. This is because they allow you to experiment with different methods on smaller datasets. Think about it, if you worked at Subway, you wouldn't look at every single sale they've ever made. You'll instead use a data pipeline to study a smaller region. Then once ready, use all the available data on prod.
This makes it easier to find the best methods quickly, and the data you held back lets you check that your models are not overfit to the sample.
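A bare-bones sketch of that idea: a pipeline is just a list of steps applied in order, so you can run the exact same steps on a small slice first and on everything later. The step names, the store names, and the sales figures below are all hypothetical.

```python
from functools import reduce


def clean(rows):
    """Cleaning step: drop rows with missing or negative amounts."""
    return [r for r in rows if r.get("amount") is not None and r["amount"] >= 0]


def transform(rows):
    """Transform step: convert dollar amounts to integer cents."""
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in rows]


def reduce_size(rows):
    """Reduction step: collapse individual sales into one total per store."""
    totals = {}
    for r in rows:
        totals[r["store"]] = totals.get(r["store"], 0) + r["amount_cents"]
    return totals


def run_pipeline(rows, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, rows)


STEPS = [clean, transform, reduce_size]

# Made-up sales records for illustration.
all_sales = [
    {"store": "downtown", "amount": 7.50},
    {"store": "downtown", "amount": None},
    {"store": "airport", "amount": 12.25},
    {"store": "airport", "amount": -1.00},
    {"store": "downtown", "amount": 3.00},
]

# Experiment on one region first, then run the same steps on everything.
sample = [r for r in all_sales if r["store"] == "downtown"]
sample_totals = run_pipeline(sample, STEPS)
full_totals = run_pipeline(all_sales, STEPS)
print(sample_totals)
print(full_totals)
```

Because the steps are plain functions in a list, swapping one out to try a different method does not touch the rest of the pipeline.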
Data Warehouses
A data warehouse is a collection of data that is organized for analysis. It is the central repository for data in an organization. You can think about this as a warehouse in real life, except instead of storing products, it’s storing data.
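To make the "central repository" idea concrete, here is a toy sketch where an in-memory SQLite database stands in for a real warehouse. That stand-in is purely my assumption for illustration, as are the two source systems, the table name, and the helper names. Data arrives from two sources in different shapes, gets tidied into one table, and is then queried for analysis.

```python
import sqlite3

# Hypothetical source systems with inconsistent formats.
POS_SALES = [("  Downtown ", "7.50"), ("AIRPORT", "12.25")]
WEB_SALES = [{"store": "downtown", "amount": 3.0}]


def normalize_store(name):
    """Make store names consistent across sources."""
    return name.strip().lower()


def build_warehouse():
    """Load both sources into one central, analysis-ready table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (source TEXT, store TEXT, amount REAL)")
    pos_rows = [("pos", normalize_store(s), float(a)) for s, a in POS_SALES]
    web_rows = [
        ("web", normalize_store(r["store"]), float(r["amount"])) for r in WEB_SALES
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", pos_rows + web_rows)
    conn.commit()
    return conn


def revenue_by_store(conn):
    """An analytical query that spans every source at once."""
    return conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()


warehouse = build_warehouse()
report = revenue_by_store(warehouse)
print(report)
warehouse.close()
```

The point is that the analyst only ever queries the one tidy table and never has to care which system a sale came from.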
A data warehouse is used in data science to store data so that it can be accessed and analyzed. The data in the warehouse is extracted from different sources, then cleansed and normalized so that it can be used for analysis. Here are a few Data Warehouses:
3.3 AI Researcher
AI researchers are responsible for the development of artificial intelligence applications. They work to create systems that can learn and work on their own, making decisions based on data. This can include anything from analyzing sentences to building self driving cars.
To perform this kind of research, AI researchers must have a strong background in mathematics, computer science, and statistics. They must also be able to analyze data and use it to create learning models for their applications. Because of the complex nature of this work, most AI researchers have a Ph.D. in one of these areas.
- If you want to learn more about this field.
NLP Data
NLP data are a type of unstructured text data. In data science, they can be used to improve the accuracy of ML algorithms in two ways.
By providing data that is representative of the real-world. People are emotional, and this is a huge way to capture it.
By generating extra insights into customer behavior.
Note: For those who wish to become Quants, or Machine Learning Engineers, and want to work with time series data, a currently popular topic of discussion is Market Sentiment analysis via NLP Data.
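To give a feel for what "sentiment from text" means, here is a deliberately naive sketch that scores a headline by counting words from two tiny hand-picked lists. The word lists and headlines are invented, and real market sentiment work uses trained models rather than a lookup like this.

```python
import re

# Tiny, hand-picked word lists (assumptions for illustration only).
POSITIVE = {"great", "love", "beat", "strong", "growth"}
NEGATIVE = {"bad", "hate", "miss", "weak", "lawsuit"}


def sentiment_score(text):
    """Return (positive - negative word count) / total words, in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


# Made-up headlines.
headlines = [
    "Strong growth as earnings beat estimates",
    "Weak quarter and a lawsuit",
]
for headline in headlines:
    print(f"{sentiment_score(headline):+.2f}  {headline}")
```

A score like this could then be lined up against a price series by date, which is where the time series side comes in.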
Deep Neural Networks
Most Neural Networks have often been shown to underperform gradient-boosted trees such as XGBoost on structured (tabular) data. But, this is only true for structured data.
For Unstructured data, such as image analysis, or NLP, Neural Networks are kings. If you wish to work with Sentiment data, you’ll need to be good at Neural Networks. I haven’t covered this yet, but pay close attention to the Anime Girls article, it’ll be expanded upon soon.
3.4 Business Analyst
If you have a solid model, and the data to back it up, you’ll need to convey the findings to the executive suite. In order to do this, you’ll need to have some business domain knowledge. For example, if you are a Quant, most portfolio managers do not speak ‘data’, and neither do they speak ‘quant’. They speak a language called ‘Fundamental Analysis’.
To convey the findings of your model, you’ll need to know how to speak their language. This is what most Business Analysts do.
- Understanding Business 101
Business Domain Knowledge
Business knowledge is knowing the ins and outs of how your business works. It includes an understanding of the products or services that your company provides, how it competes in the marketplace, who its competitors are, and what your advantages are.
Business knowledge is important because it helps you better understand your company's operations, how it makes money, and where it fits into the larger industry landscape.
When you give a presentation to an executive, if you can't speak their language, they'll fall asleep. You need to know their lingo.
Good luck trying to become a Bonjwa. Pull it off, and you’ll have the Pareto Principle working in your favor, instead of against you.
Best of Luck!