Machine Learning Operations (MLOps) Part 1
Your model isn't the solution. A Precursor to Data Centrism.
This post was written by BowTiedBear42. You can find him on Twitter and on the BowTied Data Org Discord server for paid readers.
TLDR: If you are leading a team of data practitioners, you'll be working a lot with MLOps once you start scaling. If you are going to work as a Machine Learning Engineer at a tech company that has its own dedicated data team and has been in the business for quite some time, I'd recommend reading this.
Table of Contents:
Introduction
Understanding The Challenge
Why ML Solutions Are Different
Approaching a Solution
1 - Introduction
Glad to be back here for a topic that I have invested a lot of time into. Just like your car, a machine learning model needs a bit of maintenance to keep performing well.
There have been many failed projects over the years. Many had technically good models but failed to deliver long-term business value. I have focused on the real-world application of machine learning over the last 6-7 years, so this topic is near and dear to me.
This post will give you some key tools to avoid becoming one of those failures. My key message is that you have to take a step back from your models and consider all the procedures and components that are part of the solution.
2 - Understanding The Challenge
As a starting point you need to understand that machine learning solutions have a lot more moving parts than the core models. The OG paper on this topic, "Hidden Technical Debt in Machine Learning Systems", was written by Google researchers in 2015:
I still highly recommend it. Some, but not all, of the issues it raises have been solved by common approaches, and many of the basic problems are at least widely recognized today.
MLOps is short for machine learning operations. It is heavily influenced by ideas from the DevOps movement and has become the common term for solutions that tackle these problems. Do not let the terms confuse you: you can do great MLOps without running DevOps teams.
And by now every consultancy and software vendor under the sun will try to sell you their solution.
And to be very clear: many of these solutions are helpful in one way or another, but the overall picture is still rather confusing.
3 - Why ML Solutions Are Different
I want to point out one important aspect that makes ML projects different from other software projects. Let us take a look at the Cross-Industry Standard Process for Data Mining (CRISP-DM):
Pause for a moment and realize that this was defined in 1996. Many machine learning projects could have worked if the team had followed these ideas more closely.
This process heavily implies continuous improvement, an idea that has also been widely adopted with agile development methodologies. The big difference is that continuous improvement on software projects is a human effort, while ML models can and should be automatically re-trained based on feedback data.
This is why tracking your decisions and measuring reactions is one of the most important aspects of the whole ML solution.
Get this part right and you have solved the biggest hurdle. Get it wrong and you may be wasting months' or years' worth of training data for model improvement.
4 - Approaching a Solution
What I want to offer you here is a step back. We will discuss the fundamental functional building blocks. More or less sophisticated solutions are available for each of them.
I like to think about this in the following dimensions:
Data Management:
At a minimum you need reliable and robust data processes to feed your models once they are in production and to process the real-world feedback. This is mostly achieved by automated data pipelines with checks that newly loaded data are roughly what you expect.
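To make that concrete, here is a minimal sketch in Python of what such a sanity check could look like. The column names and thresholds are made up for illustration, and load_new_batch() is a hypothetical loader for your newest data; dedicated tools exist for this, but even a hand-rolled check like this catches a lot.

```python
import pandas as pd

def validate_batch(df: pd.DataFrame) -> None:
    """Basic sanity checks on a newly loaded batch before it reaches the model."""
    # Schema check: the columns the model expects must be present.
    expected = {"customer_id", "amount", "country"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"Batch is missing columns: {missing}")

    # Null check: key fields must be populated.
    if df["customer_id"].isna().any():
        raise ValueError("Batch contains rows without a customer_id")

    # Range check: values should be roughly what you expect.
    if (df["amount"] < 0).any() or df["amount"].max() > 1_000_000:
        raise ValueError("Batch contains amounts outside the expected range")

# df = load_new_batch()  # hypothetical loader for the newest data
# validate_batch(df)     # fail the pipeline early instead of feeding bad data to the model
```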
In many cases you also need some way to figure out which data you used for your model development. A well maintained data lineage is a good starting point. Any possibility to recreate the state of your data at a given time in the past will come in handy when you face serious questions.
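Short of a full lineage system, one lightweight way to get there is to write down a fingerprint of the data every time you train. A possible sketch, assuming your training data sits in a pandas DataFrame; the table and file names are made up:

```python
import hashlib
import json
from datetime import datetime, timezone

import pandas as pd

def snapshot_metadata(df: pd.DataFrame, source: str) -> dict:
    """Record enough metadata to answer 'which data did we train on?' later."""
    content_hash = hashlib.sha256(
        pd.util.hash_pandas_object(df, index=True).values.tobytes()
    ).hexdigest()
    return {
        "source": source,
        "rows": len(df),
        "columns": list(df.columns),
        "content_sha256": content_hash,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }

# meta = snapshot_metadata(training_df, source="warehouse.transactions_v3")  # hypothetical table
# with open("training_data_manifest.json", "w") as f:
#     json.dump(meta, f, indent=2)
```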
Software Quality Management:
For a start it is important to understand that all serious machine learning endeavors are software projects. Many data scientists lack software engineering training, so you usually need to pay some attention.
Software engineers have developed a set of tools to ensure their products are of the highest quality possible.
This goes from very basic things such as code structure to sophisticated mechanisms such as CI/CD pipelines. After spending a year of my life refactoring scripts without any documentation or comments, I have strong feelings about data scientists following at least the most basic standards. Doing this right with a bit of discipline will save you a world of pain down the road.
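As a taste of what "the most basic standards" means in practice, here is the kind of small unit test a CI pipeline would run on every commit. normalize_amount is a hypothetical feature-engineering function standing in for your own code:

```python
# test_features.py - runs with pytest, e.g. as part of a CI pipeline
import pytest

def normalize_amount(amount: float, currency: str) -> float:
    """Convert an amount to EUR using fixed example rates (stand-in for real logic)."""
    rates = {"EUR": 1.0, "USD": 0.9}
    if currency not in rates:
        raise ValueError(f"Unsupported currency: {currency}")
    return amount * rates[currency]

def test_eur_amounts_are_unchanged():
    assert normalize_amount(100.0, "EUR") == 100.0

def test_unknown_currency_is_rejected():
    with pytest.raises(ValueError):
        normalize_amount(100.0, "XYZ")
```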
The most common data science tools today are open source and depend on a huge number of libraries that all have different people behind them. This will need conscious management at some point. You should be aware that attacks on this open source supply chain have already happened.
I expect data scientists to learn the basics here and to get a dedicated software engineering person on board for every serious endeavor.
Model Development Management:
Generally, this is about streamlining your model development process and code base. Working in Git involves a bit of a learning curve for many data scientists. It is especially powerful in combination with an automated CI/CD pipeline that deploys a new version of your model once you update the files on the master branch.
I would still highly recommend it; the capability to easily figure out which code was in use at any given time is alone worth the effort.
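One cheap way to get that traceability, sketched below, is to stamp every persisted model with the Git commit that produced it. This assumes the training code lives in a Git repository; the model object and file names are made up for illustration:

```python
import json
import subprocess
from datetime import datetime, timezone

import joblib  # common choice for persisting scikit-learn models

def current_git_commit() -> str:
    """Return the commit hash of the code that is training the model."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def save_model_with_provenance(model, path: str) -> None:
    """Persist the model together with the code version and training time."""
    joblib.dump(model, path)
    metadata = {
        "git_commit": current_git_commit(),
        "trained_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path + ".meta.json", "w") as f:
        json.dump(metadata, f, indent=2)

# save_model_with_provenance(model, "churn_model.joblib")  # hypothetical model and file name
```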
Business Rule Management:
Basically every machine learning application will have some kind of business rules in it. The simplest cases are something like "do y if the score is > 0.69 and x otherwise". Usually it gets a bit more complicated; in an e-commerce application, for example, you would not want to recommend products with low inventory.
You need some way to express and apply those rules, and usually also a way to figure out how certain decisions have been made. In a simple scenario you can put them into the same code as your model, but in many cases you need a dedicated business rule management system.
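In that simple scenario, the rule layer can be as small as the sketch below. The threshold, inventory cut-off and function name are made-up examples, not a recommendation:

```python
def decide(score: float, inventory: int, threshold: float = 0.69, min_inventory: int = 5) -> str:
    """Combine the model score with business rules into a final decision."""
    # Rule 1: only recommend when the model is confident enough.
    if score <= threshold:
        return "do_not_recommend"
    # Rule 2: never recommend products that are almost out of stock,
    # no matter how good the score looks.
    if inventory < min_inventory:
        return "do_not_recommend"
    return "recommend"

# Log every decision so you can later explain how it was made.
print(decide(score=0.85, inventory=2))   # -> "do_not_recommend" (rule 2 overrides the score)
print(decide(score=0.85, inventory=40))  # -> "recommend"
```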
Model Application Management:
This is one of the most manual parts in my experience.
The software system you use to develop models is usually not the system that they will be applied in. You have to find a way to deliver your model results to where they need to go.
Most of the time this integration will be a joint effort with the team that is responsible for the target system. A REST API is one of the most popular options today and is widely supported.
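For a feel of what the REST option looks like, here is a minimal sketch using Flask. The model file, feature names and port are assumptions for illustration; the real interface is whatever you and the target system team agree on:

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("churn_model.joblib")  # hypothetical persisted model

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    # Assumed feature names, for illustration only.
    features = [[payload["recency_days"], payload["order_count"], payload["avg_basket_value"]]]
    probability = float(model.predict_proba(features)[0][1])
    # Return an identifier with the score so the target system can record how the decision was made.
    return jsonify({"customer_id": payload["customer_id"], "score": probability})

if __name__ == "__main__":
    app.run(port=8080)
```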
I recommend that you make sure the following things are done early:
Sketch of the overall solution architecture
Joint decision on the integration design
Make sure you have clear KPIs for the system interaction
Make sure the target system team can handle the changes in time
Establish joint operating procedures early
Drill, drill, and drill to make sure everybody understands how decisions are made and recorded in the target system
Drill even more to make sure everybody understands how feedback is recorded in the target system
Do functional as well as performance tests as early as possible
Model Performance Management:
The first rule of model performance management is that you need to pick the right metric. Sounds easy, but you may find that typical data science KPIs such as precision do not represent the kind of value your end users are seeking.
My favorite example here is fraud detection: you can build a model with very high accuracy if you simply classify every case as non-fraudulent. Fraud cases are so rare that this is correct almost every time, but it is also completely useless because you will not catch any of them. Moral of the story: start from the actual value your model should provide and choose a KPI that represents it.
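A tiny scikit-learn illustration with made-up numbers drives the point home: the lazy classifier that flags nothing looks great on accuracy and useless on recall.

```python
from sklearn.metrics import accuracy_score, recall_score

# 1,000 cases, only 10 of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # the lazy classifier: everything is "not fraud"

print(accuracy_score(y_true, y_pred))             # 0.99 -- looks impressive
print(recall_score(y_true, y_pred, pos_label=1))  # 0.0  -- catches zero fraud cases
```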
At the time of your deployment you only know how well your model performed against historical data. I would generally recommend that you keep a close eye on the performance when you use it on new cases.
This issue is often called model drift:
Model Drift in Machine Learning Models
You basically need to build a dataset with the same structure as your training data from all the cases you use your model on. This will allow you to analyze how well your model performs on those new cases.
This may not always be possible automatically; in image classification applications, for example, you will have to send a sample of the new cases out for human labeling.
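Once feedback labels do come back, monitoring can be as simple as tracking your chosen KPI over time on that feedback dataset. A minimal sketch with a hypothetical feedback table and made-up numbers:

```python
import pandas as pd
from sklearn.metrics import recall_score

# Hypothetical feedback data: what the model predicted, what actually happened, and when.
feedback = pd.DataFrame({
    "decision_date": pd.to_datetime(["2024-01-03", "2024-01-05", "2024-02-02", "2024-02-10"]),
    "prediction":    [1, 0, 0, 0],
    "true_label":    [1, 0, 1, 1],
})

# Track the chosen KPI per month; a sustained drop is a signal of model drift.
monthly = feedback.groupby(feedback["decision_date"].dt.to_period("M")).apply(
    lambda g: recall_score(g["true_label"], g["prediction"], zero_division=0)
)
print(monthly)  # January looks fine, February has degraded -> time to investigate or retrain
```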
More to come in a part 2…