MLOps Part 4: Beyond MLOps
Beyond MLOps lie things that have killed countless machine learning projects. Don't become their next victim.
Beyond MLOps…
… are a few dark corners with monsters. Come along if you want to get to know them before you meet them in the wild.
This post was written by BowTiedBear42. You can find him on his Substack and on the BowTied Data Org Discord server for paid readers.
Table of Contents
Introduction
Trying To Fix Culture With Technology
Data Governance & Processes
Wide Range of User Types
Decentralized Tradition of Data Analytics
1 - Introduction
To be clear: to me, MLOps is the best general mental framework for serious applied machine learning.
There are some things outside of the common framework that often derail ML and analytics projects. I have had to deal with a few of them recently in my daily role, which is product management for a large ML and analytics platform. This is certainly not an exhaustive overview but a mix of things I have repeatedly observed in large enterprises.
A little disclaimer: you probably do not have to worry about this too much at a beginner level, but you most certainly will as soon as you get into product management.
2 - Trying to Fix Culture With Technology
This is a very common issue in big organizations, especially when the initiative to introduce new technologies comes from IT. It usually results in poor adoption and, because of that, very limited ROI. Executive sponsorship and support are good to have, but far from enough.
After all, it is about a new way of working, and change is rarely comfortable. I do not believe in shortcuts here: meaningful change in any bigger organization takes time. The term change management has been badly tarnished over the last 20+ years and I mostly blame consultancies for it 🙂.
I am not a fan of projects for change management. This needs to be a continuous effort if you want it to stick. In my experience only two things really help: persistence and good results. I prefer to pick a few promising and impactful cases and completely overdeliver on them. Once you can show such results, the conversations become much easier.
This is generally a good field for leadership by example. Visibly different ways of working from your mid-level and top leadership will do a lot to actually change things. Be careful with the low-hanging fruit here. There usually is a reason why nobody picked it.
This article about management communication should be very helpful here:
3 - Data Governance & Processes (Data + IT)
Whenever you work in established organizations, you will most likely have some form of processes to build on. I'll keep this very broad, because the point is the same whether it is procurement, IT security or budget approvals.
These processes are usually not optimized for fast AI projects. I would highly recommend that you start by mapping out everything you need from the organization and screening the established processes against that list. You will most likely find some weak spots and should act on them early. You have to figure out what this means in your situation, but you'd be surprised how much an open conversation with process owners can do.
4 - Wide Range of User Types
In many cases ML and analytics infrastructure will serve a range of user types with different levels of expertise. Just think about the different fields raptor describes here:
Now add various types of analysts from different business departments. Give a data scientist the familiar tool stack plus massive compute resources and you'll have a happy data scientist.
The analyst, on the other hand, may only be familiar with Excel and need some initial training plus continuous coaching in the new tools and way of working. You may have a very easy time expanding your technical capabilities to make their work much easier. Modern data visualization tools in particular are usually easy to sell here.
You will usually need some additional services on top for those users, for example in the form of solution architecture and data engineering. Additionally, you have to make sure that you can communicate well with them.
There is no easy shortcut here: you need to understand your users. Sketch out a few rough personas and their needs for initial service design and set up a good feedback mechanism. Act quickly if things go sideways for any user.
5 - Decentralized Tradition of Data Analytics
Many industries have used data analytics in some form for decades. Think about customer analytics or actuarial functions. People in these departments are usually highly trained and skilled, but for the most part not in current IT best practices. Additionally, IT departments did not see data analytics as their task, so even the most complex and business-critical models have not been scrutinized in the same way software development projects would have been.
These teams can be your best friends or your worst enemy.
Some typical symptoms:
Isolated data analytics systems, maybe even something as classic as SPSS
Complex code produced for data analytics with little structure, documentation or even code comments
Lack of systematic testing
Only one environment for development and production
Good luck if you try to completely overhaul these ways of working after 20+ years. The culture aspect I described earlier applies here too. I will try to make them allies every single time. It may not always work, but it is absolutely worth it.
Some things to consider:
The youngest team members often have a keen interest in machine learning and AI, and frequently some basic training in more modern approaches too. They are potentially your best allies for change.
Don't take refactoring lightly. The impulse to replace legacy data analytics systems such as SPSS or SAS with something modern comes quickly, but as of 2023 this is still a labor-intensive process. Also keep in mind that the legacy analytics vendors developed a lot of features to make life easier for analysts in specific industries, and those may not be so easily replaceable.
As always: find ways to create value and make life easier - in their eyes. These teams often do a lot of tasks that are beyond their core capabilities and should probably be done by a central unit, for example maintaining their own environments or daily monitoring of production processes. If you can design your MLOps environment in a way that allows decentralized teams to focus on model development, you will probably have good conversations. Targeted training, coding templates, sample libraries and coaching are all options you should explore to make it happen.
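To make the idea of a coding template a bit more concrete, here is a minimal sketch in Python of what a central team might hand to a decentralized analytics team. It touches two of the symptoms listed above: a single environment for development and production, and no systematic testing. Everything in it is an assumption made for illustration: the names EnvConfig, load_config and score_customer, the APP_ENV variable, the file paths and the toy scoring rule are all hypothetical and would look different on a real platform.

```python
"""Hypothetical starter template for a decentralized analytics team."""
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    name: str
    data_path: str        # placeholder path, not a real location
    write_results: bool   # only production persists model output


# Hypothetical environments; a real platform would likely add more stages.
ENVIRONMENTS = {
    "dev": EnvConfig("dev", "data/dev_sample.csv", write_results=False),
    "prod": EnvConfig("prod", "data/prod_full.csv", write_results=True),
}


def load_config(env=None):
    """Pick the environment from the argument or APP_ENV (default: dev)."""
    env = env or os.environ.get("APP_ENV", "dev")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return ENVIRONMENTS[env]


def score_customer(age, num_claims):
    """Toy scoring rule standing in for the team's real model logic."""
    if age < 0 or num_claims < 0:
        raise ValueError("age and num_claims must be non-negative")
    return min(1.0, 0.1 * num_claims + (0.2 if age < 25 else 0.0))


def test_score_customer():
    """Minimal unit test shipped with the template (works with pytest)."""
    assert score_customer(40, 0) == 0.0
    assert score_customer(20, 0) == 0.2
    assert score_customer(40, 20) == 1.0
```

The point of a template like this is not the code itself but the defaults it sets: the analysts only fill in the model logic, while environment separation and a first test are already there when they start.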
So, I hope this helps you in your endeavors in the realm of data.
All the best anon,
Your bear