7 Critical Automations for a Machine Learning Automation Platform

Dataiku Product, Scaling AI | Lauren Anderson

Incorporating automation into ML projects helps data scientists and data engineers deploy (and keep) more work in production while reducing the time spent maintaining those projects once they're live. AutoML is the functionality most often discussed in the context of ML automation, but many tasks outside the typical scope of AutoML can also be automated to save data scientists countless hours of repetitive work, not to mention reduce anxiety when managing models in production. When evaluating a machine learning automation platform or solution, keep the following features in mind:


Operationalizing ML:

1. AutoML: AutoML can help save time by automating key aspects of feature engineering and feature generation, model ensembling, and model training. The result should be a functional data pipeline and model training process along with a working model artifact to provide a baseline for model accuracy.

2. Automated Deployment: There should be a clear delineation between ML experiments that aren't ready for production and work that has been deployed. To move a project from the experimental stage to production, there should be an easy way to bundle it and automatically push it to production via triggers or schedules.

Monitoring in Production:

3. Automated Data Consistency Checks: Instead of manually checking models on a fixed timetable (or worse, finding out about an issue only after it has been going on for days), look for automated alerts and reports with customizable parameters that surface data issues in production quickly.
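As a concrete illustration, an automated consistency check can be as simple as a function that runs against each new batch of data and returns alerts when customizable thresholds are breached. The sketch below assumes a pandas DataFrame as input; the column names and thresholds are illustrative, not tied to any specific platform.

```python
import pandas as pd

def check_data_consistency(df, max_null_ratio=0.05, expected_columns=None):
    """Return a list of human-readable alerts; an empty list means all checks passed."""
    alerts = []
    # Schema check: alert if any expected column is absent from the batch.
    if expected_columns is not None:
        missing = set(expected_columns) - set(df.columns)
        if missing:
            alerts.append(f"Missing columns: {sorted(missing)}")
    # Completeness check: alert if a column's null ratio exceeds the threshold.
    for col in df.columns:
        null_ratio = df[col].isna().mean()
        if null_ratio > max_null_ratio:
            alerts.append(f"Column '{col}' is {null_ratio:.0%} null "
                          f"(threshold {max_null_ratio:.0%})")
    return alerts

if __name__ == "__main__":
    batch = pd.DataFrame({"age": [34, None, None, 29],
                          "income": [50_000, 62_000, 48_000, 71_000]})
    for alert in check_data_consistency(batch, max_null_ratio=0.25,
                                        expected_columns=["age", "income", "region"]):
        print("ALERT:", alert)
```

In practice a scheduler would run this on every incoming batch and route non-empty results to email, Slack, or a dashboard, which is what turns a manual timetable into an automated alert.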

4. Automated Scenario-Based Triggers and Model Retraining: Automate manual activities with common triggers such as a time period (every X seconds), dataset modification, or custom Python scripts for unique conditions. For example, you might create a trigger that initiates automated retraining of your model when a certain threshold of data drift is detected. Once the new model is ready, it can be compared to the prior version and then deployed into production. You should also receive report metrics after these scenarios run so you can step in when needed to check and validate the automations.
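The drift-triggered retraining scenario described above can be sketched as follows. This is a minimal, generic illustration, not a specific product's API: drift is measured here with a simple population stability index (PSI), the 0.2 threshold is a common rule of thumb, and `retrain` stands in for whatever re-fits and evaluates your model.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Simple PSI between two numeric samples, using equal-width bins
    over the combined range. Higher values indicate more drift."""
    lo, hi = min(expected + actual), max(expected + actual)
    width = (hi - lo) / bins or 1.0
    def bin_ratios(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Additive smoothing so empty bins don't cause log(0) or division by zero.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]
    p, q = bin_ratios(expected), bin_ratios(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def drift_scenario(train_sample, live_sample, retrain, current_score,
                   psi_threshold=0.2):
    """If drift exceeds the threshold, retrain, then deploy the new model
    only if it beats the current one. Returns (action, best_score)."""
    psi = population_stability_index(train_sample, live_sample)
    if psi <= psi_threshold:
        return "no_action", current_score
    new_score = retrain()  # e.g., re-fit on recent data and evaluate held-out accuracy
    if new_score > current_score:
        return "deployed_new_model", new_score
    return "kept_current_model", current_score
```

The comparison step is what makes the automation safe to leave unattended: a retrained model that scores worse than the incumbent is never promoted, and the returned action is exactly the kind of metric a post-scenario report would surface for validation.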

Other Features to Enable Automation:

5. Automated Script Shortcuts: Create a repository of common scripts so that you (and other data scientists in the organization) can apply them in a few clicks and handle new data more efficiently instead of recoding them every time.

6. Jobs Monitoring With APIs: You should be able to monitor jobs outside the platform, in the method and location of your choosing, such as programmatically from CI/CD tools like Jenkins or JFrog. An API should offer methods to retrieve the list of jobs and their statuses so they can be monitored wherever you need them.
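A CI/CD integration along these lines might poll a REST endpoint and flag failed jobs. The endpoint path and JSON shape below are hypothetical stand-ins, not any specific product's API; substitute your platform's actual jobs endpoint and authentication.

```python
import json
from urllib.request import urlopen

def fetch_jobs(base_url):
    """Retrieve the jobs list from a (hypothetical) REST endpoint
    returning JSON like: [{"id": "...", "status": "..."}, ...]."""
    with urlopen(f"{base_url}/api/jobs") as resp:
        return json.load(resp)

def failed_jobs(jobs):
    """Filter a jobs list down to the ones a CI tool should flag."""
    return [j for j in jobs if j.get("status") in {"FAILED", "ABORTED"}]

if __name__ == "__main__":
    # Example payload in the hypothetical shape fetch_jobs would return:
    jobs = [
        {"id": "train_model_42", "status": "DONE"},
        {"id": "score_daily_41", "status": "FAILED"},
    ]
    for job in failed_jobs(jobs):
        print(f"Job {job['id']} needs attention: {job['status']}")
```

From Jenkins, a script like this could run as a post-build step and fail the pipeline whenever `failed_jobs` returns a non-empty list, which keeps job health visible in the tool your team already watches.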

7. Recursive Dataset Building: When upstream data in a flow changes, instead of manually rebuilding every affected dataset downstream, you should be able to build datasets recursively: the platform walks the dependency graph and automatically rebuilds any dataset that has become outdated due to the upstream changes.
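The recursive-build idea can be sketched with a small dependency graph. The dataset names, the integer version counters, and the staleness rule (rebuild when any dependency is newer) are illustrative assumptions, not a particular platform's implementation.

```python
def recursive_build(target, upstream, built_at, build_fn):
    """Depth-first rebuild: bring every upstream dependency of `target`
    up to date first, then rebuild `target` itself if any dependency
    is now newer than it. `built_at` maps dataset -> version counter;
    `build_fn(name)` performs the actual rebuild."""
    deps = upstream.get(target, [])
    for dep in deps:
        recursive_build(dep, upstream, built_at, build_fn)
    if deps and any(built_at[d] > built_at[target] for d in deps):
        build_fn(target)
        # Mark the rebuilt dataset as newer than everything it depends on.
        built_at[target] = max(built_at[d] for d in deps) + 1

if __name__ == "__main__":
    # Flow: raw -> cleaned -> features. "raw" was just modified (version 5),
    # so both downstream datasets are stale and get rebuilt in order.
    upstream = {"cleaned": ["raw"], "features": ["cleaned"]}
    built_at = {"raw": 5, "cleaned": 2, "features": 3}
    recursive_build("features", upstream, built_at, lambda name: print("rebuilding", name))
```

Requesting a build of the final dataset is enough: the recursion finds the stale intermediate steps on its own, and running it again immediately afterwards is a no-op because nothing is outdated anymore.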

Dataiku includes all of these features to make operationalizing and scaling ML initiatives more achievable. To learn about other Dataiku features that make data scientists' lives easier, see this product overview.
