AzureML and CRISP-DM – a Framework to help the Business Intelligence professional move to AI

AzureML can become a part of the data ecosystem in an organization, but this requires a mindshift from working with Business Intelligence to more advanced analytics. How can we adopt that mindshift using AzureML? Although CRISP-DM is not perfect, the CRISP-DM framework offers a pathway for machine learning using AzureML for Microsoft Data Platform professionals.

AI vs ML vs Data Science vs Business Intelligence

Before we dive in, let’s define strands of AI, Machine Learning and Data Science:

Business intelligence (BI) leverages software and services to transform data into actionable insights that inform an organization’s strategic and tactical business decisions. BI tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts and maps to provide users with detailed intelligence about the state of the business (CIO).

Artificial General Intelligence – the goal of AGI is to create a general thinking machine with self-sufficiency and broad human-like general intelligence. The intelligence can be extrapolated from one situation to another and the system can learn over time. Perhaps, if it is to be human-like, it can forget over time, too, and choose areas to focus on?

Artificial Narrow Intelligence – these artificial intelligence systems do not think. These systems are applying and executing algorithms on a search space, not thinking. They may also learn from evidence, but the data and the modelling fundamentally come from humans in some way.

Machine Learning analyses a search space, but it is not human-inspired.

Data Science – Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. It is also not human-inspired (definition taken from DataRobot).

You can read more on this topic in Jen Stirrup’s interview with SAS.

What does this mean for the Microsoft Data Platform Professional?

People often get very confused by the terms and I hope that this helps. Now that’s done, let’s try and relate it to technology that you already know.

If you’ve previously done work in SQL Server Analysis Services, you will know that Analysis Services had data mining functionality. Excel specialists may know that Excel also has a series of Data Mining Add-ins. You may even have used the Solver add-in, which applies optimisation algorithms behind the scenes.

Although the actual technical implementation is different, the data mining concepts in Analysis Services or Excel are very similar to those in Azure. However, if you don’t have the advantage of data mining experience in either technology, how do you work out how to proceed? Fortunately, machine learning can be based on the CRISP-DM methodology, which helps to ‘walk’ you through the process from start to end. What is the CRISP-DM methodology? Here is a diagram which explains the flow.

Credit: https://www.datascience-pm.com/crisp-dm-2/

As data grows, so does the interest in Machine Learning. However, it needs to be done correctly: it is important to show the business value, as well as to translate the activity into a range of disparate tasks. The CRISP-DM framework (Cross-Industry Standard Process for Data Mining) is one of the most popular approaches. This approach has six phases, which are listed below.

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment

Each phase is broken down into generic tasks. These are further split into specialised tasks, which are in turn mapped to process instances. Let’s look at these phases, and how they map to processes and activities in AzureML.

Business Understanding

Business Understanding is outside of the technology, but it feeds into it very strongly. This initial phase focuses on understanding the value-add from a business perspective, then translating this knowledge into a data mining problem definition. This may also involve the generation of a preliminary plan designed to deliver the business objectives.

In Business Intelligence, we are always very focused on trying to answer the business question. This process is the same for AI, Machine Learning and Data Science. What are we trying to achieve?

Data Understanding

Data Understanding is a crucial aspect of all of these areas, and the process will not proceed properly without it. From the perspective of CRISP-DM, this piece involves a number of activities:

  • Collecting Initial Data
  • Describing Data
  • Exploring Data
  • Verifying Data Quality

AzureML facilitates the Data Understanding phase through a series of modules, which are aimed at conducting these activities. For example, there is a Data Format Conversion module which helps in the data exploration piece by converting data into a number of formats that are suitable for AzureML.
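The describing, exploring and quality-verifying activities above can be illustrated outside of AzureML with a few lines of plain Python. This is only a minimal sketch: the sample data, the column names and the `profile` function are hypothetical, and it assumes every column is numeric.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice this would come from your own source.
RAW_CSV = """customer_id,age,monthly_spend
1,34,120.50
2,,89.00
3,45,
4,29,240.10
"""

def profile(csv_text):
    """Return the row count, missing count and mean for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        present = [v for v in values if v not in ("", None)]
        numeric = [float(v) for v in present]  # assumption: numeric columns only
        summary[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "mean": statistics.mean(numeric) if numeric else None,
        }
    return summary

summary = profile(RAW_CSV)
for column, stats in summary.items():
    print(column, stats)
```

A profile like this quickly shows which columns have gaps, which is the kind of finding that feeds directly into the Data Preparation phase.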

There is a Data Reader module for loading data from sources such as the Web, SQL Azure, Windows Azure, Hive, or Windows Azure Blob Storage. If the data is not already in a suitable format, we can convert it into one of these popular formats:

  • ARFF
  • CSV
  • Dataset
  • SVMLight
  • TSV
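To make the idea of format conversion concrete, here is a small sketch that converts CSV text into TSV using Python's standard `csv` module. It is not the AzureML conversion module itself; the `csv_to_tsv` function and the sample data are hypothetical.

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Convert comma-separated text into tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# A quoted field containing a comma survives the conversion intact.
sample = 'id,city\n1,"Glasgow, UK"\n'
tsv = csv_to_tsv(sample)
print(tsv)
```

Using a proper CSV parser, rather than splitting on commas, matters here because quoted fields may themselves contain the delimiter.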

Data Preparation

Now we have started to understand the data, what is the output? The next phase is Data Preparation, which usually involves:

  • Constructing Data, such as handling Missing Values
  • Integrating and Merging Data, and devising aggregations
  • Formatting and Sanitising Data

AzureML offers a number of parallel functions to cater for these activities. For example, data can be filtered so that the investigation can be focused more specifically. There are a number of Data Transformation modules which help with these areas.

That said, it’s often better to clean the data further upstream so it is done closer to the source rather than at the end of a spoke. If you have to transform data in Tableau, Power BI, Google Data Studio, SQL Server Analysis Services, SQL Server Reporting Services and so on, you are repeating effort and also introducing the potential for error.
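As an illustration of the preparation activities listed above (handling missing values, filtering and aggregating), here is a minimal plain-Python sketch. The records, field names and helper functions are all hypothetical, and median imputation is just one of several reasonable choices.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"region": "North", "sales": 100.0},
    {"region": "South", "sales": None},
    {"region": "North", "sales": 300.0},
    {"region": "South", "sales": 200.0},
]

def impute_median(rows, field):
    """Replace missing values in `field` with the median of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = statistics.median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def filter_rows(rows, predicate):
    """Keep only the rows for which `predicate` returns True."""
    return [r for r in rows if predicate(r)]

def total_by(rows, key, field):
    """Sum `field` for each distinct value of `key`."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0.0) + r[field]
    return totals

cleaned = impute_median(records, "sales")
north_only = filter_rows(cleaned, lambda r: r["region"] == "North")
totals = total_by(cleaned, "region", "sales")
print(cleaned, north_only, totals)
```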

Modelling Data

When we have prepared our data, the next step is to model the data. This involves a number of sub-tasks, including:

  • Selecting the modelling technique
  • Generating the test design
  • Building the model
  • Assessing the model

AzureML allows for R and Python coding, and the advantage of using R and Python in the cloud is that we can scale up or down as needed in response to data speed and volumes. With R, for example, we could build, test and assess our model entirely in R code. For R code creators, there wouldn’t be a big jump to using AzureML, since they are leveraging skill sets they already have.

What happens if you don’t know any coding? AzureML offers modules on classification, clustering and regression by default.
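For readers who do want to see the modelling sub-tasks in code, the sketch below walks through them in plain Python: a train/test split as the test design, a simple least-squares regression as the chosen technique, and a mean absolute error to assess the result. It uses synthetic data and hypothetical function names, and is not a representation of how the AzureML modules are implemented.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows reproducibly and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between actual and predicted y values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

# Synthetic data that follows y = 2x + 1 exactly, so the fit is easy to check.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)
print(slope, intercept, mae)
```

Holding back the test rows until after the model is built is what makes the assessment meaningful: the error is measured on data the model has not seen.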

Evaluating the Model

Once the model has been trained, it will need to be tested. There are a number of steps:

  • Evaluating the Results
  • Reviewing the processes
  • Determining the next steps

AzureML also has modules for scoring and training models, and you can find these under the Machine Learning section.
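To show what evaluating the results can look like for a classification model, here is a minimal sketch that computes accuracy, precision and recall from actual and predicted labels. The labels are made up for illustration and the `evaluate` function is hypothetical.

```python
def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical labels: 1 is the positive class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(actual, predicted)
print(metrics)
```

Which metric matters most is a business decision, so this step links back to the Business Understanding phase.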

There is also the matter of understanding if the algorithms themselves are correct. In DataOps and MLOps, every time there is a change to the code, the developer adds a test dedicated to gauging that change. Testing is then added incrementally in response to the inclusion of every new feature. For a large data pipeline, this might mean that there are hundreds of tests which are executed when data is ingested, integrated, transformed or derived. Manual testing is often squeezed between development and deployment, and it is also not error-free since there are many moving parts.
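A sketch of what such incremental, automated data tests might look like is shown below. The check functions, field names and thresholds are hypothetical; the point is that each pipeline change adds another entry to the set of checks that runs whenever data is ingested.

```python
def check_required(rows, required):
    """Return indexes of rows where any required field is missing or empty."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(name) in (None, "") for name in required)
    ]

def check_range(rows, field, low, high):
    """Return indexes of rows whose value for `field` falls outside [low, high]."""
    return [
        i for i, row in enumerate(rows)
        if row.get(field) is not None and not (low <= row[field] <= high)
    ]

def run_checks(rows):
    """Run every registered check; add a new entry with each pipeline change."""
    return {
        "required_fields": check_required(rows, ["id", "age"]),
        "age_in_range": check_range(rows, "age", 0, 120),
    }

# Hypothetical ingested batch with one missing value and one implausible value.
batch = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 150},
]
failures = run_checks(batch)
print(failures)
```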

Deployment

The deployment will need to be planned and implemented. It is also important to work out how the model will be monitored and maintained, as an ongoing process.

AzureML allows you to deploy Machine Learning algorithms to the cloud, and publishes these as a web service so that they can be consumed easily.
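Conceptually, a model published as a web service accepts a request, scores it and returns the predictions. The sketch below shows that shape in plain Python, using JSON in and JSON out. It is an assumption-laden illustration rather than the AzureML API: the `MODEL` coefficients, the `handle_request` function and the request format are all hypothetical.

```python
import json

# Hypothetical model: in practice the coefficients come from a trained model.
MODEL = {"slope": 2.0, "intercept": 1.0}

def handle_request(body):
    """Score a JSON request of the form {"inputs": [x1, x2, ...]}."""
    try:
        payload = json.loads(body)
        inputs = [float(x) for x in payload["inputs"]]
    except (ValueError, KeyError, TypeError):
        # Malformed requests get an error message instead of an exception.
        return json.dumps({"error": "expected JSON with a numeric 'inputs' list"})
    predictions = [MODEL["slope"] * x + MODEL["intercept"] for x in inputs]
    return json.dumps({"predictions": predictions})

response = handle_request('{"inputs": [0, 1.5]}')
bad_response = handle_request("not json")
print(response)
print(bad_response)
```

Returning a structured error for bad input, rather than failing, also gives you something to count when monitoring the service after deployment.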

To summarise, AzureML can be used within the widely-accepted CRISP-DM framework for machine learning. This means that budding AI engineers and data scientists have a familiar framework for using AzureML, and it provides a roadmap and a compass for Business Intelligence professionals looking to move into these spheres. Although CRISP-DM is not perfect, the CRISP-DM framework offers a pathway for machine learning using AzureML.
