Skip to main content

Data & Intelligence

Predictive Model Ensembles: Pros and Cons

Data Center Interior

Many recent machine learning challenges winners are predictive model ensembles. We have seen this in the news. Data science challenges are hosted on many platforms. Techniques included decision trees, regression, and neural networks. And, winning ensembles used these in concert. But, let’s understand the pros and cons of an ensemble approach.

Pros of Model Ensembles

Crowd sourcing is better; diversity should be leveraged. We should choose the best model from a collection of choices. An ensemble can create lower variance and lower bias. Also, an ensemble creates a deeper understanding of the data. Underlying data patterns are hidden. Ensembles should be used for more accuracy.

Generally, ensembles have higher predictive accuracy. Test results improve with the size of the ensemble. That is why, ensembles are often challenge winners. Each technique has its own characteristics. For example, in data wrangling and tuning options. Tweaking makes models fit better.

Data Intelligence - The Future of Big Data
The Future of Big Data

With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.

Get the Guide

With a bagging approach, each model should be tuned to overfit. Model independence is exploited because bagging is a variance reduction technique. Predictions can be softened for improved stability. These models are run in parallel and averaged. With boosting, models are used sequentially and wrong classifications from prior runs are given more weight. Boosting is a bias reduction technique. Stacking can be done with random forests. Stacking improves accuracy while keeping variance and bias low.

Cons of Model Ensembles

However, model ensembles are not always better. New observations can still confuse. That is, ensembles cannot help unknown differences between sample and population. Ensembles should be used carefully.

Is it understood? Ensembles can be more difficult to interpret. Sometimes, even the very best ideas cannot be sold to decision makers. Sometimes, the best ideas are not accepted by the final users.

Finally, ensembles cost more to create, train, and deploy. The ROI of an ensemble approach should be considered carefully. Generally, more complexity is not good in of itself. KISS. We have found that a full one-third of IS systems failure is due to complexity.

Take Away

Improved results are achieved by using a predictive model ensemble. But, what are we trying to do? Is it worth it?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Rick Kapalko

With degrees in Analysis and Management, Mr. Kapalko has spent two decades in both project management of agile development and contract management of operations optimization. He is adept at managing the solution path - realizing business value from engineering projects, people, processes, and data.

More from this Author

Follow Us
TwitterLinkedinFacebookYoutubeInstagram