ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

The problem with this approach is that in highly imbalanced sets it can easily force most of the data to be discarded, and it has been firmly established that, when it comes to machine learning, data should not be thrown out lightly (Banko and Brill, 2001; Halevy et al., "The unreasonable effectiveness of data").

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

PDPs for the bicycle count prediction model (Molnar, 2009). Instead, you should focus on how techniques like PDPs and LIME can be used to gain insights into the model’s inner workings and how you can add those to your data science toolbox.
