ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

This carries the risk of the modification performing worse than simpler approaches such as majority under-sampling. Indeed, in the original paper Chawla et al. …

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

This dataset classifies customers, based on a set of attributes, into two credit risk groups: good or bad. This is to be expected, as there is no reason for a perfect 50:50 split between good and bad credit risk. Figure: PDPs for the bicycle count prediction model (Molnar, 2009). Class counts for credit: 1: 570, 0: 570 (dtype: int64).
