article thumbnail

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

The problem with this approach is that in highly imbalanced sets it can easily lead to a situation where most of the data has to be discarded, and it has been firmly established that when it comes to machine learning data should not be easily thrown out (Banko and Brill, 2001; Halevy et al., The unreasonable effectiveness of data.

article thumbnail

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

For this demo we’ll use the freely available Statlog (German Credit Data) Data Set, which can be downloaded from Kaggle. This dataset classifies customers based on a set of attributes into two credit risk groups – good or bad. PDPs for the bicycle count prediction model (Molnar, 2009). Ribeiro, M. Guestrin, C.,

Modeling 139
article thumbnail

Misleading Statistics Examples – Discover The Potential For Misuse of Statistics & Data In The Digital Age

datapine

A 2009 investigative survey by Dr. Daniele Fanelli from The University of Edinburgh found that 33.7% of scientists surveyed admitted to questionable research practices, including modifying results to improve outcomes, subjective data interpretation, withholding analytical details, and dropping observations because of gut feelings….