ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Insufficient training data in the minority class: In domains where data collection is expensive, a dataset containing 10,000 examples is typically considered to be fairly large. [...] note that this variant “performs worse than plain under-sampling based on AUC” when tested on the Adult dataset (Dua & Graff, 2017).