ML internals: Synthetic Minority Oversampling Technique (SMOTE)

Domino Data Lab

Insufficient training data in the minority class: In domains where data collection is expensive, a dataset containing 10,000 examples is typically considered fairly large. Chawla et al. (2002) performed a comprehensive evaluation of the impact of SMOTE-based up-sampling.
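To make "SMOTE-based up-sampling" concrete, here is a minimal sketch of the core idea: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration. The sketch also simplifies the procedure by picking base samples at random rather than looping over every minority sample, so treat it as an approximation of the technique and not as the reference implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority-class feature vectors.

    Hypothetical helper for illustration; assumes numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point (a simplification).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force search for its k nearest minority neighbours (Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per example (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies. This is the property that separates this kind of up-sampling from simply duplicating rows.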