
ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Insufficient training data in the minority class is a common problem: in domains where data collection is expensive, a dataset of 10,000 examples is typically considered fairly large. In their 2002 paper, Chawla et al. propose a different strategy in which the minority class is over-sampled by generating synthetic examples.
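The core idea is straightforward: for each minority-class sample, pick one of its k nearest minority-class neighbours and create a new point somewhere along the line segment between the two. The sketch below is a minimal, illustrative version of that interpolation step, not the authors' reference implementation; the function name `smote_oversample`, the parameter choices, and the toy data are assumptions made for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_minority, n_synthetic, k=5, random_state=0):
    """Generate synthetic minority samples by interpolating between each
    sample and one of its k nearest minority-class neighbours (SMOTE-style)."""
    rng = np.random.default_rng(random_state)
    # k + 1 neighbours because the nearest neighbour of a point is itself
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbor_idx = nn.kneighbors(X_minority)

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_minority))        # pick a minority sample
        j = rng.choice(neighbor_idx[i][1:])      # pick one of its k neighbours
        gap = rng.random()                       # position along the segment
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.vstack(synthetic)

# Toy usage: 20 minority samples in 2-D, doubled with synthetic points
X_min = np.random.default_rng(1).normal(size=(20, 2))
X_new = smote_oversample(X_min, n_synthetic=20, k=5)
print(X_new.shape)  # (20, 2)
```

In practice a maintained library implementation, such as imbalanced-learn's SMOTE via `fit_resample`, is usually preferable to hand-rolled code.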


Unintentional data

The Unofficial Google Data Science Blog

1]" Statistics, as a discipline, was largely developed in a small data world. Implicitly, there was a prior belief about some interesting causal mechanism or an underlying hypothesis motivating the collection of the data. We must correct for multiple hypothesis tests. We ought not dredge our data.