Data Science, Past & Future

Domino Data Lab

He also informed a lot of the early thinking about data visualization. It involved a lot of interesting work on something new: data management. It also involved applied math, some depth in statistics and visualization, and a lot of communication skills. I can point to the year 2001.

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Further, imbalanced data exacerbates problems arising from the curse of dimensionality often found in such biological data. Insufficient training data in the minority class: in domains where data collection is expensive, a dataset containing 10,000 examples is typically considered to be fairly large.