On the Hunt for Patterns: from Hippocrates to Supercomputers

Ontotext

These are the so-called supercomputers, led by a smart legion of researchers and practitioners in the fields of data-driven knowledge discovery. Thanks to their might, scientists and practitioners can now develop innovative ways of collecting, storing, processing, and, ultimately, finding patterns in data.

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Insufficient training data in the minority class: in domains where data collection is expensive, a dataset containing 10,000 examples is typically considered fairly large. ... note that this variant “performs worse than plain under-sampling based on AUC” when tested on the Adult dataset (Dua & Graff, 2017).
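
For readers unfamiliar with the technique named in the title, here is a minimal sketch of the basic SMOTE idea: each synthetic minority example is a random interpolation between a minority point and one of its nearest minority neighbours. It is not the article's code and not the variant the excerpt mentions; the function name `smote`, its parameters, and the toy data are all hypothetical.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
```

Because every new point lies on a segment between two existing minority points, the synthetic data stays inside the region the minority class already occupies.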

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

The surrogate model is often a simple linear model or a decision tree, both of which are innately interpretable, so the data collected from the perturbations and the corresponding class output can provide a good indication of what influences the model’s decision. ... Conference on Knowledge Discovery and Data Mining, pp.
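
The perturb-and-fit idea described in this excerpt can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the LIME library's implementation: `black_box` is a made-up stand-in for an opaque classifier, and `local_surrogate`, its Gaussian perturbation scheme, and its kernel width are hypothetical choices.

```python
import math
import random

def black_box(x):
    """Hypothetical opaque classifier: returns P(class = 1) for a 2-feature input."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 1.0 * x[1])))

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def local_surrogate(predict, x0, n_samples=500, scale=0.5, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model to predict() around the point x0."""
    rng = random.Random(seed)
    d = len(x0)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        # Perturb the instance and record the black box's output for it.
        z = [xi + rng.gauss(0.0, scale) for xi in x0]
        y = predict(z)
        # Perturbations closer to x0 get a larger weight in the fit.
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        weight = math.exp(-dist2 / kernel_width ** 2)
        row = [1.0] + z  # leading 1.0 is the intercept term
        for i in range(d + 1):
            xty[i] += weight * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += weight * row[i] * row[j]
    # Weighted least squares via the normal equations.
    return solve(xtx, xty)  # [intercept, coef_0, ..., coef_{d-1}]

coefs = local_surrogate(black_box, x0=[0.2, -0.1])
```

The signs and relative sizes of `coefs[1:]` indicate which features push the prediction up or down near `x0`; they describe the model's behaviour only in that neighbourhood, not globally.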
