Fundamentals of Data Mining

Data Science 101

Data alone does not make sense until it is found to follow some pattern. Data mining is the process of discovering these patterns in the data and is therefore also known as Knowledge Discovery from Data (KDD). Machine learning provides the technical basis for data mining.

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Insufficient training data in the minority class: in domains where data collection is expensive, a dataset containing 10,000 examples is typically considered fairly large.

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

The surrogate model is often a simple linear model or a decision tree, both of which are innately interpretable, so the data collected from the perturbations and the corresponding class outputs can give a good indication of what influences the model’s decision.