Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

For this demo we’ll use the freely available Statlog (German Credit Data) Data Set, which can be downloaded from Kaggle. This dataset classifies customers, based on a set of attributes, into two credit risk groups: good or bad.

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Insufficient training data in the minority class: In domains where data collection is expensive, a dataset containing 10,000 examples is typically considered to be fairly large. This carries the risk of this modification performing worse than simpler approaches like majority under-sampling.

AI, the Power of Knowledge and the Future Ahead: An Interview with Head of Ontotext’s R&I Milena Yankova

Ontotext

This is knowledge that anyone can get, but it would take much longer than optimal. But still, is there a risk that AI could replace people at their workplace? This is extremely powerful, so literacy in data collection and data processing will be one of the crucial skills of the future. It’s very likely.