ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Working with highly imbalanced data can be problematic in several respects. Distorted performance metrics: in a highly imbalanced dataset, say a binary dataset with a class ratio of 98:2, an algorithm that always predicts the majority class and completely ignores the minority class will still be 98% correct.
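The 98:2 arithmetic can be checked with a small sketch. The function name `majority_baseline_metrics` and the label encoding (0 for the majority class, 1 for the minority class) are assumptions made for illustration and do not come from the article.

```python
def majority_baseline_metrics(n_majority=98, n_minority=2):
    """Score a classifier that always predicts the majority class.

    Labels are an illustrative assumption: 0 = majority, 1 = minority.
    Returns (accuracy, minority_recall).
    """
    y_true = [0] * n_majority + [1] * n_minority
    y_pred = [0] * len(y_true)  # always predict the majority class

    # Accuracy: fraction of all predictions that match the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    # Minority recall: fraction of true minority samples that were found.
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    minority_recall = true_positives / n_minority

    return accuracy, minority_recall


if __name__ == "__main__":
    accuracy, minority_recall = majority_baseline_metrics()
    print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

Reporting minority-class recall next to accuracy makes the distortion visible: the baseline scores high on accuracy while never identifying a single minority sample.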

Experiment design and modeling for long-term studies in ads

The Unofficial Google Data Science Blog

Nevertheless, A/B testing has challenges and blind spots, such as the difficulty of identifying suitable metrics that give "works well" a measurable meaning. For example, in ads, experiments using cookies (users) as experimental units are not suited to capture the impact of a treatment on advertisers or publishers, nor their reaction to it.

AI, the Power of Knowledge and the Future Ahead: An Interview with Head of Ontotext’s R&I Milena Yankova

Ontotext

They have different metrics for judging whether some content is interesting or not. Milena Yankova: What we did for the BBC in the previous Olympics was help journalists publish their reports faster. Economy.bg: But doesn’t this algorithm put us in an information bubble by filtering the content for us?