Machine Learning, Metrics, Slice and Dice and Uncertainty

Measuring Validity and Reliability of Human Ratings

The Unofficial Google Data Science Blog

JULY 18, 2023

Once we’ve answered that, we will define and use metrics to understand the quality of human-labeled data, along with a measurement framework that we call Cross-replication Reliability, or xRR. If two raters each roll two dice and apply a label only when the roll sums to 12, they will agree about 95% of the time, purely by chance.
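The chance-agreement arithmetic behind the dice example can be checked directly. The sketch below assumes two raters who act independently and each apply a binary label with the same probability; the helper name `chance_agreement` is hypothetical, and the fair-coin case is an added sanity check rather than something taken from the excerpt.

```python
from fractions import Fraction
from itertools import product


def chance_agreement(p_label):
    """Probability that two independent raters give the same binary label,
    when each applies the label with probability p_label."""
    # They agree if both apply the label or both withhold it.
    return p_label ** 2 + (1 - p_label) ** 2


# Enumerate all 36 outcomes of two fair dice and count those summing to 12.
rolls = list(product(range(1, 7), repeat=2))
p_twelve = Fraction(sum(1 for a, b in rolls if a + b == 12), len(rolls))

coin = chance_agreement(Fraction(1, 2))  # sanity check: fair coin
dice = chance_agreement(p_twelve)        # label only on a sum of 12

print(f"P(sum is 12)          = {p_twelve}")
print(f"coin chance agreement = {float(coin):.3f}")
print(f"dice chance agreement = {float(dice):.3f}")
```

The rarer the label, the higher the agreement that chance alone produces, which is why raw percent agreement can overstate the reliability of ratings on imbalanced labels.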

Measurement

Measurement, Metrics, Uncertainty, Slice and Dice

Data scientist as scientist

The Unofficial Google Data Science Blog

OCTOBER 21, 2015

For example, in our field, we can generally blame machine learning feedback (predictions that change the data itself), budget effects (bidders running out of money in repeated auctions), or even the weather (internet usage changes in complicated ways). We sliced and diced the experimental data in many, many ways.

Slice and Dice

Slice and Dice, Experimentation, Data-driven, Data Science
