
Measuring Validity and Reliability of Human Ratings

The Unofficial Google Data Science Blog

If they roll two dice and apply a label if the dice sum to 12, they will agree 85% of the time, purely by chance. Under certain conditions, the non-parametric and parametric measurements should be the same, and disagreements between the approaches should help illustrate whether our assumptions about the data hold.
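
The chance-agreement arithmetic behind the dice example can be sketched in a few lines. This is only an illustration under stated assumptions: the excerpt does not say how many raters "they" covers or how agreement is counted, so the sketch assumes independent raters with fair six-sided dice and defines agreement as every rater giving the same answer (all label or all do not). The function names `p_label` and `chance_agreement` are hypothetical, not taken from the original post.

```python
from fractions import Fraction


def p_label(target_sum=12, sides=6):
    """Probability that two fair dice sum to target_sum (assumed labeling rule)."""
    hits = sum(
        1
        for a in range(1, sides + 1)
        for b in range(1, sides + 1)
        if a + b == target_sum
    )
    return Fraction(hits, sides * sides)


def chance_agreement(p, n_raters=2):
    """Probability that n independent raters all give the same answer.

    Assumption: each rater applies the label with probability p,
    independently of the others, so they agree when all label (p**n)
    or when none label ((1 - p)**n).
    """
    return p ** n_raters + (1 - p) ** n_raters


if __name__ == "__main__":
    p = p_label()  # 1/36 for a sum of 12
    for n in (2, 3, 6):
        print(f"{n} raters: {float(chance_agreement(p, n)):.4f}")
```

Under these assumptions, two raters agree about 94.6% of the time and six raters all agree about 84.5% of the time, so the exact figure depends heavily on the number of raters and the definition of agreement. Either way, the point of the example stands: when a label is rare, raw agreement is high even if the raters are guessing.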