Variance and significance in large-scale online services

The Unofficial Google Data Science Blog

And an LSOS is awash in data, right? Well, it turns out that depending on what it cares to measure, an LSOS might not have enough data. The practical consequence of this is that we can’t afford to be sloppy about measuring statistical significance and confidence intervals.

Using Empirical Bayes to approximate posteriors for large "black box" estimators

The Unofficial Google Data Science Blog

Posteriors are useful to understand the system, measure accuracy, and make better decisions. Methods like the Poisson bootstrap can help us measure the variability of $t$, but don’t give us posteriors either, particularly since good high-dimensional estimators aren’t unbiased. We don’t have a fully satisfying answer for this issue.
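
As a rough illustration of the Poisson bootstrap mentioned in the excerpt, the sketch below resamples a data set by giving each observation an independent Poisson(1) weight and recomputing a statistic for every replicate. The statistic here is a plain mean and the data are synthetic; both are assumptions for the example, as are the helper names `poisson1` and `poisson_bootstrap_means`, none of which come from the article. Note that this measures variability only and does not produce a posterior.

```python
import math
import random
from statistics import mean, stdev


def poisson1(rng):
    """Draw one Poisson(1) variate using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def poisson_bootstrap_means(values, n_replicates=500, seed=0):
    """Return weighted means, one per Poisson bootstrap replicate."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        total, count = 0.0, 0
        for v in values:
            w = poisson1(rng)  # how many times this observation is "resampled"
            total += w * v
            count += w
        if count > 0:  # skip the (very unlikely) replicate with all-zero weights
            replicates.append(total / count)
    return replicates


# Synthetic data (assumption): 300 draws from an exponential with mean 2.
data_rng = random.Random(42)
sample = [data_rng.expovariate(0.5) for _ in range(300)]

replicates = poisson_bootstrap_means(sample)
std_error = stdev(replicates)  # bootstrap estimate of the standard error of the mean
```

Because each observation's weight is drawn independently, the replicates can be accumulated in a single streaming pass over the data, which is a common reason this variant is attractive at scale.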

LSOS experiments: how I learned to stop worrying and love the variability

The Unofficial Google Data Science Blog

And since the metric average is different in each hour of day, this is a source of variation in measuring the experimental effect. Another way to build a classifier for variance reduction is to address the rare event problem directly — what if we could predict a subset of instances in which the event of interest will definitely not occur?
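
To make the hour-of-day point concrete, here is a minimal simulation sketch. It compares a naive difference of means against an estimator that takes the difference within each hour and then averages across hours. The sinusoidal baseline, noise level, effect size, and function names are all assumptions chosen for illustration and are not taken from the article, which may use a different method.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pvariance


def simulate_experiment(rng, n_units=480, effect=0.2):
    """Simulate one experiment as a list of (hour, treated, value) rows."""
    rows = []
    for _ in range(n_units):
        hour = rng.randrange(24)
        treated = rng.random() < 0.5
        # Assumed baseline: the metric average varies smoothly by hour of day.
        baseline = 10.0 + 5.0 * math.sin(2.0 * math.pi * hour / 24.0)
        value = baseline + (effect if treated else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((hour, treated, value))
    return rows


def naive_effect(rows):
    """Difference of overall means, ignoring hour of day."""
    treated = [v for _, t, v in rows if t]
    control = [v for _, t, v in rows if not t]
    return mean(treated) - mean(control)


def stratified_effect(rows):
    """Size-weighted average of within-hour treatment/control differences."""
    by_hour = defaultdict(lambda: ([], []))
    for hour, treated, value in rows:
        by_hour[hour][0 if treated else 1].append(value)
    weighted_sum, total_weight = 0.0, 0
    for treated_vals, control_vals in by_hour.values():
        if not treated_vals or not control_vals:
            continue  # an hour with only one arm carries no comparison
        weight = len(treated_vals) + len(control_vals)
        weighted_sum += weight * (mean(treated_vals) - mean(control_vals))
        total_weight += weight
    return weighted_sum / total_weight


rng = random.Random(7)
experiments = [simulate_experiment(rng) for _ in range(200)]
naive_estimates = [naive_effect(rows) for rows in experiments]
stratified_estimates = [stratified_effect(rows) for rows in experiments]

# Spread of each estimator across the repeated simulated experiments.
naive_var = pvariance(naive_estimates)
stratified_var = pvariance(stratified_estimates)
```

Under these assumptions the stratified estimates vary much less from run to run than the naive ones, because the hour-to-hour differences in the baseline cancel within each stratum.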