article thumbnail

Variance and significance in large-scale online services

The Unofficial Google Data Science Blog

For this purpose, let’s assume we use a t-test for difference between group means. Effect size thus defined is useful because the statistical power of a classical test for $delta$ being non-zero depends on $e/sqrt{tilde{n}}$, where $tilde{n}$ is the harmonic mean of sample sizes of the two groups being compared.

article thumbnail

Changing assignment weights with time-based confounders

The Unofficial Google Data Science Blog

Another reason to use ramp-up is to test if a website's infrastructure can handle deploying a new arm to all of its users. The website wants to make sure they have the infrastructure to handle the feature while testing if engagement increases enough to justify the infrastructure. Here, day-of-week is a time-based confounder.