The Lean Analytics Cycle: Metrics > Hypothesis > Experiment > Act

Occam's Razor

Sometimes we escape the clutches of this suboptimal existence and do pick good metrics, or engage in simple A/B testing of a new feature: identify, hypothesize, test, react. But at the same time, they had to run a real test of an actual feature. You don’t need a beautiful beast to go out and test.
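The simple A/B test the excerpt describes can be sketched as a two-proportion z-test. This is a generic illustration of the idea, not Kaushik's own tooling, and the conversion counts below are invented:

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for a simple A/B test (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: 4% vs 5% conversion on 5,000 users per arm
z, p = ab_test_z(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
```

With these made-up counts the lift is detectable (z ≈ 2.4), which is the whole point of the cycle: the metric and the hypothesis decide the test, not the other way around.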


Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

Crucially, it takes into account the uncertainty inherent in our experiments. Multiparameter experiments, however, generate richer data than standard A/B tests, and automated t-tests alone are insufficient to analyze them well. In this section we’ll discuss how we approach these two kinds of uncertainty with QCQP.
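QCQP here stands for quadratically constrained quadratic programming. As a generic illustration only (not the post's actual formulation), a tiny QCQP with a norm-ball constraint can be solved by projected gradient descent:

```python
import math

def solve_toy_qcqp(Q, c, steps=2000, lr=0.05):
    """Projected gradient descent for: minimize x'Qx - c'x  s.t. ||x|| <= 1.
    Toy sketch only; real QCQP solvers handle general quadratic constraints."""
    n = len(c)
    x = [0.0] * n
    for _ in range(steps):
        # Gradient of x'Qx - c'x (Q symmetric here) is 2Qx - c
        g = [2 * sum(Q[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        norm = math.sqrt(sum(v * v for v in x))
        if norm > 1:  # project back onto the feasible unit ball
            x = [v / norm for v in x]
    return x

# Unconstrained minimum of ||x||^2 - 2(x1 + x2) lies at (1, 1), outside the
# unit ball, so the constrained optimum is its projection (1/sqrt(2), 1/sqrt(2)).
x = solve_toy_qcqp([[1.0, 0.0], [0.0, 1.0]], [2.0, 2.0])
```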


Changing assignment weights with time-based confounders

The Unofficial Google Data Science Blog

Another reason to use ramp-up is to test if a website's infrastructure can handle deploying a new arm to all of its users. The website wants to make sure they have the infrastructure to handle the feature while testing if engagement increases enough to justify the infrastructure. We offer two examples where this may be the case.
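A common way to implement such a ramp-up is deterministic hash-based assignment, so a user keeps their arm as the treatment weight grows. This is a generic sketch (the schedule and salt are invented, not from the post):

```python
import hashlib

RAMP_SCHEDULE = [0.01, 0.05, 0.20, 0.50, 1.00]  # hypothetical stage weights

def assign_arm(user_id, treatment_weight, salt="exp-ramp"):
    """Hash the user into [0, 1); users below the weight get the new arm.
    Because the hash is deterministic, raising the weight only moves
    control users into treatment, never the reverse."""
    h = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = (int(h, 16) % 10_000) / 10_000
    return "treatment" if bucket < treatment_weight else "control"
```

The article's caveat still applies: comparing metrics across ramp stages confounds the weight change with time effects, so within-stage comparisons are the safer read.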


Measuring Validity and Reliability of Human Ratings

The Unofficial Google Data Science Blog

Editor's note: The relationship between reliability and validity is somewhat analogous to that between the notions of statistical uncertainty and representational uncertainty introduced in an earlier post. But for more complicated metrics like xRR, our preference is to bootstrap when measuring uncertainty.
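The bootstrap mentioned for metrics like xRR can be sketched as a generic percentile bootstrap. This illustrates the resampling idea, not the post's exact estimator, and the rating data below is invented:

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute the
    statistic, and take empirical quantiles of the resampled values."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = boot[int(n_boot * alpha / 2)]
    hi = boot[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

ratings = [3, 4, 4, 5, 2, 4, 3, 5, 4, 3, 4, 5, 2, 3, 4]
lo, hi = bootstrap_ci(ratings)
```

The appeal for complicated metrics is exactly this: `stat` can be any function of the resampled data, with no closed-form variance required.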


Why model calibration matters and how to achieve it

The Unofficial Google Data Science Blog

To explain, let’s borrow a quote from Nate Silver’s The Signal and the Noise: “One of the most important tests of a forecast — I would argue that it is the single most important one — is called calibration.” The term $\bar{\pi}(1 - \bar{\pi})$ is the irreducible loss due to uncertainty. One useful property of these predictions is calibration.
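Calibration can be checked by binning predictions and comparing the mean predicted probability with the observed outcome rate in each bin, and the $\bar{\pi}(1 - \bar{\pi})$ term is just the Bernoulli variance at the base rate. A minimal sketch with invented, perfectly calibrated data (not the post's code):

```python
def calibration_table(probs, outcomes, n_bins=10):
    """For each bin, compare mean predicted probability with the observed
    positive rate. A calibrated model has mean_p ~= observed in every bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            rows.append((round(mean_p, 3), round(observed, 3), len(b)))
    return rows

# Toy data: 20% of the 0.2-predictions are positive, 80% of the 0.8 ones
probs = [0.2] * 10 + [0.8] * 10
outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
table = calibration_table(probs, outcomes)

base_rate = sum(outcomes) / len(outcomes)
irreducible = base_rate * (1 - base_rate)  # the bar{pi}(1 - bar{pi}) term
```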


Estimating causal effects using geo experiments

The Unofficial Google Data Science Blog

Similarly, we could test the effectiveness of a search ad compared to showing only organic search results. Structure of a geo experiment: a typical geo experiment consists of two distinct time periods, pretest and test. After the test period finishes, the campaigns in the treatment group are reset to their original configurations.
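A back-of-the-envelope way to read such pretest/test data is a difference-in-differences estimate. This is a deliberate simplification (the actual geo-experiment methodology fits a regression model to the pretest relationship between geos); the numbers are invented:

```python
def did_estimate(treatment_geos, control_geos):
    """Each geo is a (pretest_total, test_total) pair; the estimate is the
    average change in treatment geos minus the average change in controls."""
    t = sum(test - pre for pre, test in treatment_geos) / len(treatment_geos)
    c = sum(test - pre for pre, test in control_geos) / len(control_geos)
    return t - c

# Treatment geos grew by 30 and 45; controls by 12 and 18
lift = did_estimate([(100, 130), (200, 245)], [(120, 132), (180, 198)])
```

The control geos' change absorbs seasonality common to both groups, which is why the pretest period matters.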


The trinity of errors in applying confidence intervals: An exploration using Statsmodels

O'Reilly on Data

Because of this trifecta of errors, we need dynamic models that quantify the uncertainty inherent in our financial estimates and predictions. Practitioners in all social sciences, especially financial economics, use confidence intervals to quantify the uncertainty in their estimates and predictions.
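The kind of interval the article computes with Statsmodels can be illustrated with a hand-rolled OLS slope confidence interval. This uses a normal approximation rather than the t distribution that Statsmodels' `conf_int` uses, and the data is made up; it is a sketch, not the article's analysis:

```python
import math

def slope_ci(xs, ys, z=1.96):
    """Approximate 95% CI for the OLS slope (normal approximation;
    statsmodels' conf_int on a fitted OLS model uses the t distribution)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    alpha = my - beta * mx
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se = math.sqrt(s2 / sxx)                      # standard error of the slope
    return beta - z * se, beta + z * se

# Noisy data generated around y = 2x
lo, hi = slope_ci([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

The interval's width is exactly the uncertainty the excerpt is talking about: reporting only the point estimate of the slope hides it.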