Remove 2001 Remove 2014 Remove Risk Remove Testing
article thumbnail

To Balance or Not to Balance?

The Unofficial Google Data Science Blog

A naïve way to solve this problem would be to compare the proportion of buyers between the exposed and unexposed groups, using a simple test for equality of means. This algorithm is implemented in the SuperLearner R package (Polley & van der Laan, 2014). Random forest with default R tuning parameters (Breiman, 2001).

article thumbnail

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

Also, while surveying the literature two key drivers stood out: Risk management is the thin-edge-of-the-wedge ?for My read of that narrative arc is that some truly weird tensions showed up circa 2001: Arguably, it’s the heyday of DW+BI. A very big mess since circa 2001, and now becoming quite a dangerous mess. a second priority?at

article thumbnail

Data Science at The New York Times

Domino Data Lab

A “data scientist” might build a multistage processing pipeline in Python, design a hypothesis test, perform a regression analysis over data samples with R, design and implement an algorithm in Hadoop, or communicate the results of our analyses to other members of the organization in a clear and concise fashion.