2019 US Open Predictions: Doubling Down on the Data

DataRobot Blog

Using this data, we built a historical dataset containing past results, current Elo scores (both overall and surface-specific), and tournament information, then used DataRobot to determine the best model and predict the probability that a player would win a set. … Andrew received his Ph.D. …
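For readers unfamiliar with Elo scores, the sketch below shows the standard Elo expected-score formula, which turns a rating gap into a win probability. It is not the model from the article, which fed Elo scores into DataRobot alongside other features; the function name and the ratings in the usage lines are illustrative assumptions.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score for player A against player B.

    A gap of `scale` rating points corresponds to 10-to-1 odds for the
    higher-rated player. The default of 400 is the conventional value.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Hypothetical ratings, chosen only to illustrate the formula.
favorite = elo_expected_score(1900, 1500)
underdog = elo_expected_score(1500, 1900)
print(f"favorite: {favorite:.3f}, underdog: {underdog:.3f}")
```

The two expected scores always sum to 1, so the formula can be read directly as a head-to-head probability. A surface-specific rating would be plugged in the same way as an overall one.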

InfoTribes, Reality Brokers

O'Reilly on Data

On top of this, pre-existing societal biases are being reinforced and promulgated at previously unheard-of scales as we increasingly integrate machine learning models into our daily lives. Put simply, we are reduced to the inputs of an algorithm. It just so happens that dividing people increases engagement and makes economic sense.

Topics to watch at the Strata Data Conference in New York 2019

O'Reilly on Data

So, we used a form of the Term Frequency-Inverse Document Frequency (TF/IDF) technique to identify and rank the top terms in this year’s Strata NY proposal topics, as well as those for 2018, 2017, and 2016. … 2) is unchanged from Strata NY 2018, it’s up three places from Strata NY 2017, and eight places relative to 2016.
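The excerpt does not say which form of TF/IDF was used, so the following is only a minimal sketch of the textbook variant: term frequency within a document multiplied by the log of the inverse document frequency. The function name and the sample topics are made up for illustration and are not the conference data.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # Term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical proposal topics, not taken from the article.
docs = [
    "streaming data pipelines".split(),
    "machine learning data".split(),
    "machine learning models in production".split(),
]
for doc_scores in tfidf(docs):
    ranked = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
    print(ranked[:2])
```

A term that appears in every document scores zero under this variant, which is why ranking by TF/IDF surfaces distinctive terms rather than merely frequent ones.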

Using random effects models in prediction problems

The Unofficial Google Data Science Blog

We have many routine analyses for which the sparsity pattern is closer to the nested case and lme4 scales very well; however, our prediction models tend to have input data that looks like the simulation on the right. … "A Scalable Blocked Gibbs Sampling Algorithm For Gaussian And Poisson Regression Models." … bandit problems).

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

Model distillation – this approach builds a separate explainable model that mimics the input-output behaviour of the deep network. Because this separate model is essentially a white box, it can be used for extraction of rules that explain the decisions behind the ANN. … 2016) for an example of this technique (LIME).
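The idea of an explainable model that mimics a black box can be sketched in one dimension: probe the black box around a point of interest, weight the probes by proximity, and fit a straight line to them. This is a simplified, LIME-style local surrogate written for illustration only. It does not use the LIME library, and the function name, kernel, and sample black box are all assumptions.

```python
import math


def local_linear_surrogate(black_box, x0, width=0.5, n=21):
    """Fit a proximity-weighted straight line to black_box around x0.

    Returns (slope, intercept) of the surrogate. The surrogate only
    sees the black box's inputs and outputs, never its internals.
    """
    # Deterministic, symmetric perturbations around the point of interest.
    xs = [x0 + width * (2 * i / (n - 1) - 1) for i in range(n)]
    ys = [black_box(x) for x in xs]
    # Gaussian proximity kernel: probes closer to x0 count more.
    ws = [math.exp(-((x - x0) ** 2) / (2 * (width / 2) ** 2)) for x in xs]
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    # Closed-form weighted least squares for a single feature.
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Stand-in "black box"; a real one would be a trained network's predict call.
slope, intercept = local_linear_surrogate(lambda x: x ** 2, x0=3.0)
print(f"local explanation near x=3: y ~ {slope:.2f} * x + {intercept:.2f}")
```

The slope is the explanation: it says how the black box's output responds to the input near the chosen point. With many features the same weighted fit yields one coefficient per feature.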

Deep Learning Illustrated: Building Natural Language Processing Models

Domino Data Lab

GloVe and word2vec differ in their underlying methodology: word2vec uses predictive models, while GloVe is count based. You can home in on an optimal value by specifying, say, 32 dimensions and varying this value by powers of 2. … s lead may not be the optimal choice. … Natural Language Processing.] … Joulin, A.,