Data Science at The New York Times

Domino Data Lab

Chris Wiggins, Chief Data Scientist at The New York Times, presented “Data Science at the New York Times” at Rev. Wiggins also indicated that data science, data engineering, and data analysis are different groups at The New York Times.

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

After forming the X and y variables, we split the data into training and test sets. Looking at the target vector in the training subset, we notice that our training data is highly imbalanced. PDPs for the bicycle count prediction model (Molnar, 2009). X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
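The split-then-inspect step described in this excerpt can be sketched roughly as follows. This is a minimal standard-library stand-in for `train_test_split`, not the article's code: the helper `split_and_count`, the seed, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def split_and_count(X, y, test_size=0.2, seed=0):
    """Shuffle row indices, hold out a test fraction, and count training labels."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test, Counter(y_train)


# Hypothetical toy data: 95 negative rows and 5 positive rows.
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_train, X_test, y_train, y_test, train_counts = split_and_count(X, y)

# Share of the positive class in the training subset; a small value signals imbalance.
positive_share = train_counts[1] / len(y_train)
print(train_counts, f"positive share in training: {positive_share:.3f}")
```

A plain shuffle like this does not preserve class proportions between the two subsets; with a rare class, a stratified split is usually the safer choice.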

Thread Dev Interview 6: @chris.mrbananas.greening

Data Science 101

Another time we were doing “load testing” and everyone in the office was hitting the site and it seemed to be running really slowly. People were certainly using it for running and testing locally. Wow – this was 2009! An iPhone app with a video from 2009! I do remember one visit after the dot com crash.

Smarten Augmented Analytics Receives CERT-IN Certification for Its Products and Services!

Smarten

After completion of the testing procedure, the certificate is provided to show that all requirements were met. The Smarten approach to business intelligence and business analytics focuses on the business user and provides Advanced Data Discovery so users can perform early prototyping and test hypotheses without the skills of a data scientist.

6 Case Studies on The Benefits of Business Intelligence And Analytics

datapine

The companies that are most successful at marketing in both B2C and B2B are using data and online BI tools to craft hyper-specific campaigns that reach out to targeted prospects with a curated message. Everything is being tested, and then the campaigns that succeed get more money put into them, while the others aren’t repeated.

Understanding Simpson’s Paradox to Avoid Faulty Conclusions

Sisense

In general, it is not possible to give a rule of thumb about when data should be partitioned or combined. A new drug promising to reduce the risk of heart attack was tested with two groups. Now, let’s check a slightly different case in which grouping the data leads to incorrect results. It really depends on the circumstances.
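A small numeric sketch may make the reversal concrete. The counts below are hypothetical and chosen only to produce the effect; they are not taken from the article. Each pair is (patients with a good outcome, patients observed), and the names `trial`, `per_group`, and `pooled` are illustrative.

```python
# Hypothetical (good outcomes, patients observed) counts for each arm of each group.
trial = {
    "group_1": {"drug": (81, 87), "control": (234, 270)},
    "group_2": {"drug": (192, 263), "control": (55, 80)},
}


def rate(successes, total):
    """Fraction of patients with a good outcome."""
    return successes / total


# Success rate of each arm inside each group.
per_group = {
    group: {arm: rate(*counts) for arm, counts in arms.items()}
    for group, arms in trial.items()
}


def pooled(arm):
    """Success rate of one arm after combining both groups."""
    successes = sum(trial[group][arm][0] for group in trial)
    total = sum(trial[group][arm][1] for group in trial)
    return rate(successes, total)


combined = {arm: pooled(arm) for arm in ("drug", "control")}

for group, rates in per_group.items():
    print(group, {arm: round(r, 3) for arm, r in rates.items()})
print("combined", {arm: round(r, 3) for arm, r in combined.items()})
```

With these numbers the drug arm has the higher rate inside each group, yet the control arm has the higher rate once the groups are pooled, because the two arms are spread very unevenly across the groups.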

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

The problem with this approach is that in highly imbalanced sets it can easily lead to a situation where most of the data has to be discarded, and it has been firmly established that when it comes to machine learning data should not be easily thrown out (Banko and Brill, 2001; Halevy et al., Their tests are performed using C4.5-generated
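To illustrate how much data an undersampling approach of this kind can discard, here is a minimal sketch of random undersampling on a hypothetical label vector. The helper `random_undersample`, the seed, and the 990-to-10 class split are assumptions for illustration, not the article's setup.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted row indices that keep as many rows per class as the rarest class has."""
    counts = Counter(labels)
    n_keep = min(counts.values())
    rng = random.Random(seed)
    kept = []
    for cls in counts:
        class_indices = [i for i, label in enumerate(labels) if label == cls]
        kept.extend(rng.sample(class_indices, n_keep))
    return sorted(kept)


# Hypothetical highly imbalanced target: 990 majority rows, 10 minority rows.
labels = [0] * 990 + [1] * 10

kept = random_undersample(labels)
discarded_fraction = 1 - len(kept) / len(labels)
print(f"kept {len(kept)} of {len(labels)} rows, discarded {discarded_fraction:.0%}")
```

In this toy case balancing the classes means throwing away almost all of the majority rows, which is the cost the excerpt is pointing at.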