ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

Insufficient training data in the minority class — In domains where data collection is expensive, a dataset containing 10,000 examples is typically considered to be fairly large. Their tests are performed using C4.5-generated 1988), E-state data (Hall et al., Chawla et al., Pima Indian Diabetes (Smith et al.,
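As a rough illustration of the oversampling idea the title refers to, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. It is a minimal sketch, not the article's code: the function name `smote_sketch`, its parameters, and the toy data are all hypothetical, and it uses only the Python standard library.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; assumes at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest minority-class neighbours of the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing rows.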

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

After forming the X and y variables, we split the data into training and test sets. Looking at the target vector in the training subset, we notice that our training data is highly imbalanced. All we need to do is instantiate LimeTabularExplainer and give it access to the training data and the independent feature names.
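The split-then-inspect step described in this excerpt can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's code: the helper `stratified_split`, the 20% test fraction, the choice to stratify by class, and the toy data are all hypothetical, and only the Python standard library is used. It does not cover the LimeTabularExplainer step.

```python
import random
from collections import Counter


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Split X and y into train/test parts, keeping class proportions similar.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


# Toy data: 100 rows, of which 10 belong to the positive class (made-up values).
X = [[float(i)] for i in range(100)]
y = [1 if i < 10 else 0 for i in range(100)]

X_train, y_train, X_test, y_test = stratified_split(X, y)

# Inspect the target vector of the training subset to see the imbalance.
train_counts = Counter(y_train)
imbalance_ratio = train_counts[0] / train_counts[1]
```

Counting labels in the training subset, as in the last two lines, is what reveals the imbalance before any model is fit.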

Magnificent Mobile Website And App Analytics: Reports, Metrics, How-to!

Occam's Razor

In this post we will look at mobile sites first, both data collection and analysis, and then mobile applications. Upsight (née Kontagent) provides mobile app analytics, with a pinch of advanced segmentation (including sweet cohort analysis) and big data mining thrown in for good measure. Tag your mobile website.

What Is Data Intelligence?

Alation

Data intelligence first emerged to support search & discovery, largely in service of analyst productivity. For years, analysts in enterprises had struggled to find the data they needed to build reports. This problem was only exacerbated by explosive growth in data collection and volume. HBR Review May/June 2017.