Remove 2009 Remove Data Science Remove Knowledge Discovery Remove Testing
article thumbnail

Explaining black-box models using attribute importance, PDPs, and LIME

Domino Data Lab

After forming the X and y variables, we split the data into training and test sets. Looking at the target vector in the training subset, we notice that our training data is highly imbalanced. PDPs for the bicycle count prediction model (Molnar, 2009). X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,

Modeling 139
article thumbnail

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

The problem with this approach is that in highly imbalanced sets it can easily lead to a situation where most of the data has to be discarded, and it has been firmly established that when it comes to machine learning data should not be easily thrown out (Banko and Brill, 2001; Halevy et al., Their tests are performed using C4.5-generated