ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

The problem with this approach is that in highly imbalanced sets it can easily lead to a situation where most of the data has to be discarded, and it has been firmly established that, when it comes to machine learning, data should not be thrown out lightly (Banko and Brill, 2001; Halevy et al., … Indeed, in the original paper Chawla et al. …
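To make the scale of that loss concrete, here is a minimal sketch. It assumes "this approach" refers to random undersampling of the majority class, which the excerpt itself does not name. The helper `undersample` and the 990-to-10 class split are hypothetical, chosen only for illustration.

```python
import random


def undersample(labels, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    Hypothetical helper for illustration; returns the sorted indices that are kept.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Every class is cut down to the minority class size.
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


# Assumed toy data set: 990 majority rows (label 0), 10 minority rows (label 1).
labels = [0] * 990 + [1] * 10
kept = undersample(labels)
discarded_fraction = 1 - len(kept) / len(labels)
print(f"kept {len(kept)} of {len(labels)} rows; discarded {discarded_fraction:.0%}")
# prints: kept 20 of 1000 rows; discarded 98%
```

With a 99:1 imbalance, balancing the classes this way keeps only 20 rows out of 1000, which is the kind of data loss the excerpt warns about.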

Themes and Conferences per Pacoid, Episode 5

Domino Data Lab

In terms of teaching and learning data science, Project Jupyter is probably the biggest news over the past decade – even though Jupyter’s origins go back to 2001! If you haven’t seen R2D3’s excellent A visual introduction to machine learning series, part 1 and part 2 … run, do not walk, to your nearest browser and check that out!

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

My read of that narrative arc is that some truly weird tensions showed up circa 2001: Arguably, it’s the heyday of DW+BI. A very big mess since circa 2001, and now becoming quite a dangerous mess. See also the paper “The Case for Open Metadata” by Mandy Chessell (2017-04-21) at IBM UK for compelling perspectives about open metadata.

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

He’s been out of Wolfram for a while and writing exquisite science books including Elements: A Visual Explanation of Every Known Atom in the Universe and Molecules: The Architecture of Everything. Consider the following timeline: 2001 – Physics grad students are getting hired in quantity by hedge funds to work on Wall St.