Defining data science in 2018

Data Science and Beyond

I got my first data science job in 2012, the year Harvard Business Review declared data scientist the sexiest job of the 21st century. Two years later, I published a post on my then-favourite definition of data science, as the intersection between software engineering and statistics.

Structural Evolutions in Data

O'Reilly on Data

While data scientists were no longer handling Hadoop-sized workloads, they were trying to build predictive models on a different kind of “large” dataset: so-called “unstructured data.” There’s as much Keras, TensorFlow, and Torch today as there was Hadoop back in 2010-2012.

Trending Sources

The curse of Dimensionality

Domino Data Lab

There are four properties of high-dimensional data. One is that points move far away from each other in high dimensions. Property 4 is that the accuracy of any predictive model approaches 100%, even though no model should be able to accurately predict even and odd rows with random data.
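The first property in the excerpt above, points moving apart as dimensions are added, can be checked with a few lines of Python. This is a minimal sketch and not code from the Domino post: the helper name `mean_pairwise_distance`, the uniform unit-cube sampling, and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random


def mean_pairwise_distance(n_points, n_dims, seed=0):
    """Average Euclidean distance between points drawn uniformly from the unit cube."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]

    total = 0.0
    pairs = 0
    # Visit every unordered pair of points exactly once.
    for i in range(n_points):
        for j in range(i + 1, n_points):
            total += math.dist(points[i], points[j])
            pairs += 1
    return total / pairs


# Print the average distance for a fixed number of points as dimensions grow.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(mean_pairwise_distance(50, n_dims), 2))
```

With the number of points held fixed, the printed average distance should rise with the number of dimensions, which is the sense in which high-dimensional data becomes sparse.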

Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

Domino Data Lab

The complete dataset and code used in this blog post are available at try.dominodatalab.com, and all results shown here are fully reproducible, thanks to the Domino reproducibility engine, which is part of the Domino Data Science platform.

Using random effects models in prediction problems

The Unofficial Google Data Science Blog

We have many routine analyses for which the sparsity pattern is closer to the nested case and lme4 scales very well; however, our prediction models tend to have input data that looks like the simulation on the right.