How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

Exploratory data science and visualization: Access Iceberg tables through the auto-discovered CDW connection in CML projects. ... Our imported flights table now contains the same data as the existing external Hive table, and we can quickly check the row counts by year (a group by year query) to confirm. Query output fragment: year 2008, 7009728 rows; year 2001, 5967780 rows.
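The row-count check mentioned in this excerpt can be sketched as follows. This is a minimal illustration, not the article's actual setup: it uses an in-memory SQLite database as a stand-in for the CDW connection, and the table names `flights_external` and `flights_iceberg`, the `flight_num` column, and the toy rows are all assumptions made for the example.

```python
import sqlite3


def row_counts_by_year(conn, table):
    """Return {year: row_count} for the given table.

    The table name is interpolated into the SQL text, so only pass
    trusted identifiers here.
    """
    query = f"SELECT year, COUNT(*) FROM {table} GROUP BY year ORDER BY year"
    return dict(conn.execute(query).fetchall())


# Stand-in database; the real check would run against the warehouse connection.
conn = sqlite3.connect(":memory:")

# Hypothetical table names and toy rows, loaded identically into both tables.
toy_rows = [(2001, 1), (2001, 2), (2008, 3)]
for table in ("flights_external", "flights_iceberg"):
    conn.execute(f"CREATE TABLE {table} (year INTEGER, flight_num INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", toy_rows)

external_counts = row_counts_by_year(conn, "flights_external")
iceberg_counts = row_counts_by_year(conn, "flights_iceberg")

# The import is consistent if every year has the same count in both tables.
counts_match = external_counts == iceberg_counts
print(external_counts, counts_match)
```

Comparing the two dictionaries flags a missing year as well as a differing count, which a comparison of total row counts alone would not catch.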

Themes and Conferences per Pacoid, Episode 5

Domino Data Lab

I’ve been teaching data science since 2008 privately for employers – exec staff, investors, IT teams, and the data teams I’ve led – and since 2013, for industry professionals in general. Data visualization for prediction accuracy (credit: R2D3). This is not a new gig, by any stretch. NASA persistently misspells Jupyter.

Reclaiming the stories that algorithms tell

O'Reilly on Data

Each of the classroom’s library books has a color coded sticker on its spine reflecting its Lexile score—a visual announcement of its official complexity level, and thus of which students might be officially ready to read it. This whole scoring system also changes the story about who librarians and teachers are.

Data Science, Past & Future

Domino Data Lab

He also really informed a lot of the early thinking about data visualization. It involved a lot of work with applied math, some depth in statistics and visualization, and also a lot of communication skills. I can point to the year 2001. It was also the year, 2001, when the “Agile Manifesto” was published.

ML internals: Synthetic Minority Oversampling (SMOTE) Technique

Domino Data Lab

The problem with this approach is that in highly imbalanced sets it can easily lead to a situation where most of the data has to be discarded, and it has been firmly established that, when it comes to machine learning, data should not be thrown out lightly (Banko and Brill, 2001; Halevy et al., ... References: Banko, M., & Brill, E. ...
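To make the scale of the data loss concrete, here is a small arithmetic sketch. The excerpt does not name "this approach", so the example assumes it means randomly undersampling the majority class down to the size of the minority class. The function name and the class sizes are hypothetical.

```python
def undersampling_discard_fraction(n_majority, n_minority):
    """Fraction of the whole dataset dropped when the majority class
    is cut down to the size of the minority class (assumed approach)."""
    if n_minority > n_majority:
        raise ValueError("n_minority must not exceed n_majority")
    discarded = n_majority - n_minority
    return discarded / (n_majority + n_minority)


# Hypothetical 99:1 imbalance: 9,900 majority rows and 100 minority rows.
fraction = undersampling_discard_fraction(9900, 100)
print(f"{fraction:.1%}")  # 9800 of 10000 rows are discarded: 98.0%
```

Under these assumed numbers, balancing the classes keeps only 200 of the 10,000 rows, which is the kind of loss that motivates oversampling techniques such as SMOTE.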

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

He’s been out of Wolfram for a while and writing exquisite science books including Elements: A Visual Explanation of Every Known Atom in the Universe and Molecules: The Architecture of Everything. Consider the following timeline: 2001 – Physics grad students are getting hired in quantity by hedge funds to work on Wall St.