2013, Modeling and Statistics - Data Leaders Brief

Why you should care about debugging machine learning models

O'Reilly on Data

DECEMBER 12, 2019

Not least is the broadening realization that ML models can fail. And that’s why model debugging, the art and science of understanding and fixing problems in ML models, is so critical to the future of ML. Because all ML models make mistakes, everyone who cares about ML should also care about model debugging. [1]

Machine Learning

Machine Learning Modeling Testing Risk Management

How Insurance Companies Use Data To Measure Risk And Choose Rates

Smart Data Collective

MARCH 27, 2020

Statistics show that married people have fewer car accidents than singletons. Insurance companies have access to crime statistics and can track the number of car thefts and break-ins per neighborhood. They also have access to stats on which makes and models of car are stolen more often or involved in more crashes.

Insurance

Insurance Measurement Risk Cost-Benefit

The AIgent: Using Google’s BERT Language Model to Connect Writers & Representation

Insight

MARCH 12, 2020

In 2013, Robert Galbraith, an aspiring author, finished … The AIgent was built with BERT, Google’s state-of-the-art language model. In this article, I will discuss the construction of the AIgent, from data collection to model assembly. More relevant to the AIgent is Google’s BERT model, a task-agnostic (i.e. …

Modeling

Modeling Metadata Publishing Sales

Data Drift Detection for Image Classifiers

Domino Data Lab

DECEMBER 1, 2019

This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production. Introduction: preventing silent model degradation in production. This article explores an approach that can be used to detect data drift for models that classify/score image data.

Modeling

Modeling Machine Learning Deep Learning Testing

Themes and Conferences per Pacoid, Episode 5

Domino Data Lab

JANUARY 6, 2019

I’ve been teaching data science since 2008 privately for employers – exec staff, investors, IT teams, and the data teams I’ve led – and since 2013, for industry professionals in general. Also, clearly there’s no “one size fits all” educational model for data science. The Berkeley model addresses large university needs in the US.

Data Science

Data Science Machine Learning Reporting Visualization

Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

APRIL 23, 2024

If $Y$ at that point is (statistically and practically) significantly better than our current operating point, and that point is deemed acceptable, we update the system parameters to this better value. Figure 2: Spreading measurements out makes estimates of model (slope of line) more accurate. And sometimes even if it is not[1].)
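The Figure 2 remark follows from a standard property of least-squares fits: with independent noise of standard deviation sigma on each measurement, the standard error of the fitted slope is sigma divided by the square root of the sum of squared deviations of the measurement points from their mean. The sketch below is an illustration only, not code from the original post; the function name, the two example designs, and the noise level are all assumptions chosen to show the effect.

```python
import math


def slope_std_error(xs, noise_sd):
    """Standard error of the ordinary least-squares slope for design points xs,
    assuming independent noise with standard deviation noise_sd per measurement."""
    mean_x = sum(xs) / len(xs)
    # Sum of squared deviations of the design points from their mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return noise_sd / math.sqrt(sxx)


# Two hypothetical designs with the same number of measurements:
# one clustered near the current operating point, one spread out around it.
clustered = [0.9, 1.0, 1.1, 0.9, 1.0, 1.1]
spread = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]

se_clustered = slope_std_error(clustered, noise_sd=1.0)
se_spread = slope_std_error(spread, noise_sd=1.0)

print(f"clustered design: slope standard error = {se_clustered:.3f}")
print(f"spread design:    slope standard error = {se_spread:.3f}")
```

With the same measurement budget and noise, the spread design gives a slope estimate that is about ten times more precise in this example, because the deviations from the mean are ten times larger.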

Experimentation

Experimentation Optimization Uncertainty Metrics

Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

Domino Data Lab

APRIL 21, 2021

We’ll use a gradient boosting technique via XGBoost to create a model and I’ll walk you through steps you can take to avoid overfitting and build a model that is fit for purpose and ready for production. Let’s also look at the basic descriptive statistics for all attributes. from sklearn import metrics.

Statistics

Statistics Machine Learning Modeling Metrics

Build a RAG data ingestion pipeline for large-scale ML workloads

AWS Big Data

MARCH 13, 2024

For building any generative AI application, enriching the large language models (LLMs) with new data is imperative. For the model used to create embeddings, we settled on all-mpnet-base-v2 to create a 768-dimensional vector space. You will see the Ray dashboard and statistics of the jobs and cluster running.

Data Processing

Data Processing Dashboards Machine Learning Management

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Deep Learning Illustrated: Building Natural Language Processing Models

Domino Data Lab

AUGUST 22, 2019

The excerpt covers how to create word vectors and utilize them as an input into a deep learning model. While the field of computational linguistics, or Natural Language Processing (NLP), has been around for decades, the increased interest in and use of deep learning models has also propelled applications of NLP forward within industry.

Deep Learning

Deep Learning Modeling Metrics Testing

Why We Started the Data Intelligence Project

Alation

JULY 7, 2022

In 2013 I joined American Family Insurance as a metadata analyst. The Bureau of Labor Statistics projects the job outlook for data scientists to grow 22% from 2020 to 2030. The rapid growth of data roles critical to data-centric business models demonstrates an awareness of this need. Two data-driven careers.

Metadata

Metadata Data-driven Insurance Statistics

Great Storytelling With Data: Visualize Simply And Focus Obsessively

Occam's Razor

SEPTEMBER 21, 2015

First, someone worked really hard on this and created a really nice model for a smarter decision to be made for 2014. Second, between 2012 and 2013. When I present it, I'll say something like "Our peak investment, in Aquantive in 2013, was 700k." You are a Ninja, it will likely take you less. Rest is irrelevant.

Visualization

Visualization Key Performance Indicator Slice and Dice Strategy

Periscope Data Expands to Israel, Empowering Data Teams with Powerful Tools

Sisense

DECEMBER 11, 2019

And he demonstrated how the Periscope Data platform overcomes the challenges of huge data volumes that can’t be easily modeled by traditional BI. Kongregate has been using Periscope Data since 2013. No surprise then that Tinder beat Netflix to become the highest-earning non-game app on both Google Play Store and the Apple Store.

Data Lake

Data Lake Big Data Sales Data-driven

Data Science at The New York Times

Domino Data Lab

JULY 9, 2019

Diving into examples of building and deploying ML models at The New York Times including the descriptive topic modeling-oriented Readerscope (audience insights engine), a prediction model regarding who was likely to subscribe/cancel their subscription, as well as prescriptive example via recommendations of highly curated editorial content.

Data Science

Data Science Machine Learning Advertising Modeling

Using DataOps to Drive Agility and Business Value

DataKitchen

JUNE 24, 2021

Chapin shared that even though GE had embraced agile practices since 2013, the company still struggled with massive amounts of legacy systems. Most companies have legacy models in software development that are well-oiled machines. Success Requires Focus on Business Outcomes, Benchmarking. Take a show-me approach.

ROI

ROI Metrics Measurement Cost-Benefit

5-Star Linked Open Elections Data

Ontotext

MARCH 24, 2021

For these reasons, we have applied semantic data integration and produced a coherent knowledge graph covering all Bulgarian elections from 2013 to the present day. A set of sample queries is provided to help the understanding of the data model and shorten the learning curve. Easily accessible linked open elections data.

Statistics

Statistics Data Processing Publishing Metrics

Manipulating Data with dplyr

Domino Data Lab

MARCH 27, 2019

For example, you can calculate the average percentage of votes cast for Democratic Party candidates: # Compute summary statistics for the `presidentialElections` data frame average_votes <- summarize(. Using the summarize() function to calculate summary statistics for the presidentialElections data frame. Red notes are added.

Statistics

Statistics Data Science Visualization IT

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Companies like Tableau (which raised over $250 million when it had its IPO in 2013) demonstrated an unmet need in the market. These licensing terms are critical: Perpetual license vs subscription: Subscription is a pay-as-you-go model that provides flexibility as you evaluate a vendor. Their dashboards were visually stunning.

Analytics

Analytics Cost-Benefit Visualization Dashboards
