2013, Machine Learning and Testing

Why you should care about debugging machine learning models

O'Reilly on Data

DECEMBER 12, 2019

For all the excitement about machine learning (ML), there are serious impediments to its widespread adoption. Not least is the broadening realization that ML models can fail. In addition to newer innovations, the practice borrows from model risk management, traditional model diagnostics, and software testing. ML security audits.

Machine Learning

Machine Learning Modeling Testing Risk Management

DataKitchen’s 2020 Honors & Awards

DataKitchen

DECEMBER 30, 2020

In June of 2020, Database Trends & Applications featured DataKitchen’s end-to-end DataOps platform for its ability to coordinate data teams, tools, and environments in the entire data analytics organization with features such as meta-orchestration, automated testing and monitoring, and continuous deployment.

Testing

Testing Big Data Statistics Manufacturing

How Wallapop improved performance of analytics workloads with Amazon Redshift Serverless and data sharing

AWS Big Data

NOVEMBER 14, 2023

Wallapop’s initial data architecture platform: Wallapop is a Spanish ecommerce marketplace company focused on second-hand items, founded in 2013. Since then, it has reached more than 40 million downloads, and more than 700 million products have been listed. The marketplace can be accessed via mobile app or website.

Data Warehouse

Data Warehouse Analytics Testing Cost-Benefit

FINRA CIO Steve Randich pushes the public cloud forward

CIO Business Intelligence

FEBRUARY 10, 2023

“But for two years, we were testing limits within the public cloud.” Randich, who came to FINRA.org in 2013 after stints as co-CIO of Citigroup and CIO of Nasdaq, is no stranger to the public cloud. “We spent about a year and a half going through several bottlenecks, taking them out one at a time with Amazon engineers.

Unstructured Data

Unstructured Data Data Lake Machine Learning Enterprise

Overcoming Common Challenges in Natural Language Processing

Sisense

MAY 26, 2020

While training a model for NLP, words not present in the training data commonly appear in the test data. Because of this, predictions made using test data may not be correct. To solve this problem, machines need to capture the semantic meaning of words. Test data then contains this sentence: Pasta is delicious.

Unstructured Data

Unstructured Data Big Data Testing Machine Learning
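
The out-of-vocabulary problem described in the Sisense excerpt above can be made concrete with a small sketch. It builds a vocabulary from training sentences and lists the test-sentence words that were never seen in training. The training sentences and the function names (tokenize, build_vocabulary, out_of_vocabulary) are assumptions made up for illustration; only the test sentence comes from the excerpt.

```python
import re


def tokenize(text):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(sentences):
    """Collect every token seen in the training sentences."""
    return {token for sentence in sentences for token in tokenize(sentence)}


def out_of_vocabulary(sentence, vocabulary):
    """Return the tokens of a sentence that never appeared in training."""
    return [token for token in tokenize(sentence) if token not in vocabulary]


# Hypothetical training sentences, invented for illustration.
train_sentences = ["Pizza is tasty.", "Burgers are delicious."]
vocabulary = build_vocabulary(train_sentences)

# Test sentence quoted in the excerpt above.
oov_tokens = out_of_vocabulary("Pasta is delicious.", vocabulary)
print(oov_tokens)  # ['pasta']
```

A model that only knows the training vocabulary has no representation for "pasta", which is why the excerpt points toward capturing the semantic meaning of words rather than relying on exact matches.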

Operationalizing responsible AI principles for defense

IBM Big Data Hub

FEBRUARY 22, 2024

Reliable: “The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire life cycles.” This is misguided. But it is well worth the effort.

Metadata

Metadata Measurement Risk Modeling

Build a RAG data ingestion pipeline for large-scale ML workloads

AWS Big Data

MARCH 13, 2024

RAG is a machine learning (ML) architecture that uses external documents (like Wikipedia) to augment its knowledge and achieve state-of-the-art results on knowledge-intensive tasks. He entered the big data space in 2013 and continues to explore that area. This is where the Retrieval Augmented Generation (RAG) technique comes in.

Data Processing

Data Processing Dashboards Machine Learning Management

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Schrodinger’s Automation in AI and the Automation Bias

Jen Stirrup

FEBRUARY 1, 2023

The effects of AI will be magnified in the coming decade as manufacturing, retailing, transportation, finance, health care, law, advertising, insurance, entertainment, education, and virtually every other industry transform their core processes and business models to take advantage of machine learning.

Recreation/Entertainment

Recreation/Entertainment Testing Advertising Insurance

Data Drift Detection for Image Classifiers

Domino Data Lab

DECEMBER 1, 2019

In the context of machine learning, we consider data drift to be the change in model input data that leads to a degradation of model performance. Step 4: Generate the test, train and noisy MNIST data sets. Generate the train and test sets:
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.

Modeling

Modeling Machine Learning Deep Learning Testing
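
As a rough illustration of the step quoted in the Domino excerpt above, the sketch below scales pixel values to [0, 1] and derives a noisy copy of the data. It is not the article's code: it uses synthetic stand-in images instead of mnist.load_data(), plain Python lists instead of arrays, and a simple comparison of mean pixel intensity as a drift signal. The helper names and the noise level of 0.3 are assumptions chosen for illustration.

```python
import random


def scale_pixels(image):
    """Scale raw 0-255 pixel values to the range [0, 1]."""
    return [pixel / 255.0 for pixel in image]


def add_gaussian_noise(image, noise_std, rng):
    """Add zero-mean Gaussian noise to each pixel and clip back to [0, 1]."""
    return [min(1.0, max(0.0, pixel + rng.gauss(0.0, noise_std)))
            for pixel in image]


def mean_intensity(images):
    """Average pixel value over a whole set of images."""
    total = sum(sum(image) for image in images)
    count = sum(len(image) for image in images)
    return total / count


rng = random.Random(42)

# Synthetic stand-in for MNIST: 20 mostly dark 28x28 images (assumption).
raw_images = [
    [0 if rng.random() < 0.8 else rng.randint(1, 255) for _ in range(28 * 28)]
    for _ in range(20)
]

clean_images = [scale_pixels(image) for image in raw_images]
noisy_images = [add_gaussian_noise(image, 0.3, rng) for image in clean_images]

# A crude drift signal: how far the average input has moved.
clean_mean = mean_intensity(clean_images)
noisy_mean = mean_intensity(noisy_images)
print(f"clean mean: {clean_mean:.3f}, noisy mean: {noisy_mean:.3f}")
```

Because the dark pixels can only be pushed upward before clipping, the noisy set ends up brighter on average, which is the kind of input shift a drift detector is meant to flag.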

Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

Domino Data Lab

APRIL 21, 2021

In this article, we’ll discuss the challenge organizations face around fraud detection and how machine learning can be used to identify and spot anomalies that the human eye might not catch. This is to prevent any information leakage into our test set.

Statistics

Statistics Machine Learning Modeling Metrics
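
The threshold moving named in the fraud detection title above can be sketched in a few lines: instead of classifying at the default 0.5 cutoff, several candidate thresholds are scored and the best one is kept. The labels, predicted probabilities, and the use of F1 as the selection metric are assumptions for illustration, not values or choices taken from the article.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when probabilities >= threshold are labeled as fraud (1)."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, candidates):
    """Pick the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Hypothetical imbalanced data: 7 legitimate transactions, 3 fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.90]

candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
chosen = best_threshold(y_true, y_prob, candidates)

print(chosen)                                          # 0.4
print(round(f1_at_threshold(y_true, y_prob, 0.5), 3))  # 0.8
print(round(f1_at_threshold(y_true, y_prob, chosen), 3))  # 0.857
```

On this toy data, lowering the cutoff from 0.5 to 0.4 catches the third fraud case at the cost of one false alarm, which raises F1. In practice the threshold would be tuned on a validation split rather than on the test set, in line with the excerpt's concern about information leakage.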

PODCAST: COVID19 | Redefining Digital Enterprises – Episode 12: How AI is rapidly transforming the enterprise landscape in the post-COVID world

bridgei2i

JULY 28, 2020

It is my immense pleasure to introduce you all to our guest today, Ria Persad. She was named International Woman of the Year by Renewable Energy World in power engineering in 2013 and Lifetime Achievement Leader by the Platts Global Energy Awards in 2014. We need people who can test.

Enterprise

Enterprise Digital Transformation Insurance B2B

The Value of Data for Philanthropy

Cloudera

AUGUST 6, 2018

For example, Crisis Text Line, which provides online support to people in crisis, received a total of 8 million text messages in the first two years of its existence between 2013 and 2015. Fox Foundation is testing a watch-type wearable device in Australia to continuously monitor the symptoms of patients with Parkinson’s disease.

Machine Learning

Machine Learning Internet of Things Cost-Benefit Data-driven

Dresner’s Point: Don’t Overlook the Zigzagging of Collaboration & Text Analytics

Howard Dresner

FEBRUARY 11, 2014

Collaboration BI: At one of my weekly #BIWisdom tweetchats this month, collaboration, social media and text analytics turned up in a discussion about 2013 BI predictions that didn’t pan out. “Vendors need to automate and decrease that effort.” “I tested a social analytics tool; I was less than impressed.”

Analytics

Analytics Business Intelligence Data Processing Marketing

Towards optimal experimentation in online systems

The Unofficial Google Data Science Blog

APRIL 23, 2024

Multiparameter experiments, however, generate richer data than standard A/B tests, and automated t-tests alone are insufficient to analyze them well. We use PrePost in most of our A/B tests, so we have pre-experiment metric measurements readily available that we can use as covariates in our models. Springer Netherlands, 2013. [16]

Experimentation

Experimentation Optimization Uncertainty Metrics
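
The excerpt above mentions using pre-experiment metric measurements as covariates. One common way to do that, shown here as a minimal sketch and not as the blog's own PrePost method, is a linear regression adjustment (often called CUPED): subtract the part of each unit's experiment-period metric that is predictable from its pre-period value. The simulated data and the function names are assumptions for illustration.

```python
import random
import statistics


def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    return sum((x - mean_x) * (y - mean_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


def adjust_with_pre_period(pre, post):
    """Remove the variation in `post` that is linearly explained by `pre`."""
    theta = covariance(pre, post) / statistics.variance(pre)
    mean_pre = statistics.fmean(pre)
    return [y - theta * (x - mean_pre) for x, y in zip(pre, post)]


# Simulated units: the experiment-period metric tracks the pre-period metric.
rng = random.Random(0)
pre = [rng.gauss(10.0, 2.0) for _ in range(500)]
post = [x + rng.gauss(0.5, 1.0) for x in pre]

adjusted = adjust_with_pre_period(pre, post)

raw_variance = statistics.variance(post)
adjusted_variance = statistics.variance(adjusted)
print(f"variance before: {raw_variance:.2f}, after: {adjusted_variance:.2f}")
```

The adjustment leaves the sample mean unchanged while shrinking the variance, so the same experiment can detect smaller effects. The stronger the correlation between the pre-period and experiment-period metrics, the larger the reduction.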

Data Science at The New York Times

Domino Data Lab

JULY 9, 2019

Wiggins advocated that data scientists find problems that impact the business; re-frame the problem as a machine learning (ML) task; execute on the ML task; and communicate the results back to the business in an impactful way. I still believe that data science is the craft of trying to apply machine learning to some real world problem.

Data Science

Data Science Machine Learning Advertising Modeling

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

2013: Google launches Google Compute Engine (IaaS), its own version of EC2. Microsoft launches Azure ML Studio for machine learning capabilities on the cloud. AWS rolls out SageMaker, designed to build, train, test and deploy machine learning (ML) models. Google releases Kubernetes.

Data-driven

Data-driven IoT Unstructured Data Data Lake

Using Empirical Bayes to approximate posteriors for large "black box" estimators

The Unofficial Google Data Science Blog

NOVEMBER 4, 2015

by OMKAR MURALIDHARAN Many machine learning applications have some kind of regression at their core, so understanding large-scale regression systems is important. But most common machine learning methods don’t give posteriors, and many don’t have explicit probability models. Figure 4 shows the results of such a test.

KDD

KDD Testing Machine Learning Modeling

The AIgent: Using Google’s BERT Language Model to Connect Writers & Representation

Insight

MARCH 12, 2020

In 2013, Robert Galbraith, an ... I tested several different flavors of BERT for use as synopsis classifiers before settling on the DistilBERT model from Hugging Face. On my test set, this approach resulted in ~75–95% accuracy and ~.65 ... Test Case: Dune. Let’s see an example of genre tag prediction in action.

Modeling

Modeling Metadata Publishing Sales

Deep Learning Illustrated: Building Natural Language Processing Models

Domino Data Lab

AUGUST 22, 2019

word2vec is an unsupervised learning technique—that is, it is applied to a corpus of natural language without making use of any labels that may or may not happen to exist for the corpus. Note: A test set of 19,500 such analogies was developed by Tomas Mikolov and his colleagues in their 2013 word2vec paper.

Deep Learning

Deep Learning Modeling Metrics Testing

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Companies like Tableau (which raised over $250 million when it had its IPO in 2013) demonstrated an unmet need in the market. Later on, you’ll appreciate being able to test ideas and leverage best practices as your needs evolve. Users’ varied needs require a shift in traditional BI thinking. Their dashboards were visually stunning.

Analytics

Analytics Cost-Benefit Visualization Dashboards

Data Leaders Brief
