Data Leaders Brief

Extract data from SAP ERP using AWS Glue and the SAP SDK

AWS Big Data

FEBRUARY 8, 2023

In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK. This is a guest post by Siva Manickam and Prahalathan M from Vyaire Medical Inc. Vyaire Medical Inc. is a global company, headquartered in suburban Chicago, focused exclusively on supporting breathing through every stage of life.

Testing

Testing Data Integration Data Lake Enterprise

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

This post demonstrates how to apply CDC changes from Amazon Relational Database Service (Amazon RDS) or other relational databases to an S3 data lake, with flexibility to denormalize, transform, and enrich the data in near-real time. Data analytics on operational data at near-real time is becoming a common need.

Data Lake

Data Lake Visualization Dashboards Insurance

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Trending Sources

How to Build a Flexible Developer Documentation Portal

Sisense

SEPTEMBER 30, 2020

If you haven’t read how we overhauled our developer portal recently, check out our prior conversation with Moti Granovsky, Sisense’s Head of Developer Relations. Let’s kick off our journey into the rebuild by understanding what our requirements were and how we went about meeting them. Building instead of buying.

Data Processing

Data Processing Experimentation Cost-Benefit Marketing

NVIDIA RAPIDS in Cloudera Machine Learning

Cloudera

MAY 19, 2021

The RAPIDS libraries are designed as drop-in replacements for common Python data science libraries such as pandas (cuDF), NumPy (CuPy), scikit-learn (cuML), and Dask (dask_cuda). In this tutorial, we will illustrate how RAPIDS can be used to tackle the Kaggle Home Credit Default Risk challenge.

Machine Learning

Machine Learning Data Science Data Lake Modeling

Use fuzzy string matching to approximate duplicate records in Amazon Redshift

AWS Big Data

FEBRUARY 8, 2023

Answering questions as simple as “How many unique customers do we have?” We import an open-source fuzzy matching Python library to Amazon Redshift, create a simple fuzzy matching user-defined function (UDF), and then create a procedure that weights multiple columns in a table to find matches based on user input. An S3 bucket.
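The idea of weighting several columns to score candidate duplicates can be sketched in plain Python. This is a minimal illustration only, not the post's Redshift UDF or procedure: it swaps in the standard library's difflib.SequenceMatcher for the open-source fuzzy matching library the post imports, and the function names, sample rows, weights, and threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def weighted_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Combine per-column similarities into one score, normalized by total weight."""
    total = sum(weights.values())
    return sum(w * similarity(rec_a[col], rec_b[col]) for col, w in weights.items()) / total


def find_matches(records: list, weights: dict, threshold: float) -> list:
    """Return index pairs (i, j), i < j, whose weighted score meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if weighted_score(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical sample rows; not data from the post.
customers = [
    {"name": "Jon Smith", "email": "jon.smith@example.com"},
    {"name": "John Smith", "email": "jon.smith@example.com"},
    {"name": "Maria Garcia", "email": "mgarcia@example.org"},
]

# Assumed weights: the email column counts twice as much as the name column.
weights = {"name": 1.0, "email": 2.0}
matches = find_matches(customers, weights, threshold=0.9)
print(matches)
```

The pairwise loop compares every record with every other, which is fine for a small sample but grows quadratically; a real table would likely need some form of blocking or pre-filtering first.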

Data Quality

Data Quality Testing Data Warehouse Unstructured Data

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

We also show how to use Kinesis Data Analytics Studio to test and tune your analysis before deploying your migrated applications.

Data Analytics

Data Analytics Analytics IoT Data Lake

Build data integration jobs with AI companion on AWS Glue Studio notebook powered by Amazon CodeWhisperer

AWS Big Data

JULY 26, 2023

AWS also announced the Amazon CodeWhisperer Jupyter extension to help Jupyter users by generating real-time, single-line, or full-function code suggestions for Python notebooks on Jupyter Lab and Amazon SageMaker Studio. Data is essential for businesses to make informed decisions, improve operations, and innovate.

Data Integration

Data Integration Interactive Machine Learning Big Data

Accelerating model velocity through Snowflake Java UDF integration

Domino Data Lab

JUNE 15, 2021

Data is no longer stored in CSV files, but in a dedicated, purpose-built data lake or data warehouse. These companies often undertake large data science efforts in order to shift from “data-driven” to “model-driven” operations, and to provide model-underpinned insights to the business. Why Snowflake UDFs.

Modeling

Modeling Data Science Data-driven Data Warehouse

The importance of structure, coding style, and refactoring in notebooks

Domino Data Lab

JULY 1, 2020

Notebooks are increasingly crucial in the data scientist’s toolbox. Although considered relatively new, their history traces back to systems like Mathematica and MATLAB. For Data Scientists, spinning up notebook instances as the first step in exploratory data analysis has become second nature. Notebook Structure.

Testing

Testing Data Science Machine Learning Data-driven

Deep Learning Illustrated: Building Natural Language Processing Models

Domino Data Lab

AUGUST 22, 2019

The excerpt covers how to create word vectors and utilize them as an input into a deep learning model. Many thanks to Addison-Wesley Professional for providing the permissions to excerpt “Natural Language Processing” from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. Introduction.

Deep Learning

Deep Learning Modeling Metrics Testing
