
Measure performance of AWS Glue Data Quality for ETL pipelines

AWS Big Data

In recent years, data lakes have become a mainstream architecture, and validating data quality is critical to improving the reusability and consistency of that data. In this post, we provide benchmark results from running increasingly complex data quality rulesets over a predefined test dataset.
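For context, the rulesets benchmarked in a setup like this are written in DQDL and evaluated inside a Glue job. The following is a minimal sketch, assuming a Glue PySpark job where the awsgluedq transform is available; the database, table, column names, and evaluation context are placeholders, and option names may differ by Glue version.

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsgluedq.transforms import EvaluateDataQuality

glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical source table; replace with your own catalog database/table.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# A small DQDL ruleset; more complex rulesets simply add more rules like these.
ruleset = """
Rules = [
    RowCount > 0,
    IsComplete "order_id",
    ColumnValues "quantity" > 0
]
"""

# Evaluate the ruleset against the frame; the result is a collection of
# rule-level and row-level outcomes that can be inspected or published.
results = EvaluateDataQuality().process_rows(
    frame=orders,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_dq_check",
        "enableDataQualityResultsPublishing": True,
    },
)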


How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

Founded in 2012, SumUp is the financial partner for more than 4 million small merchants in over 35 markets worldwide, helping them start, run, and grow their business. Part of SumUp's digital analytics data resides in Google Cloud, and its Data Science teams also use this data for churn prediction and CLTV modeling.



Handle UPSERT data operations using open-source Delta Lake and AWS Glue

AWS Big Data

Many customers need an ACID transaction (atomic, consistent, isolated, durable) data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. The Delta Lake framework provides both capabilities.
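As a rough illustration of the UPSERT pattern this builds on, here is a minimal Delta Lake MERGE sketch in PySpark; the S3 paths and the key column are placeholders, and it assumes the Spark session is already configured with the Delta Lake extensions (in AWS Glue this is typically done through job parameters), so it is not the post's exact code.

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical locations and column names.
target = DeltaTable.forPath(spark, "s3://my-bucket/delta/orders/")
cdc_updates = spark.read.parquet("s3://my-bucket/cdc/orders_incremental/")

# UPSERT: update rows that match on the key, insert the rest.
(
    target.alias("t")
    .merge(cdc_updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)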


Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

The walkthrough writes each step's metadata (for example, an item containing "name": "step1.q", taken from sample_oozie_job_name/step1/step1.json) to DynamoDB (for more information, refer to Write data to a table using the console or AWS CLI), and stages the sample CSV files in Amazon S3 with the AWS CLI, for example copying sample_data/us_current.csv to s3://$s3_bucket_name/covid-19-testing-data/base/source_us_current/ and copying states_current.csv with aws s3 cp ./sample_data/states_current.csv.
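A minimal sketch of that staging step using boto3 instead of the post's AWS CLI commands; the table name, bucket name, and item shape are assumptions for illustration only.

import boto3

# Hypothetical names; the post provisions its own table and bucket.
TABLE_NAME = "hiveql-migration-steps"
BUCKET = "my-s3-bucket-name"

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

# Store the step metadata that the Spark SQL job will later look up.
step_metadata = {"id": "sample_oozie_job_name", "step": "step1", "name": "step1.q"}
dynamodb.Table(TABLE_NAME).put_item(Item=step_metadata)

# Stage the sample dataset under the prefix the queries expect.
s3.upload_file(
    "sample_data/us_current.csv",
    BUCKET,
    "covid-19-testing-data/base/source_us_current/us_current.csv",
)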


Simplify and speed up Apache Spark applications on Amazon Redshift data with Amazon Redshift integration for Apache Spark

AWS Big Data

For sales across multiple markets, product sales data such as orders, transactions, and shipment data is available on Amazon S3 in the data lake. The data engineering team can use Apache Spark with Amazon EMR or AWS Glue to analyze this data in Amazon S3.
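For a sense of what the integration looks like from the Spark side, here is a minimal PySpark read sketch using the Redshift connector data source; the JDBC URL, table, temp directory, and IAM role are placeholders, and the exact format name and options can vary with the EMR or Glue version, so treat this as an assumption-based illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical connection details.
jdbc_url = "jdbc:redshift://my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com:5439/dev"

sales_df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .option("url", jdbc_url)
    .option("dbtable", "public.sales")
    .option("tempdir", "s3://my-bucket/redshift-temp/")
    .option("aws_iam_role", "arn:aws:iam::123456789012:role/redshift-spark-role")
    .load()
)

# Combine with data already in the S3 data lake, then aggregate.
sales_df.groupBy("market").count().show()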


Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.


Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

Use Lake Formation to grant users permissions to access data, test the solution by accessing data with a corporate identity, and audit user data access. On the Lake Formation console, choose Data lake permissions under Permissions in the navigation pane, then select Named Data Catalog resources.
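The console steps above also have an API equivalent; below is a minimal boto3 sketch that grants SELECT on a named Data Catalog table. The database, table, and principal ARN are placeholders, and granting to an IAM Identity Center corporate identity (as the post describes) may require a different principal identifier, so this is an assumption-laden illustration rather than the post's exact flow.

import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical principal and catalog resources.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst-role"
    },
    Resource={
        "Table": {
            "DatabaseName": "sales_db",
            "Name": "orders",
        }
    },
    Permissions=["SELECT"],
)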