2012, Data Lake and Testing - Data Leaders Brief

Use your corporate identities for analytics with Amazon EMR and AWS IAM Identity Center

AWS Big Data

APRIL 26, 2024

Use Lake Formation to grant permissions to users to access data. Test the solution by accessing data with a corporate identity. Audit user data access. On the Lake Formation console, choose Data lake permissions under Permissions in the navigation pane. Select Named Data Catalog resources.

Analytics

Analytics Data Lake Management Enterprise

Measure performance of AWS Glue Data Quality for ETL pipelines

AWS Big Data

MARCH 12, 2024

In recent years, data lakes have become a mainstream architecture, and data quality validation is a critical factor to improve the reusability and consistency of the data. In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset.

Data Quality

Data Quality Measurement Testing Visualization
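To make the idea of a data quality ruleset concrete, here is a minimal sketch in plain Python. It is not AWS Glue Data Quality and does not use its rule language. The helper names (`is_complete`, `is_unique`, `evaluate`) and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[List[Row]], bool]


def is_complete(column: str) -> Rule:
    """Hypothetical rule: every row has a non-null value in `column`."""
    return lambda rows: all(r.get(column) is not None for r in rows)


def is_unique(column: str) -> Rule:
    """Hypothetical rule: no two rows share the same value in `column`."""
    def check(rows: List[Row]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))
    return check


def evaluate(rows: List[Row], ruleset: Dict[str, Rule]) -> Dict[str, bool]:
    """Run each named rule over the dataset and report pass/fail."""
    return {name: rule(rows) for name, rule in ruleset.items()}


# Illustrative data only; not taken from the benchmark described above.
sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]
sample_ruleset = {
    "id_complete": is_complete("id"),
    "id_unique": is_unique("id"),
    "email_complete": is_complete("email"),
}
results = evaluate(sample_rows, sample_ruleset)
```

A larger ruleset is simply a bigger dictionary of rules, which is one way to picture "increasingly complex" rulesets being run over a fixed test dataset.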

Simplify access management with Amazon Redshift and AWS Lake Formation for users in an External Identity Provider

AWS Big Data

FEBRUARY 15, 2024

You might be modernizing your data architecture using Amazon Redshift to enable access to your data lake and data in your data warehouse, and are looking for a centralized and scalable way to define and manage the data access based on IdP identities. For IAM role, choose a Lake Formation user-defined role.

Management

Management Data Lake Sales Data Warehouse

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

Founded in 2012, SumUp is the financial partner for more than 4 million small merchants in over 35 markets worldwide, helping them start, run and grow their business. Unless, of course, the rest of their data also resides in the Google Cloud. The Data Science teams also use this data for churn prediction and CLTV modeling.

Analytics

Analytics Data Lake Testing Optimization

Federate Amazon QuickSight access with open-source identity provider Keycloak

AWS Big Data

JUNE 13, 2023

Test the application: Let’s invoke the application you have created to seamlessly sign in to QuickSight using the following URL. Vamsi Bhadriraju is a Data Architect at AWS. He works closely with enterprise customers to build data lakes and analytical applications on the AWS Cloud.

Metadata

Metadata Dashboards Business Intelligence Management

Run Spark SQL on Amazon Athena Spark

AWS Big Data

OCTOBER 23, 2023

Modern applications store massive amounts of data on Amazon Simple Storage Service (Amazon S3) data lakes, providing cost-effective and highly durable storage, and allowing you to run analytics and machine learning (ML) from your data lake to generate insights on your data.

Data Lake

Data Lake Visualization Optimization Interactive

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

AWS Big Data

JANUARY 30, 2023

Many customers need an ACID (atomic, consistent, isolated, durable) transactional data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. The Delta Lake framework provides these two capabilities. Choose Create policy.

Insurance

Insurance Data Lake Data-driven Management
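The UPSERT behaviour mentioned in the Delta Lake entry above can be pictured with a small plain-Python sketch. It does not use Delta Lake or AWS Glue. The function `apply_cdc`, the `(operation, key, row)` change format and the sample data are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Optional[Row]]  # (operation, primary key, row data)


def apply_cdc(target: Dict[int, Row], changes: Iterable[Change]) -> Dict[int, Row]:
    """Return a new table with CDC changes applied in order.

    Inserts and updates are handled the same way (an "upsert"): the row is
    created if the key is missing, otherwise its columns are overwritten.
    """
    merged = dict(target)  # copy so the original table is left untouched
    for op, key, row in changes:
        if op == "delete":
            merged.pop(key, None)
        elif op in ("insert", "update"):
            merged[key] = {**merged.get(key, {}), **(row or {})}
        else:
            raise ValueError(f"unknown CDC operation: {op}")
    return merged


# Illustrative data only.
batch_table = {1: {"name": "alice"}, 2: {"name": "bob"}}
cdc_events = [
    ("update", 1, {"name": "alice-updated"}),
    ("insert", 3, {"name": "carol"}),
    ("delete", 2, None),
]
merged_table = apply_cdc(batch_table, cdc_events)
```

A transactional table format would additionally make the whole batch of changes atomic; this sketch only shows the row-level merge logic.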

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Big Data

DECEMBER 18, 2023

Solution overview: One of the common functionalities involved in data pipelines is extracting data from multiple data sources and exporting it to a data lake or synchronizing the data to another database. Choose the workflow named ETL_Process. Run the workflow with default input.

Metadata

Metadata Visualization Data Lake Data-driven

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

APRIL 19, 2023

json) to DynamoDB (for more information, refer to Write data to a table using the console or AWS CLI): { "name": "step1.q", sample_data/us_current.csv s3://$s3_bucket_name/covid-19-testing-data/base/source_us_current/; Copy states_current.csv: aws s3 cp ./sample_data/states_current.csv sample_oozie_job_name/step1/step1.json

Metadata

Metadata Testing Data Lake Consulting

Simplify and speed up Apache Spark applications on Amazon Redshift data with Amazon Redshift integration for Apache Spark

AWS Big Data

APRIL 20, 2023

For sales across multiple markets, the product sales data such as orders, transactions, and shipment data is available on Amazon S3 in the data lake. The data engineering team can use Apache Spark with Amazon EMR or AWS Glue to analyze this data in Amazon S3. enableHiveSupport().getOrCreate()

Data Lake

Data Lake Data Warehouse Sales Data-driven

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. We deploy the Debezium MySQL source Kafka connector on Amazon MSK Connect.

Data Warehouse

Data Warehouse Snapshot Data Processing Management

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Somehow, the gravity of the data has a geological effect that forms data lakes. Also, data science workflows begin to create feedback loops from the big data side of the illo above over to the DW side. DG emerges for the big data side of the world, e.g., the Alation launch in 2012.

Data Governance

Data Governance Machine Learning Metadata Big Data

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

AUGUST 8, 2019

Once upon a time, circa 2012-ish, data science conferences were replete with talks about an industry hellbent on loading amazing enormous Big Data into some kind of data lake, and applying all kinds of odd astrophysics-ish approaches…for eventual PROFIT! Or something. “Nothing Spreads Like Fear”. “No big deal.”

Data Science

Data Science Machine Learning Data Governance Statistics

How The Cloud Made ‘Data-Driven Culture’ Possible | Part 1

BizAcuity

MAY 10, 2022

2012: Amazon Redshift, the first-of-its-kind cloud-based data warehouse service, comes into existence. Google launches BigQuery, its own data warehousing tool, and Microsoft introduces Azure SQL Data Warehouse and Azure Data Lake Store. Data lakes or data lakehouses alone cannot solve the efficiency problem.

Data-driven

Data-driven IoT Unstructured Data Data Lake

Q&A with Greg Rahn – The changing Data Warehouse market

Cloudera

DECEMBER 12, 2018

And so I actually transitioned out of that group and into the Big Data Appliance group at Oracle, but soon realized that if that was what I wanted to keep doing, this up-and-coming company called Cloudera might be a better place to do it, since these new technologies weren’t just a hobby at Cloudera. As you mentioned, Qlik is in there.

Data Warehouse

Data Warehouse Marketing Big Data Data Lake
