2022, Big Data, Data Integration and Data Lake

2022

Big Data

Data Integration

Data Lake

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Dive deep into AWS Glue 4.0 for Apache Spark

AWS Big Data

MAY 18, 2023

It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data architecture to break down data silos. About the authors Gonzalo Herreros is a Senior Big Data Architect on the AWS Glue team.

Testing

Testing Data Lake Cost-Benefit Data Integration

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

AWS Big Data

JUNE 28, 2023

Another example of AWS’s investment in zero-ETL is providing the ability to query a variety of data sources without having to worry about data movement. Data analysts and data engineers can use familiar SQL commands to join data across several data sources for quick analysis, and store the results in Amazon S3 for subsequent use.

Analytics

Analytics Data Warehouse Data Lake Data-driven

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

AWS Big Data

MARCH 20, 2023

In the first post of this series , we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg datasets tables using the native support of those data lake formats. For S3 URL , enter s3://noaa-ghcn-pds/csv/by_year/2022.csv. The data source is configured.

Visualization

Visualization Data Lake Snapshot Big Data

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained lot of traction to become the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets.

Data Lake

Data Lake Data Analytics Analytics Data Processing

What is Data Pipeline? A Detailed Explanation

Smart Data Collective

OCTOBER 17, 2022

Big data is shaping our world in countless ways. Data powers everything we do. Exactly why, the systems have to ensure adequate, accurate and most importantly, consistent data flow between different systems. A point of data entry in a given pipeline. While we are at it, a few tools are leading in 2022.

Data Warehouse

Data Warehouse Data Lake Visualization Big Data

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. Read: The first capability of a data fabric is a semantic knowledge data catalog, but what are the other 5 core capabilities of a data fabric? 3 March 2022.

Management

Management Metadata Data Architecture Data Lake

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. In 2022, AWS commissioned a study conducted by the American Productivity and Quality Center (APQC) to quantify the Business Value of Customer 360.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

How AWS helped Altron Group accelerate their vision for optimized customer engagement

AWS Big Data

JULY 13, 2023

Data-Driven Everything engagement Altron has provided information technology services since 1965 across South Africa, the Middle East, and Australia. Foundations for a data lake with data governance controls and data quality checks. A set of QuickSight dashboards to be consumed via browser and mobile.

Optimization

Optimization B2B Data Quality Sales

How data stores and governance impact your AI initiatives

IBM Big Data Hub

OCTOBER 12, 2023

They’re built on machine learning algorithms that create outputs based on an organization’s data or other third-party big data sources. Sometimes, these outputs are biased because the data used to train the model was incomplete or inaccurate in some way. Learn more about IBM watsonx 1.

Cost-Benefit

Cost-Benefit Metadata Data Governance Modeling

Exploring new ETL and ELT capabilities for Amazon Redshift from the AWS Glue Studio visual editor

AWS Big Data

APRIL 20, 2023

In a modern data architecture, unified analytics enable you to access the data you need, whether it’s stored in a data lake or a data warehouse. AWS Glue provides an extensible architecture that enables users with different data processing use cases, and works well with Amazon Redshift.

Visualization

Visualization Data Warehouse Big Data Data Lake

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen a strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let’s say that this company is located in Europe and the data product must comply with the GDPR.

Data Lake

Data Lake Management Metrics Data Warehouse

Process price transparency data using AWS Glue

AWS Big Data

MAY 4, 2023

Phase 1 implementation of this regulation, which went into effect on July 1, 2022, requires that payors publish machine-readable files publicly for each plan that they offer. Krishna Maddileti is a Senior Solutions Architect on the AWS Data Lab team. Manish Kola is a Solutions Architect on the AWS Data Lab team.

Insurance

Insurance Publishing Cost-Benefit Data Lake

Data Leaders Brief

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Dive deep into AWS Glue 4.0 for Apache Spark

Webinars

Trending Sources

Load data incrementally from transactional data lakes to data warehouses

Webinars

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

What is Data Pipeline? A Detailed Explanation

Augmented data management: Data fabric versus data mesh

Create an end-to-end data strategy for Customer 360 on AWS

How AWS helped Altron Group accelerate their vision for optimized customer engagement

How data stores and governance impact your AI initiatives

Exploring new ETL and ELT capabilities for Amazon Redshift from the AWS Glue Studio visual editor

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Process price transparency data using AWS Glue

Stay Connected