article thumbnail

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists.

Snapshot 107
article thumbnail

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

Over the years, data lakes on Amazon Simple Storage Service (Amazon S3) have become the default repository for enterprise data and are a common choice for a large set of users who query data for a variety of analytics and machine leaning use cases. Analytics use cases on data lakes are always evolving.

Data Lake 104
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

Tracking data changes and rollback Build your transactional data lake on AWS You can build your modern data architecture with a scalable data lake that integrates seamlessly with an Amazon Redshift powered cloud warehouse. Athena provides a simplified, flexible way to analyze petabytes of data where it lives.

Data Lake 105
article thumbnail

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure. Run the first cell to set up an AWS Glue interactive session.

Data Lake 115
article thumbnail

Estimating Scope 1 Carbon Footprint with Amazon Athena

AWS Big Data

In this blog, we will walk through how we can apply existing enterprise data to better understand and estimate Scope 1 carbon footprint using Amazon Simple Storage Service (S3) and Amazon Athena , a serverless interactive analytics service that makes it easy to analyze data using standard SQL.

article thumbnail

Exploring real-time streaming for generative AI Applications

AWS Big Data

Furthermore, data events are filtered, enriched, and transformed to a consumable format using a stream processor. The result is made available to the application by querying the latest snapshot. AWS Glue can interact with streaming data services such as Kinesis Data Streams and Amazon MSK for processing and transforming CDC data.

article thumbnail

How Amazon optimized its high-volume financial reconciliation process with Amazon EMR for higher scalability and performance

AWS Big Data

With these frameworks and related open-source projects, you can process data for analytics purposes and BI workloads. Sakti Mishra is a Principal Solutions Architect at AWS, where he helps customers modernize their data architecture and define their end-to-end data strategy, including data security, accessibility, governance, and more.