Remove 2022 Remove Data Lake Remove Snapshot Remove Unstructured Data
article thumbnail

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. In early 2022, AWS announced general availability of Athena ACID transactions, powered by Apache Iceberg. and later supports the Apache Iceberg framework for data lakes.

Data Lake 118
article thumbnail

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

Terminology Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. This is where the tagging feature in Apache Iceberg comes in handy.

article thumbnail

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake 113