Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

Terminology: Let’s first discuss some of the terminology used in this post. Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. ... Create an Iceberg table and load the test data from Amazon S3 into the table.

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

Offers different query types, allowing you to prioritize data freshness (Snapshot Query) or read performance (Read Optimized Query). Using column statistics, Iceberg offers efficient updates on tables that are sorted on a “key” column. Iceberg offers a Merge On Read strategy to enable fast writes.
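
To make the two ideas in this excerpt concrete, here is a minimal toy sketch in plain Python. It is not Iceberg's implementation or API: the `DataFile` and `ToyTable` classes and all of their fields are hypothetical names invented for illustration. It assumes each data file records the minimum and maximum of a sorted key column, so an update only has to inspect files whose range covers the key, and it assumes updates are written to a separate delta that is merged with the base files at read time, which is the general shape of a merge-on-read strategy.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical immutable data file with min/max statistics on the key column."""
    rows: dict  # key -> value
    min_key: int = field(init=False)
    max_key: int = field(init=False)

    def __post_init__(self):
        # Column statistics are computed once, when the file is written.
        self.min_key = min(self.rows)
        self.max_key = max(self.rows)


class ToyTable:
    """Toy table: base files are never rewritten, updates go to a delta."""

    def __init__(self, files):
        self.files = list(files)
        self.deltas = {}        # key -> new value, recorded at update time
        self.files_scanned = 0  # how many files updates had to open

    def candidate_files(self, key):
        # Prune with statistics: skip files whose key range cannot contain the key.
        return [f for f in self.files if f.min_key <= key <= f.max_key]

    def update(self, key, value):
        candidates = self.candidate_files(key)
        self.files_scanned += len(candidates)
        if not any(key in f.rows for f in candidates):
            raise KeyError(key)
        # Fast write: record the change without rewriting the base file.
        self.deltas[key] = value

    def read(self):
        # Merge on read: base rows first, then overlay the recorded changes.
        merged = {}
        for f in self.files:
            merged.update(f.rows)
        merged.update(self.deltas)
        return merged


table = ToyTable([DataFile({1: "a", 2: "b"}), DataFile({10: "c", 11: "d"})])
table.update(10, "C")
print(table.files_scanned)  # number of files whose key range covered key 10
print(table.read())
```

In this sketch the write path stays cheap because the base file is left untouched, while the read path pays for the merge. That trade-off between write speed and read cost is the one the excerpt describes.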
