2022, Snapshot, Statistics and Unstructured Data

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

JULY 3, 2023

Terminology: Let’s first discuss some of the terminology used in this post. Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. [...] Create an Iceberg table and load the test data from Amazon S3 into the table.

Snapshot

Snapshot Data Lake Testing Strategy

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

Offers different query types, allowing you to prioritize data freshness (Snapshot Query) or read performance (Read Optimized Query). Using column statistics, Iceberg offers efficient updates on tables that are sorted on a “key” column. Iceberg offers a Merge On Read strategy to enable fast writes.

Data Lake

Data Lake Metadata Optimization Statistics
