article thumbnail

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

Terminology Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. Create an Iceberg table and load the test data from Amazon S3 into the table.

article thumbnail

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

To overcome these issues, Orca decided to build a data lake. A data lake is a centralized data repository that enables organizations to store and manage large volumes of structured and unstructured data, eliminating data silos and facilitating advanced analytics and ML on the entire data.