Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you also need to focus on operational use cases for your S3 data lake to optimize the production environment's availability. You still need to set appropriate EMRFS retries to provide additional resiliency.
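As a rough illustration of the EMRFS retry point above, the PySpark session sketch below raises the EMRFS retry count and wires an Iceberg catalog to AWS Glue. The property values, catalog name, and S3 bucket are assumptions for illustration, not settings taken from the post.

```python
# Minimal sketch: higher EMRFS retries for an Iceberg table on S3.
# fs.s3.maxRetries is an EMRFS property (default 15); the value here
# is illustrative, not a recommendation.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-emrfs-retries")
    # Retry throttled/failed S3 requests more aggressively
    .config("spark.hadoop.fs.s3.maxRetries", "50")
    # Iceberg catalog backed by AWS Glue; names are hypothetical
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse/")
    .getOrCreate()
)
```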

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

Backtesting is a process used in quantitative finance to evaluate trading strategies using historical data. It helps traders determine the potential profitability of a strategy and identify its associated risks, so they can refine the strategy for better performance.
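Since the excerpt defines backtesting only in prose, here is a minimal, self-contained Python sketch of the idea: replay a simple moving-average crossover rule over historical prices and measure the strategy's cumulative return. The rule, parameters, and synthetic price series are all hypothetical.

```python
# Toy backtest: moving-average crossover over a list of closing prices.

def backtest(prices, fast=5, slow=20):
    cash, position = 1.0, 0.0  # start with 1 unit of capital, no shares
    for i in range(slow, len(prices)):
        fast_ma = sum(prices[i - fast:i]) / fast
        slow_ma = sum(prices[i - slow:i]) / slow
        if fast_ma > slow_ma and position == 0.0:    # buy signal
            position, cash = cash / prices[i], 0.0
        elif fast_ma < slow_ma and position > 0.0:   # sell signal
            cash, position = position * prices[i], 0.0
    final = cash + position * prices[-1]             # mark to market
    return final - 1.0                               # cumulative return

# Synthetic price series standing in for historical data
prices = [100 + 0.1 * i + 5 * ((i % 17) - 8) / 8 for i in range(250)]
print(f"strategy return: {backtest(prices):+.2%}")
```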

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

With data becoming the driving force behind many industries today, a modern data architecture is pivotal to an organization's success. In this post, we describe Orca's journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
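As a small sketch of the kind of building block such an architecture rests on, the following PySpark DDL creates a transactional Iceberg table on S3 through a Glue-backed catalog. The catalog, database, table, column, and bucket names are hypothetical; the post does not publish Orca's actual schemas.

```python
# Sketch: an Iceberg table on S3 registered in a Glue-backed catalog
# named "glue" (catalog wiring as in the earlier session sketch).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-ddl").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS glue.security_db.asset_findings (
        asset_id  STRING,
        severity  STRING,
        found_at  TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(found_at))          -- Iceberg partition transform
    LOCATION 's3://my-data-lake/security_db/asset_findings/'
""")
```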

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

The utility for cloning and experimentation is available in the open-source GitHub repository. This solution replicates only the metadata in the Data Catalog, not the underlying data itself, which ensures that the data lake remains functional in another Region if Lake Formation has an availability issue.
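To make the metadata-only replication idea concrete, here is a hedged boto3 sketch that copies a single Glue Data Catalog table definition from one Region to another. The actual open-source utility does far more; the database and table names below are hypothetical.

```python
# Sketch: metadata-only replication of one Glue table across Regions.
import boto3

src = boto3.client("glue", region_name="us-east-1")
dst = boto3.client("glue", region_name="us-west-2")

table = src.get_table(DatabaseName="sales_db", Name="orders")["Table"]

# create_table accepts only TableInput fields, so strip the read-only
# metadata (CreateTime, CatalogId, etc.) that get_table returns.
allowed = {"Name", "Description", "Owner", "Retention", "StorageDescriptor",
           "PartitionKeys", "TableType", "Parameters"}
table_input = {k: v for k, v in table.items() if k in allowed}

# Assumes sales_db already exists in the target Region; no S3 data moves.
dst.create_table(DatabaseName="sales_db", TableInput=table_input)
```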

Interview with Dominic Sartorio, Senior Vice President for Products & Development, Protegrity

Corinium

Most of my days focus on understanding what's happening in the market, defining overall product strategy and direction, and translating that into execution across the various teams. Then when there is a breach, it comes as a shock: "Wow, I didn't even know that application had access to so much sensitive data." And then there is the cloud.

Accelerating revenue growth with real-time analytics: Poshmark’s journey

AWS Big Data

The data from the Kinesis data stream is consumed by two applications. The first is a Spark streaming application on Amazon EMR that writes the data from the Kinesis data stream to a data lake hosted on Amazon Simple Storage Service (Amazon S3) in a partitioned way, as sketched below.
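A minimal sketch of that first consumer, assuming the open-source spark-sql-kinesis connector is on the classpath; its option names, and every resource name below, are assumptions for illustration rather than details from the post.

```python
# Sketch: structured streaming from Kinesis into partitioned Parquet on S3.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kinesis-to-s3").getOrCreate()

events = (
    spark.readStream
    .format("kinesis")                               # connector assumed
    .option("streamName", "clickstream-events")      # hypothetical stream
    .option("region", "us-east-1")
    .option("initialPosition", "LATEST")
    .load()
)

(
    events
    # Derive a date column so the S3 layout is partitioned by day
    .withColumn("dt", F.to_date(F.col("approximateArrivalTimestamp")))
    .writeStream
    .format("parquet")
    .partitionBy("dt")
    .option("path", "s3://my-data-lake/clickstream/")
    .option("checkpointLocation", "s3://my-data-lake/checkpoints/clickstream/")
    .start()
)
```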

Unleashing the power of Presto: The Uber case study

IBM Big Data Hub

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data.
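As a hedged illustration of that exploratory access pattern, the snippet below issues an ad hoc query through the presto-python-client. The host, catalog, schema, table, and partition column are hypothetical; the partition predicate keeps the scan bounded and the LIMIT caps the result set of an untested query.

```python
# Sketch: exploratory query against a Presto cluster via presto-python-client.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.internal", port=8080,
    user="analyst", catalog="hive", schema="trips",
)
cur = conn.cursor()
cur.execute("""
    SELECT city_id, count(*) AS rides
    FROM trips_raw
    WHERE ds = '2023-06-01'        -- restrict scan to one partition
    GROUP BY city_id
    ORDER BY rides DESC
    LIMIT 100                      -- cap the result set
""")
for row in cur.fetchall():
    print(row)
```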
