2022, Metadata, Snapshot and Unstructured Data

Search:

DAY

WEEK

MONTH

YEAR

Select your country:
Sign up | Log in

2022

Metadata

Snapshot

Unstructured Data

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

AWS Big Data

JULY 3, 2023

Terminology Let’s first discuss some of the terminology used in this post: Research data lake on Amazon S3 – A data lake is a large, centralized repository that allows you to manage all your structured and unstructured data at any scale. This is where the tagging feature in Apache Iceberg comes in handy.

Snapshot

Snapshot Data Lake Testing Strategy

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

Iceberg doesn’t optimize file sizes or run automatic table services (for example, compaction or clustering) when writing, so streaming ingestion will create many small data and metadata files. Offers different query types , allowing to prioritize data freshness (Snapshot Query) or read performance (Read Optimized Query).

Data Lake

Data Lake Metadata Optimization Statistics

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Data Leaders Brief

Backtesting index rebalancing arbitrage with Amazon EMR and Apache Iceberg

Use Apache Iceberg in a data lake to support incremental data processing

Choosing an open table format for your transactional data lake on AWS

Webinars

Stay Connected