Remove Data Processing Remove Metadata Remove Snapshot Remove Workshop
article thumbnail

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

This data is sent to Apache Kafka, which is hosted on Amazon Managed Streaming for Apache Kafka (Amazon MSK). Expiring old snapshots – This operation provides a way to remove outdated snapshots and their associated data files, enabling Orca to maintain low storage costs.

article thumbnail

Build a data lake with Apache Flink on Amazon EMR

AWS Big Data

The AWS Glue Data Catalog provides a uniform repository where disparate systems can store and find metadata to keep track of data in data silos. With unified metadata, both data processing and data consuming applications can access the tables using the same metadata. For metadata read/write, Flink has the catalog interface.