Internet of Things, Metadata and Snapshot

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

Iceberg tables store metadata in manifest files. As the number of data files increases, the amount of metadata stored in these manifest files also increases, leading to longer query planning times. Query runtime also increases because it is proportional to the number of data or metadata file read operations.
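To make the file-count effect concrete, the following is a minimal sketch of how grouping many small files into larger ones reduces the number of files a query has to plan over and open. It is an illustration only, not Iceberg's or Amazon EMR's implementation: the function name `plan_compaction`, the greedy grouping rule, and the 128 MB target are all assumptions chosen for the example.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[float], target_mb: float = 128.0) -> List[List[float]]:
    """Greedily group file sizes so each group totals at most target_mb.

    Hypothetical helper for illustration; a file larger than target_mb
    simply ends up in a group of its own.
    """
    groups: List[List[float]] = []
    current: List[float] = []
    current_size = 0.0
    # Visit larger files first so the small ones fill the remaining space.
    for size in sorted(file_sizes_mb, reverse=True):
        # Close the current group once adding this file would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0.0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# 1,000 files of 1 MB each, as a streaming writer might produce.
small_files = [1.0] * 1000
compacted = plan_compaction(small_files, target_mb=128.0)
print(f"files before: {len(small_files)}, files after: {len(compacted)}")
```

Under these assumptions, the same volume of data is held in far fewer files, so there are fewer manifest entries to scan at planning time and fewer read operations at query time.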

Optimization Snapshot Data Lake Metadata

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

AWS Big Data

JANUARY 8, 2024

Stream Processing – An application created with Amazon Managed Service for Apache Flink can read the records from the data stream to detect and clean any errors in the time series data and enrich the data with specific metadata to optimize operational analytics.

Analytics IoT Data-driven Snapshot

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

This data can come from a diverse range of sources, including Internet of Things (IoT) devices, user applications, and application logging and telemetry. The items stored in checkpoint locations are mainly the metadata for application configurations and the state of processed offsets.

Management Metadata Testing Internet of Things
