
Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Iceberg takes advantage of the rich metadata it captures at write time, using techniques such as scan planning, partition pruning, and column-level stats (for example, min/max values) to skip data files that don't contain matching records. It first uses the manifest list, which acts as an index of the manifest files.
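The min/max skipping idea can be sketched in plain Python; the file stats below are invented for illustration (in Iceberg they come from manifest entries, not from a hand-built list):

```python
# Sketch of column-stat pruning: skip data files whose min/max range
# for a column cannot contain the predicate value. The stats here are
# invented; in Iceberg they are recorded per file in manifest entries.
files = [
    {"path": "f1.parquet", "min_id": 0,   "max_id": 99},
    {"path": "f2.parquet", "min_id": 100, "max_id": 199},
    {"path": "f3.parquet", "min_id": 200, "max_id": 299},
]

def files_to_scan(files, value):
    """Return only the files whose [min, max] range can contain `value`."""
    return [f["path"] for f in files if f["min_id"] <= value <= f["max_id"]]

print(files_to_scan(files, 150))  # only f2.parquet needs to be read
```

A query with the predicate id = 150 reads one file instead of three; the same logic, applied to Iceberg's manifest metadata, is what lets the engine skip whole data files without opening them.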


Visualize Confluent data in Amazon QuickSight using Amazon Athena

AWS Big Data

Register the schema: to connect Athena to Confluent, the connector needs the schema of the topic to be registered in the AWS Glue Schema Registry, which Athena uses for query planning. In the Connection details section, choose Create Lambda function. You can then query the topic from Athena, for example filtering "transactions_db"."transactions" on product_category='Kids'.
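A minimal sketch of registering such a schema with boto3; the registry name, schema name, and Avro field names below are all assumptions for illustration, not taken from the article:

```python
import json

# Hypothetical Avro schema for the "transactions" topic; the field names
# here are illustrative assumptions, not the article's actual schema.
transactions_schema = json.dumps({
    "type": "record",
    "name": "transaction",
    "fields": [
        {"name": "transaction_id", "type": "string"},
        {"name": "product_category", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
})

# Registering it in the AWS Glue Schema Registry (untested sketch; requires
# AWS credentials and an existing registry, here assumed to be named
# "confluent-registry"):
#
# import boto3
# glue = boto3.client("glue")
# glue.create_schema(
#     SchemaName="transactions",
#     RegistryId={"RegistryName": "confluent-registry"},
#     DataFormat="AVRO",
#     Compatibility="BACKWARD",
#     SchemaDefinition=transactions_schema,
# )
```

Once the schema is registered, Athena can resolve column names and types for the topic at query-planning time.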


Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

During a planned or unplanned Regional traffic disruption, failover controls let you fail over between buckets in different Regions and accounts within minutes. Run the following Spark command in your PySpark notebook:

df = spark.read.parquet("s3://amazon-reviews-pds/parquet/product_category=Electronics/*.parquet")
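The path glob above works because the dataset uses Hive-style partitioning, where the partition column's value is encoded in the directory name. A small sketch of how such a path is constructed (the helper function is hypothetical; only the base path and column name come from the command above):

```python
# Hive-style partition layout encodes column values in directory names,
# so a reader can prune partitions by path alone. Helper is illustrative.
base = "s3://amazon-reviews-pds/parquet"

def partition_path(base, column, value):
    """Build a Hive-style partition glob like .../product_category=Electronics/*.parquet."""
    return f"{base}/{column}={value}/*.parquet"

print(partition_path(base, "product_category", "Electronics"))
# s3://amazon-reviews-pds/parquet/product_category=Electronics/*.parquet
```

Reading this narrower path is equivalent to filtering on product_category, but avoids listing and scanning the other partitions entirely.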