
Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Iceberg takes advantage of the rich metadata it captures at write time, using techniques such as scan planning, partition pruning, and column-level stats (for example, min/max values) to skip data files that don't contain matching records. It first uses the manifest list, which acts as an index of the manifest files.
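The min/max skipping idea can be sketched in plain Python; the file stats below are invented for illustration (in Iceberg they come from manifest entries, not from a hand-built list):

```python
# Sketch of column-stat pruning: skip data files whose min/max range
# for a column cannot contain the predicate value. The stats here are
# invented; in Iceberg they are recorded per file in manifest entries.
files = [
    {"path": "f1.parquet", "min_id": 0,   "max_id": 99},
    {"path": "f2.parquet", "min_id": 100, "max_id": 199},
    {"path": "f3.parquet", "min_id": 200, "max_id": 299},
]

def files_to_scan(files, value):
    """Return only the files whose [min, max] range can contain `value`."""
    return [f["path"] for f in files if f["min_id"] <= value <= f["max_id"]]

print(files_to_scan(files, 150))  # only f2.parquet needs to be read
```

A query with the predicate id = 150 reads one file instead of three; the same logic, applied to Iceberg's manifest metadata, is what lets the engine skip whole data files without opening them.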


Visualize Confluent data in Amazon QuickSight using Amazon Athena

AWS Big Data

Register the schema: to connect Athena to Confluent, the connector needs the schema of the topic to be registered in the AWS Glue Schema Registry, which Athena uses for query planning. In the Connection details section, choose Create Lambda function. You can then query the topic from Athena, for example filtering "transactions_db"."transactions" on product_category='Kids'.
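A minimal sketch of registering such a schema with boto3; the registry name, schema name, and Avro field names below are all assumptions for illustration, not taken from the article:

```python
import json

# Hypothetical Avro schema for the "transactions" topic; the field names
# here are illustrative assumptions, not the article's actual schema.
transactions_schema = json.dumps({
    "type": "record",
    "name": "transaction",
    "fields": [
        {"name": "transaction_id", "type": "string"},
        {"name": "product_category", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
})

# Registering it in the AWS Glue Schema Registry (untested sketch; requires
# AWS credentials and an existing registry, here assumed to be named
# "confluent-registry"):
#
# import boto3
# glue = boto3.client("glue")
# glue.create_schema(
#     SchemaName="transactions",
#     RegistryId={"RegistryName": "confluent-registry"},
#     DataFormat="AVRO",
#     Compatibility="BACKWARD",
#     SchemaDefinition=transactions_schema,
# )
```

Once the schema is registered, Athena can resolve column names and types for the topic at query-planning time.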


Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

During a planned or unplanned Regional traffic disruption, failover controls let you fail over between buckets in different Regions and accounts within minutes. Run the following Spark command in your PySpark notebook:

df = spark.read.parquet("s3://amazon-reviews-pds/parquet/product_category=Electronics/*.parquet")
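The path glob above works because the dataset uses Hive-style partitioning, where the partition column's value is encoded in the directory name. A small sketch of how such a path is constructed (the helper function is hypothetical; only the base path and column name come from the command above):

```python
# Hive-style partition layout encodes column values in directory names,
# so a reader can prune partitions by path alone. Helper is illustrative.
base = "s3://amazon-reviews-pds/parquet"

def partition_path(base, column, value):
    """Build a Hive-style partition glob like .../product_category=Electronics/*.parquet."""
    return f"{base}/{column}={value}/*.parquet"

print(partition_path(base, "product_category", "Electronics"))
# s3://amazon-reviews-pds/parquet/product_category=Electronics/*.parquet
```

Reading this narrower path is equivalent to filtering on product_category, but avoids listing and scanning the other partitions entirely.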