Remove optimization-strategies-for-iceberg-tables
article thumbnail

Optimization Strategies for Iceberg Tables

Cloudera

Introduction Apache Iceberg has recently grown in popularity because it adds data warehouse-like capabilities to your data lake making it easier to analyze all your data — structured and unstructured. However, you need to regularly maintain Iceberg tables to keep them in a healthy state so that read queries can perform faster.

article thumbnail

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

Each storage format implements this functionality in slightly different ways; for a comparison, refer to Choosing an open table format for your transactional data lake on AWS. In this post, we show you how to use Spark SQL in Amazon Athena notebooks and work with Iceberg, Hudi, and Delta Lake table formats.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

However, as data volumes continue to grow, optimizing data layout and organization becomes crucial for efficient querying and analysis. One approach is to use the Amazon Athena CREATE TABLE AS SELECT (CTAS) statement, which allows you to create a bucketed table directly from a query.

article thumbnail

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. It adds tables to compute engines including Spark, Trino, PrestoDB, Flink, and Hive using a high-performance table format that works just like a SQL table.

Data Lake 114
article thumbnail

From Hive Tables to Iceberg Tables: Hassle-Free

Cloudera

Introduction For more than a decade now, the Hive table format has been a ubiquitous presence in the big data ecosystem, managing petabytes of data with remarkable efficiency and scale. Therefore, Apache Iceberg table format is poised to replace the traditional Hive table format in the coming years.

article thumbnail

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

Table formats like Apache Iceberg provide solutions to these issues. You can simplify your data strategy by running multiple workloads and applications on the same data in the same location. Apache Iceberg overview Iceberg is an open-source table format that brings the power of SQL tables to big data files.

article thumbnail

Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena

AWS Big Data

In 2022, we announced that you can enforce fine-grained access control policies using AWS Lake Formation and query data stored in any supported file format using table formats such as Apache Iceberg , Apache Hudi, and more using Amazon Athena queries. Iceberg is an open table format for very large analytic datasets.