
Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

Apache Iceberg manages schema changes in a backward-compatible way through its metadata-based table evolution architecture. With Lake Formation, you can manage fine-grained access control for your data lake data on Amazon S3 and its metadata in the Data Catalog. Iceberg maintains the table state in metadata files.
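The pattern the article walks through can be sketched in a few Spark SQL statements. The following is a minimal, hedged example assuming an AWS Glue 4.0 job started with --datalake-formats set to iceberg; the glue_catalog name, S3 warehouse path, database, and table names are hypothetical stand-ins, not the article's own.

```python
from pyspark.sql import SparkSession

# Assumes a Glue 4.0 job with --datalake-formats=iceberg; catalog name,
# warehouse path, and table names below are hypothetical.
spark = (
    SparkSession.builder
    # Iceberg's SQL extensions enable MERGE INTO and partition-evolution DDL.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse/")
    .getOrCreate()
)

# Stage incoming change records as a temporary view for the merge.
updates_df = spark.read.parquet("s3://my-bucket/incoming/orders/")
updates_df.createOrReplaceTempView("updates")

# Merge (upsert): update matching rows, insert new ones.
spark.sql("""
    MERGE INTO glue_catalog.db.orders t
    USING updates u
    ON t.order_id = u.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Schema evolution: a metadata-only change; existing data files are untouched.
spark.sql("ALTER TABLE glue_catalog.db.orders ADD COLUMN discount decimal(10,2)")

# Partition evolution: move from daily to monthly partitioning; only data
# written after this statement uses the new layout.
spark.sql("""
    ALTER TABLE glue_catalog.db.orders
    REPLACE PARTITION FIELD days(order_ts) WITH months(order_ts)
""")
```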


Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

Iceberg tables maintain metadata to abstract large collections of files, providing data management features such as time travel, rollback, data compaction, and full schema evolution, which reduce management overhead. Snowflake integrates with the AWS Glue Data Catalog to retrieve the snapshot location.
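A hedged sketch of what that integration looks like from the Snowflake side, driven here through the snowflake-connector-python client: the catalog integration parameters follow Snowflake's documented Glue support, but the account, role ARN, external volume, and table names are hypothetical.

```python
import snowflake.connector

# Hypothetical connection details; in practice use key-pair auth or SSO.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", role="ACCOUNTADMIN"
)
cur = conn.cursor()

# Catalog integration pointing Snowflake at the Glue Data Catalog, from
# which it resolves each Iceberg table's current metadata file (snapshot).
cur.execute("""
    CREATE CATALOG INTEGRATION glue_int
      CATALOG_SOURCE = GLUE
      CATALOG_NAMESPACE = 'iceberg_db'
      TABLE_FORMAT = ICEBERG
      GLUE_AWS_ROLE_ARN = 'arn:aws:iam::111122223333:role/snowflake-glue-access'
      GLUE_CATALOG_ID = '111122223333'
      ENABLED = TRUE
""")

# Externally managed Iceberg table; data and metadata stay on S3.
cur.execute("""
    CREATE ICEBERG TABLE customer_events
      EXTERNAL_VOLUME = 'iceberg_s3_volume'
      CATALOG = 'glue_int'
      CATALOG_TABLE_NAME = 'customer_events'
""")
```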



CRMs Have a Big Data Technical Debt Problem: Here’s How to Fix It

Smart Data Collective

Metazoa is the company behind the Salesforce ecosystem’s top software toolset for org management, Metazoa Snapshot. Created in 2006, Snapshot was the first CRM management solution designed specifically for Salesforce and one of the first apps offered on the Salesforce AppExchange. What is technical debt, anyway?


Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

These formats enable ACID (atomicity, consistency, isolation, durability) transactions, upserts, and deletes, as well as advanced features such as time travel and snapshots that were previously only available in data warehouses. Athena pre-populates the required Spark properties for you.
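To make the time travel claim concrete, here is a small hedged sketch of the kind of Spark SQL you can run against an Iceberg table from an Athena for Apache Spark notebook, where a spark session is already provided; the catalog, table, and timestamp are hypothetical.

```python
# Runs inside an Athena for Apache Spark notebook, where `spark` is provided.
# The catalog, table, and timestamp below are hypothetical.

# Current state of the table.
spark.sql("SELECT count(*) AS n FROM spark_catalog.default.orders").show()

# Time travel: query the table as of an earlier point in time.
spark.sql("""
    SELECT * FROM spark_catalog.default.orders
    TIMESTAMP AS OF '2023-01-15 00:00:00'
""").show()

# The snapshot history that time travel resolves against.
spark.sql("""
    SELECT committed_at, snapshot_id, operation
    FROM spark_catalog.default.orders.snapshots
""").show(truncate=False)
```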


Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Apache Iceberg is an open table format for very large analytic datasets that captures metadata on the state of a dataset as it evolves over time. Iceberg addresses customer needs by recording rich metadata about the dataset at the time the individual data files are created.
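That per-file, per-snapshot metadata is what makes incremental processing possible: a reader can ask for only the records appended between two snapshots. A minimal hedged sketch, with hypothetical table name and snapshot IDs:

```python
# Incremental read between two snapshots using Iceberg's Spark read options.
# start-snapshot-id is exclusive, end-snapshot-id inclusive; only data from
# append snapshots is returned. IDs and table names are hypothetical.
inc_df = (
    spark.read.format("iceberg")
    .option("start-snapshot-id", "5891947574122847304")
    .option("end-snapshot-id", "6059611655576693671")
    .load("glue_catalog.db.orders")
)

# Process just the new rows, e.g. append them to a downstream table.
inc_df.writeTo("glue_catalog.db.orders_enriched").append()
```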


Optimization Strategies for Iceberg Tables

Cloudera

The problem with too many snapshots: every time a write operation occurs on an Iceberg table, a new snapshot is created. A bloated metadata.json file can increase both read and write times, because the large metadata file must be read and rewritten on every operation. To reduce conflicts between concurrent writers, you could also change the isolation level to snapshot isolation.
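The standard remedy is Iceberg's expire_snapshots maintenance procedure, sketched below with hypothetical catalog, table, and retention settings; the write.metadata properties mentioned in the comment are Iceberg table properties that cap metadata.json growth.

```python
# Prune snapshots older than a cutoff, keeping at least the 10 most recent.
# Catalog/table names and the retention window are hypothetical.
spark.sql("""
    CALL glue_catalog.system.expire_snapshots(
        table => 'db.orders',
        older_than => TIMESTAMP '2023-06-01 00:00:00',
        retain_last => 10
    )
""").show()

# To keep metadata.json itself from bloating, Iceberg can also delete old
# metadata files on commit via the table properties
# write.metadata.delete-after-commit.enabled and
# write.metadata.previous-versions-max.
```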


Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

This means the data files in the data lake aren’t modified during the migration; all Apache Iceberg metadata files (manifest files, manifest lists, and table metadata files) are generated separately, without touching the data itself. In this method, the metadata is recreated in an isolated environment and colocated with the existing data files.
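Iceberg ships Spark procedures for exactly this in-place pattern. A hedged sketch, with hypothetical catalog and table names: the snapshot procedure builds a trial Iceberg table over the existing files, and migrate then converts the source table itself.

```python
# Dry run: create a new Iceberg table whose metadata points at the existing,
# unmodified data files of a Hive/Glue Parquet table (names hypothetical).
spark.sql("""
    CALL glue_catalog.system.snapshot(
        source_table => 'db.legacy_orders',
        table => 'db.legacy_orders_iceberg'
    )
""")

# In-place migration: replace db.legacy_orders with an Iceberg table; only
# metadata is written, and the original data files stay where they are.
spark.sql("CALL glue_catalog.system.migrate('db.legacy_orders')")
```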
