
Defining Simplicity for Enterprise Software as “a 10 Year Old Can Demo it”

Cloudera

During the development of Operational Database and Replication Manager, I kept telling folks across the team that it has to be "so simple that a 10 year old can demo it." Watch this: Enterprise Software that is so easy a 10 year old can demo it. Create a snapshot.


Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

An in-place migration can be performed in either of two ways: Using add_files: this procedure adds existing data files to an existing Iceberg table with a new snapshot that includes the files. Unlike migrate or snapshot, add_files can import files from a specific partition or partitions and doesn't create a new Iceberg table.
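For orientation, a call to the add_files procedure in Spark SQL might look like the following sketch; the catalog, database, table, and partition names are hypothetical, not taken from the article.

-- Sketch: import the existing files of one Hive partition into an existing
-- Iceberg table without rewriting them (all names are placeholders).
CALL my_catalog.system.add_files(
  table => 'db.orders_iceberg',
  source_table => 'db.orders_hive',
  partition_filter => map('region', 'us-east-1')
);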



Optimization Strategies for Iceberg Tables

Cloudera

Problem with too many snapshots: every time a write operation occurs on an Iceberg table, a new snapshot is created. Regularly expiring snapshots is recommended to delete data files that are no longer needed and to keep the size of table metadata small. You could also change the isolation level to snapshot isolation.
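A minimal sketch of scheduled snapshot expiry with Iceberg's expire_snapshots Spark procedure, assuming hypothetical catalog and table names and an illustrative retention policy:

-- Sketch: drop snapshots older than a cutoff while retaining the last five
-- (catalog, table, and cutoff are placeholders).
CALL my_catalog.system.expire_snapshots(
  table => 'db.orders',
  older_than => TIMESTAMP '2024-01-01 00:00:00',
  retain_last => 5
);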


Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

AWS Big Data

For Target database, enter lf-demo-db. In the Athena query editor, run the following SELECT query on the shared table: SELECT * FROM "lf-demo-db"."consumer_iceberg" Choose Create new filter. Choose Save.
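Because the shared table is an Iceberg table, Athena can also read it at an earlier snapshot with time travel; the timestamp below is purely illustrative.

-- Sketch: query the shared table as of a past point in time.
SELECT * FROM "lf-demo-db"."consumer_iceberg"
FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC';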


Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

Whenever there is an update to the Iceberg table, a new snapshot of the table is created, and the metadata pointer points to the current table metadata file. At the top of the hierarchy is the metadata file, which stores information about the table's schema, partition information, and snapshots. The catalog's "io-impl":"org.apache.iceberg.aws.s3.S3FileIO" setting tells Iceberg to use S3FileIO for reading and writing files on Amazon S3.
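One way to see this snapshot history is to query the Iceberg snapshots metadata table from Spark SQL, sketched here with a hypothetical catalog and table name.

-- Sketch: list the snapshots recorded in the current table metadata file.
SELECT committed_at, snapshot_id, operation
FROM glue_catalog.db.orders.snapshots
ORDER BY committed_at DESC;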


From Hive Tables to Iceberg Tables: Hassle-Free

Cloudera

They also provide a "snapshot" procedure that creates an Iceberg table with a different name over the same underlying data. You could first create a snapshot table, run sanity checks on it, and ensure that everything is in order. It includes a live demo recording of Iceberg capabilities.
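A minimal sketch of that workflow in Spark SQL, with hypothetical table names standing in for the article's examples:

-- Sketch: create a snapshot table over the existing Hive data, sanity-check it,
-- then migrate the original table in place (all names are placeholders).
CALL my_catalog.system.snapshot('db.orders_hive', 'db.orders_iceberg_snap');
SELECT COUNT(*) FROM my_catalog.db.orders_iceberg_snap;
CALL my_catalog.system.migrate('db.orders_hive');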


Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift

AWS Big Data

For Integration identifier, enter a name, for example zero-etl-demo. CREATE DATABASE aurora_pg_zetl FROM INTEGRATION ' ' DATABASE zeroetl_db; The integration is now complete, and an entire snapshot of the source will be reflected as is in the destination. Choose Create zero-ETL integration. Choose Next.
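As a follow-up sketch, not a step from the article, the integration can be verified from the Amazon Redshift side by querying the SVV_INTEGRATION system view.

-- Sketch: confirm the zero-ETL integration exists and check its state.
SELECT * FROM svv_integration;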