Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

Iceberg creates snapshots of the table contents. Each snapshot is the complete set of data files in the table at a point in time. The data files in a snapshot are tracked in one or more manifest files; each manifest contains a row for every data file it lists, along with that file's partition data and metrics.
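The snapshot, manifest, and data file relationship described above can be sketched as a toy model in plain Python. This is not Iceberg's API or on-disk format: the class names, file paths, and record counts below are hypothetical and chosen only to illustrate how a snapshot resolves to its full set of data files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataFileEntry:
    """One manifest row: a data file plus its partition data and metrics."""
    path: str                  # hypothetical file path
    partition: Dict[str, int]  # e.g. {"year": 1995}
    record_count: int          # a simple stand-in for per-file metrics


@dataclass
class Manifest:
    """A manifest lists data files; it does not hold the data itself."""
    entries: List[DataFileEntry]


@dataclass
class Snapshot:
    """A snapshot is the complete set of data files at a point in time."""
    snapshot_id: int
    manifests: List[Manifest]

    def data_files(self) -> List[str]:
        # Walk every manifest and collect the paths it tracks.
        return [e.path for m in self.manifests for e in m.entries]


# First commit: one manifest tracking two files.
m1 = Manifest([
    DataFileEntry("data/year=1995/a.parquet", {"year": 1995}, 100),
    DataFileEntry("data/year=1996/b.parquet", {"year": 1996}, 120),
])
s1 = Snapshot(snapshot_id=1, manifests=[m1])

# Second commit: the new snapshot reuses m1 and adds a manifest for the new file.
m2 = Manifest([DataFileEntry("data/year=1997/c.parquet", {"year": 1997}, 90)])
s2 = Snapshot(snapshot_id=2, manifests=[m1, m2])

print(s1.data_files())
print(s2.data_files())
```

In this sketch the second snapshot reuses the first manifest unchanged, so a commit only has to write metadata for the files it adds, while the earlier snapshot stays readable as it was.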

Snapshot 110

How to Use Apache Iceberg in CDP’s Open Lakehouse

Cloudera

Time Travel: Reproduce a query as of a given time or snapshot ID, which can be used, for example, for historical audits and for rolling back erroneous operations. We see that as of the first snapshot (7445571238522489274) we had data from the years 1995 to 2005 in the table.
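The idea of reading a table "as of" a snapshot ID or a point in time can be illustrated with a small pure-Python sketch. It is a toy model rather than the engine's actual query syntax or API: the `ToyTable` class, the commit timestamps, and the second snapshot ID are hypothetical. Only the first snapshot ID and the 1995 to 2005 year range come from the excerpt above.

```python
from bisect import bisect_right
from typing import List, Tuple


class ToyTable:
    """Keeps every committed snapshot so older states stay queryable."""

    def __init__(self) -> None:
        # (commit_time_ms, snapshot_id, rows), appended in commit order.
        self._history: List[Tuple[int, int, Tuple[int, ...]]] = []

    def commit(self, commit_time_ms: int, snapshot_id: int, rows: List[int]) -> None:
        self._history.append((commit_time_ms, snapshot_id, tuple(rows)))

    def as_of_snapshot(self, snapshot_id: int) -> List[int]:
        # Look the snapshot up directly by its ID.
        for _, sid, rows in self._history:
            if sid == snapshot_id:
                return list(rows)
        raise KeyError(f"unknown snapshot {snapshot_id}")

    def as_of_time(self, time_ms: int) -> List[int]:
        # Pick the latest snapshot committed at or before time_ms.
        times = [t for t, _, _ in self._history]
        i = bisect_right(times, time_ms)
        if i == 0:
            raise ValueError("no snapshot exists at or before that time")
        return list(self._history[i - 1][2])


FIRST_SNAPSHOT = 7445571238522489274  # ID quoted in the excerpt
SECOND_SNAPSHOT = 2                   # hypothetical ID for a later commit

table = ToyTable()
table.commit(1000, FIRST_SNAPSHOT, list(range(1995, 2006)))   # years 1995..2005
table.commit(2000, SECOND_SNAPSHOT, list(range(1995, 2009)))  # later years added

years_then = table.as_of_snapshot(FIRST_SNAPSHOT)
years_mid = table.as_of_time(1500)  # falls between the two commits
print(min(years_then), max(years_then))
print(min(years_mid), max(years_mid))
```

Because earlier snapshots are never overwritten in this model, rolling back an erroneous operation amounts to pointing readers at an older snapshot again.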

Clean Harbors’ CIO: Hybrid approach to the cloud is a win-win

CIO Business Intelligence

“Our strategy in taking a hybrid approach has provided the agility we need to do advanced services in the cloud as we go through our digital transformation,” says Gabriel, who joined the company in 2001 and was promoted to executive vice president and CIO of Clean Harbors in 2018.

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

When streamed data constantly changes the base tables, frequent materialized view refreshes on top of them can lead to snapshot isolation errors. Also, in this traditional relational database approach, deriving near-real-time insights within seconds requires refreshing the materialized views frequently.