
How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

This authority extends across realms such as business intelligence, data engineering, and machine learning, thus limiting the tools and capabilities that can be used. When a table's partition definition is evolved, the data in the table prior to the change is unaffected, as is its metadata.
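
This is Iceberg's partition evolution: the partition spec is part of table metadata, so changing it only affects data written after the change. A minimal PySpark sketch of such a change (the glue_catalog catalog and analytics.events table are hypothetical, and the session is assumed to have the Iceberg Spark SQL extensions enabled):

```python
from pyspark.sql import SparkSession

# Assumes a Spark session whose "glue_catalog" catalog is configured for Iceberg
# and whose Spark SQL extensions include IcebergSparkSessionExtensions.
spark = SparkSession.builder.appName("iceberg-partition-evolution").getOrCreate()

# Table initially partitioned by day; files written under this spec keep it.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.analytics.events (
        event_id BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Evolve the partition spec to hourly granularity. Only data written from this
# point on uses the new spec; existing data files and their metadata are untouched.
spark.sql("ALTER TABLE glue_catalog.analytics.events ADD PARTITION FIELD hours(event_ts)")
spark.sql("ALTER TABLE glue_catalog.analytics.events DROP PARTITION FIELD days(event_ts)")
```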


Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

Amazon S3 allows you to access diverse data sets, build business intelligence dashboards, and accelerate the consumption of data by adopting a modern data architecture or data mesh pattern on Amazon Web Services (AWS). In this method, the metadata is recreated in an isolated environment and colocated with the existing data files.
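
One common way to build Iceberg metadata over existing Parquet files without rewriting them is the add_files Spark procedure; the sketch below illustrates the general idea rather than the exact steps in the post, and the catalog, table, and S3 path names are placeholders:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with an Iceberg catalog named "glue_catalog" and the
# Iceberg runtime on the classpath; all names and paths are illustrative.
spark = SparkSession.builder.appName("iceberg-in-place-migration").getOrCreate()

# Create the target Iceberg table with the same schema as the existing Parquet data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.sales.orders (
        order_id   BIGINT,
        order_date DATE,
        amount     DECIMAL(10, 2)
    )
    USING iceberg
""")

# Register the existing Parquet files with the new table. Only Iceberg metadata
# is written; the data files themselves stay where they are in S3.
spark.sql("""
    CALL glue_catalog.system.add_files(
        table => 'sales.orders',
        source_table => '`parquet`.`s3://my-data-lake/sales/orders/`'
    )
""")
```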


Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

Frequent refreshes of materialized views on top of base tables that are constantly changing due to streamed data can lead to snapshot isolation errors. Datasets used for generating insights are curated using materialized views inside the database and published for business intelligence (BI) reporting. We use two datasets in this post.
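
As a rough sketch of that pattern (the cluster endpoint, credentials, and table names are placeholders, and the actual design in the post may differ), a materialized view can be defined over the streamed base table and refreshed on a controlled schedule rather than after every ingest batch, which reduces the contention that surfaces as snapshot isolation errors:

```python
import redshift_connector

# Connection details are placeholders; use IAM auth or Secrets Manager in practice.
conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="admin",
    password="<placeholder>",
)
conn.autocommit = True
cur = conn.cursor()

# Curate the streamed events into a BI-friendly aggregate.
cur.execute("""
    CREATE MATERIALIZED VIEW mv_orders_by_minute AS
    SELECT date_trunc('minute', event_ts) AS minute,
           count(*)    AS order_count,
           sum(amount) AS revenue
    FROM streaming_orders
    GROUP BY 1
""")

# Refresh on a schedule (for example, from EventBridge) instead of after every
# micro-batch, so refreshes don't constantly collide with concurrent writes.
cur.execute("REFRESH MATERIALIZED VIEW mv_orders_by_minute")
```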


Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

Building a starter version of anything can often be straightforward, but building something with enterprise-grade scale, security, resiliency, and performance typically requires knowledge of and adherence to battle-tested best practices, and using the right tools and features in the right scenario; this applies equally to a Data Vault system implemented with Amazon Redshift.


Choosing an open table format for your transactional data lake on AWS

AWS Big Data

It is crucial to test that a table format meets your specific use case requirements. Iceberg doesn't optimize file sizes or run automatic table services (for example, compaction or clustering) when writing, so streaming ingestion will create many small data and metadata files.
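
Iceberg instead exposes maintenance procedures that can be scheduled out of band to clean up after streaming writes. A hedged PySpark sketch (the glue_catalog catalog and streaming.events table are placeholders, and the target file size and cutoff timestamp are arbitrary):

```python
from pyspark.sql import SparkSession

# Assumes an Iceberg-enabled Spark session with a catalog named "glue_catalog".
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact the small data files produced by streaming writes into ~512 MB files.
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'streaming.events',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Trim metadata by expiring snapshots older than the retention cutoff.
spark.sql("""
    CALL glue_catalog.system.expire_snapshots(
        table => 'streaming.events',
        older_than => TIMESTAMP '2023-08-01 00:00:00'
    )
""")
```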


Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

In 2022, AWS published a dbt adapter called dbt-glue, the open source, battle-tested dbt AWS Glue adapter that allows data engineers to use dbt for cloud-based data lakes along with data warehouses and databases, paying for just the compute they need.

05:34:22 Connection test: [OK connection ok]
05:34:22 All checks passed!
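
Those lines are the output of dbt's connection check (dbt debug). As a small illustration, and assuming dbt-core 1.5 or later with the dbt-glue adapter installed and a profile whose target has type glue already configured, the same check can be invoked programmatically:

```python
from dbt.cli.main import dbtRunner

# Assumes dbt-core >= 1.5 plus the dbt-glue adapter are installed, and that
# ~/.dbt/profiles.yml already defines a target with `type: glue`.
runner = dbtRunner()

# Equivalent to running `dbt debug` from the CLI: validates the profile and
# tests the Glue connection, printing lines like the ones quoted above.
result = runner.invoke(["debug"])
print("connection ok" if result.success else "connection failed")
```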


Materialized Views in Hive for Iceberg Table Format

Cloudera

Materialized views are valuable for accelerating common classes of business intelligence (BI) queries that consist of joins, group-bys, and aggregate functions. The snapshotId of each source table involved in the materialized view is also maintained in the metadata. Furthermore, the materialized view in the article's example is partitioned on the d_year column.
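
Recent Hive releases can store such a materialized view in Iceberg format and partition it; the sketch below uses TPC-DS-style table and column names and a placeholder HiveServer2 endpoint, and the exact DDL in the article may differ:

```python
from pyhive import hive

# HiveServer2 endpoint and database are placeholders.
conn = hive.connect(host="hs2.example.com", port=10000, database="tpcds")
cur = conn.cursor()

# Materialized view over Iceberg tables, itself stored as Iceberg and partitioned
# on d_year so matching BI queries can prune partitions. The source tables'
# snapshot IDs are recorded in the view's metadata for staleness checks.
cur.execute("""
    CREATE MATERIALIZED VIEW mv_sales_by_year
    PARTITIONED ON (d_year)
    STORED BY ICEBERG
    AS
    SELECT sum(ss.ss_net_paid) AS total_net_paid,
           d.d_year
    FROM store_sales ss
    JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
    GROUP BY d.d_year
""")
```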