Interactive, Metadata, Publishing and Snapshot

Interactive

Metadata

Publishing

Snapshot

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

Visualize AWS Glue Data Quality scores in Amazon DataZone You can now visualize AWS Glue Data Quality scores in data assets that have been published in the Amazon DataZone business catalog and that are searchable through the Amazon DataZone web portal. We use this data source to import metadata information related to our datasets.

Data Quality

Data Quality Visualization Metadata Metrics

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Iceberg employs internal metadata management that keeps track of data and empowers a set of rich features at scale. The Data Catalog provides a central location to govern and keep track of the schema and metadata. Additionally, you can query in Athena based on the version ID of a snapshot in Iceberg.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Amazon OpenSearch Service Under the Hood : OpenSearch Optimized Instances(OR1)

AWS Big Data

APRIL 17, 2024

Today, customers widely use OpenSearch Service for operational analytics because of its ability to ingest high volumes of data while also providing rich and interactive analytics. When the transfer is complete, the primary publishes new checkpoints to all replica copies, notifying them of a new segment being available for download.

Optimization

Optimization Snapshot Metadata Cost-Benefit

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

We introduce you to Amazon Managed Service for Apache Flink Studio and get started querying streaming data interactively using Amazon Kinesis Data Streams. Frequent materialized view refreshes on top of constantly changing base tables due to streamed data can lead to snapshot isolation errors.

Management

Management Metadata Analytics Dashboards

Introducing Apache Iceberg in Cloudera Data Platform

Cloudera

FEBRUARY 22, 2022

Companies such as Adobe , Expedia , LinkedIn , Tencent , and Netflix have published blogs about their Apache Iceberg adoption for processing their large scale analytics datasets. . In Iceberg, instead of listing O(n) partitions (directory listing at runtime) in a table for query planning, Iceberg performs an O(1) RPC to read the snapshot.

Snapshot

Snapshot Metadata Cost-Benefit Data Architecture

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

We will publish follow up blogs for other data services. Every table change creates an Iceberg snapshot, this helps to resolve concurrency issues and allows readers to scan a stable table state every time. CDW separates the compute (Virtual Warehouses) and metadata (DB catalogs) by running them in independent Kubernetes pods.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

AWS has invested in native service integration with Apache Hudi and published technical contents to enable you to use Apache Hudi with AWS Glue (for example, refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started ).

Data Lake

Data Lake Data Processing Metadata Snapshot

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

After the processed data is stored in Amazon S3, we create an AWS Glue crawler to create a Data Catalog table that acts as a metadata layer for the data. The table can be queried using Amazon Athena , a serverless, interactive query service that enables running SQL-like queries on data stored in Amazon S3.

Management

Management Metadata Testing Internet of Things

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

In 2022, AWS published a dbt adapter called dbt-glue —the open source, battle-tested dbt AWS Glue adapter that allows data engineers to use dbt for cloud-based data lakes along with data warehouses and databases, paying for just the compute they need. The following diagram illustrates the architecture.

Data Lake

Data Lake Management Metrics Data Warehouse

Data Leaders Brief

Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Webinars

Trending Sources

Amazon OpenSearch Service Under the Hood : OpenSearch Optimized Instances(OR1)

Webinars

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

Introducing Apache Iceberg in Cloudera Data Platform

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Stay Connected