Data Integration, Metadata, Optimization and Snapshot

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

Despite their advantages, traditional data lake architectures often grapple with challenges such as understanding deviations from the most optimal state of the table over time, identifying issues in data pipelines, and monitoring a large number of tables.

Metadata Snapshot Data Lake Metrics

Amazon OpenSearch Service Under the Hood: OpenSearch Optimized Instances (OR1)

AWS Big Data

APRIL 17, 2024

Amazon OpenSearch Service recently introduced the OpenSearch Optimized Instance family (OR1), which delivers up to 30% price-performance improvement over existing memory optimized instances in internal benchmarks, and uses Amazon Simple Storage Service (Amazon S3) to provide 11 9s of durability.

Optimization Snapshot Metadata Cost-Benefit

Webinars

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Improving the Accuracy of Generative AI Systems: A Structured Approach

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Marketing Operations in 2025: A New Framework for Success

Trending Sources

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

This introduces the need for both polling and pushing the data to access and analyze in near-real time. From an operational standpoint, we designed a new shared responsibility model for data ingestion using AWS Glue instead of internal services (REST APIs) designed on Amazon EC2 to extract the data.

Optimization Forecasting Data Lake Metadata

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Hudi provides tables, transactions, efficient upserts and deletes, advanced indexes, streaming ingestion services, data clustering and compaction optimizations, and concurrency control, all while keeping your data in open source file formats. This post demonstrates how this new capability to crawl Hudi tables works.

Data Lake Snapshot Metadata Optimization

Apache HBase online migration to Amazon EMR

AWS Big Data

OCTOBER 23, 2024

Running HBase on Amazon S3 has several added benefits, including lower costs, data durability, and easier scalability. And during HBase migration, you can export the snapshot files to S3 and use them for recovery. HBase provided by other cloud platforms doesn’t support snapshots.

Snapshot Recreation/Entertainment Testing Data Processing

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

Figure 1: Apache Iceberg fits the next-generation data architecture by abstracting the storage layer from the analytics layer while introducing net new capabilities like time travel and partition evolution. #1: Apache Iceberg enables seamless integration between different streaming and processing engines while maintaining data integrity between them.

Metadata Data Architecture Machine Learning Cost-Benefit

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

Determining optimal table partitioning: Determining optimal partitioning for each table is important both to optimize query performance and to minimize the impact on teams querying the tables when partitioning changes. The following diagram illustrates the solution architecture. Orca addressed this in several ways.

Data Lake Analytics Snapshot Data Quality

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

Acting as a bridge between producer and consumer apps, it enforces the schema, reduces the data footprint in transit, and safeguards against malformed data. AWS Glue is an ideal solution for running stream consumer applications, discovering, extracting, transforming, loading, and integrating data from multiple sources.

Management Metadata Testing Internet of Things

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

Cloudinary is a cloud-based media management platform that provides a comprehensive set of tools and services for managing, optimizing, and delivering images, videos, and other media assets on websites and mobile applications. This concept makes Iceberg extremely versatile.

Data Lake Metadata Snapshot Analytics

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

The dbt-glue adapter democratized access for dbt users to data lakes, and enabled many users to effortlessly run their transformation workloads on the cloud with the serverless data integration capability of AWS Glue. The team uses dbt-glue to build a transformed gold model optimized for business intelligence (BI).

Data Lake Management Metrics Data Warehouse
