2023, Management, Metadata and Snapshot

2023

Management

Metadata

Snapshot

Amazon OpenSearch Service H1 2023 in review

AWS Big Data

AUGUST 23, 2023

Since its release in January 2021, the OpenSearch project has released 14 versions through June 2023. With OpenSearch Service managed domains, you specify a hardware configuration and OpenSearch Service provisions the required hardware and takes care of software patching, failure recovery, backups, and monitoring.

Snapshot

Snapshot Dashboards Visualization Metrics

Use Amazon Athena with Spark SQL for your open-source transactional table formats

AWS Big Data

JANUARY 24, 2024

These formats enable ACID (atomicity, consistency, isolation, durability) transactions, upserts, and deletes, and advanced features such as time travel and snapshots that were previously only available in data warehouses. For more information, refer to Amazon S3: Allows read and write access to objects in an S3 Bucket.

Snapshot

Snapshot Data Lake Metadata Optimization

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Trending Sources

Analytics Vidhya

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

Apache Iceberg enables transactions on data lakes and can simplify data storage, management, ingestion, and processing. This means the data files in the data lake aren’t modified during the migration and all Apache Iceberg metadata files (manifests, manifest files, and table metadata files) are generated outside the purview of the data.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Webinars

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

To learn more about how to create an EMR cluster with Iceberg and use Amazon EMR Studio, refer to Use an Iceberg cluster with Spark and the Amazon EMR Studio Management Guide , respectively. In that case, we have to query the table with the snapshot-id corresponding to the deleted row. parquet") df.sortWithinPartitions("review_date").writeTo("dev.db.amazon_reviews_iceberg").append()

Data Lake

Data Lake Snapshot Metadata Optimization

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

AWS Big Data

OCTOBER 3, 2023

Iceberg tables store metadata in manifest files. As the number of data files increase, the amount of metadata stored in these manifest files also increases, leading to longer query planning time. The query runtime also increases because it’s proportional to the number of data or metadata file read operations.

Optimization

Optimization Snapshot Data Lake Metadata

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Apache Iceberg addresses customer needs by capturing rich metadata information about the dataset at the time the individual data files are created.

Data Lake

Data Lake Data Processing Metadata Snapshot

A Summary Of Gartner’s Recent Innovation Insight Into Data Observability

DataKitchen

AUGUST 8, 2023

On 20 July 2023, Gartner released the article “ Innovation Insight: Data Observability Enables Proactive Data Quality ” by Melody Chien. She sees Data Observability as an emerging technology in data engineering and management. It alerts data and analytics leaders to issues with their data before they multiply.

Data Quality

Data Quality Testing Snapshot Reporting

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Additionally, the task of maintaining and managing files in the data lake can be tedious and sometimes complex. They enable transactions on top of data lakes and can simplify data storage, management, ingestion, and processing. The Data Catalog provides a central location to govern and keep track of the schema and metadata.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

AWS Big Data

AUGUST 16, 2023

Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. We fetch the metadata of the users_xxxxxx table from Athena. It’s imperative that the source and target metadata match.

Data Lake

Data Lake Metadata Testing Snapshot

Introducing Amazon MWAA support for Apache Airflow version 2.7.2 and deferrable operators

AWS Big Data

NOVEMBER 6, 2023

Amazon Managed Workflow for Apache Airflow (Amazon MWAA) is a managed service that allows you to use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure.

Metrics

Metrics Metadata Snapshot Management

Data Leaders Brief

Amazon OpenSearch Service H1 2023 in review

Use Amazon Athena with Spark SQL for your open-source transactional table formats

Webinars

Trending Sources

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

Apache Iceberg optimization: Solving the small files problem in Amazon EMR

Use Apache Iceberg in a data lake to support incremental data processing

A Summary Of Gartner’s Recent Innovation Insight Into Data Observability

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

Introducing Amazon MWAA support for Apache Airflow version 2.7.2 and deferrable operators

Stay Connected