Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

Apache Iceberg manages these schema changes in a backward-compatible way through its innovative metadata table evolution architecture. For instance, an ecommerce marketplace may initially partition order data by day. Lake Formation helps you centrally manage, secure, and globally share data for analytics and machine learning.

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

Starting today, the Athena SQL engine uses a cost-based optimizer (CBO), a new feature that uses table and column statistics stored in the AWS Glue Data Catalog as part of the table’s metadata. Testing on the TPC-DS benchmark showed an 11% improvement in overall query performance when using CBO compared to without it.

Trending Sources

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

It includes perspectives about current issues, themes, vendors, and products for data governance. My interest in data governance (DG) began with the recent industry surveys by O’Reilly Media about enterprise adoption of “ABC” (AI, Big Data, Cloud). We keep feeding the monster data. The flywheel effect.

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

It shows a call center streaming data source that sends the latest call center feed every 15 seconds. The second streaming data source contains metadata about the call center organization and agents and is refreshed throughout the day. We use two datasets in this post.

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

I mention this here because there was a lot of overlap between current industry data governance needs and what the scientific community is working toward for scholarly infrastructure. The gist is, leveraging metadata about research datasets, projects, publications, etc., 2018 – Global reckoning about data governance, aka “Oops!