article thumbnail

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

Starting today, the Athena SQL engine uses a cost-based optimizer (CBO), a new feature that uses table and column statistics stored in the AWS Glue Data Catalog as part of the table’s metadata. Analytics Architect on Amazon Athena. This means a 3 TB benchmark dataset accurately represents customer workloads on 30–50 TB datasets.

article thumbnail

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

In this post, we discuss ways to modernize your legacy, on-premises, real-time analytics architecture to build serverless data analytics solutions on AWS using Amazon Managed Service for Apache Flink. It shows a call center streaming data source that sends the latest call center feed in every 15 seconds.

article thumbnail

Themes and Conferences per Pacoid, Episode 12

Domino Data Lab

I mention this here because there was a lot of overlap between current industry data governance needs and what the scientific community is working toward for scholarly infrastructure. The gist is, leveraging metadata about research datasets, projects, publications, etc., 2018 – Global reckoning about data governance, aka “Oops!