Run Trino queries 2.7 times faster with Amazon EMR 6.15.0

AWS Big Data

Benchmark setup: In our testing, we used the 3 TB dataset stored in Amazon S3 in compressed Parquet format, with metadata for databases and tables stored in the AWS Glue Data Catalog. When statistics aren’t available, Amazon EMR and Athena use S3 file metadata to optimize query plans. With Amazon EMR 6.10.0

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

Iceberg tables maintain metadata to abstract large collections of files, providing data management features including time travel, rollback, data compaction, and full schema evolution, reducing management overhead. Snowflake writes Iceberg tables to Amazon S3 and updates metadata automatically with every transaction.

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. A quick tour through that vendor list shows a mix of early-stage, growth-stage, several firms taken by private equity, even a public listing – Varonis was the first tech IPO of 2014.

Design a data mesh on AWS that reflects the envisioned organization

AWS Big Data

Founded in 2014, Acast is the world’s leading independent podcast company, elevating podcast creators and podcast advertisers for the ultimate listening experience. Data as a product: Treating data as a product entails three key components: the data itself, the metadata, and the associated code and infrastructure.

Regeneron turns to IT to accelerate drug discovery

CIO Business Intelligence

billion company’s scientific, commercial, and manufacturing businesses since joining the company in 2014. As the company’s chief technologist, McCowan’s job is to digitize everything and help scientists make the best use of the data and metadata regardless of how it is generated. “It is all about the data.

Real-Real-World Programming with ChatGPT

O'Reilly on Data

To provide some coherence to the music, I decided to use Taylor Swift songs since her discography covers the time span of most papers that I typically read: Her main albums were released in 2006, 2008, 2010, 2012, 2014, 2017, 2019, 2020, and 2022. This choice also inspired me to call my project Swift Papers.

My Journey as an Alation Engineer: Interview with Michael Ting

Alation

So I was excited to sit down recently with Michael Ting, Senior Software Engineer on the Logical Metadata Infrastructure Team. Michael is one of our most senior software engineers, having joined the company in 2014. That we have one today is testament to the work of our engineers.