Big Data, Broadcasting, Metadata and Statistics

Run Trino queries 2.7 times faster with Amazon EMR 6.15.0

AWS Big Data

MARCH 22, 2024

Benchmark setup In our testing, we used the 3 TB dataset stored in Amazon S3 in compressed Parquet format and metadata for databases and tables is stored in the AWS Glue Data Catalog. This benchmark uses unmodified TPC-DS data schema and table relationships. Table and column statistics were not present for any of the tables.

Metadata

Metadata Statistics Broadcasting Optimization

Three Emerging Analytics Products Derived from Value-driven Data Innovation and Insights Discovery in the Enterprise

Rocket-Powered Data Science

JULY 19, 2023

I recently saw an informal online survey that asked users which types of data (tabular, text, images, or “other”) are being used in their organization’s analytics applications. This was not a scientific or statistically robust survey, so the results are not necessarily reliable, but they are interesting and provocative.

Data-driven

Data-driven Enterprise Analytics Machine Learning

Improving Data Processing with Spark 3.0 & Delta Lake

Smart Data Collective

AUGUST 5, 2021

Developed at Databricks, “Delta Lake is an open-source data storage layer that runs on the existing Data Lake and is fully cooperative with Apache Spark APIs. Along with the ability to implement ACID transactions and scalable metadata handling, Delta Lakes can also unify the streaming and batch data processing”. .

Data Processing

Data Processing Metadata Broadcasting Statistics

Data Leaders Brief

Run Trino queries 2.7 times faster with Amazon EMR 6.15.0

Three Emerging Analytics Products Derived from Value-driven Data Innovation and Insights Discovery in the Enterprise

Improving Data Processing with Spark 3.0 & Delta Lake

Webinars

Stay Connected