2022, Data Transformation and Metadata

2022

Data Transformation

Metadata

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. AWS Glue allows you to define bucketing parameters, such as the number of buckets and the columns to bucket on, providing an optimized data layout for efficient querying with Athena.

Optimization

Optimization Data Lake Cost-Benefit Reporting

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

Bayerische Motoren Werke AG (BMW) is a motor vehicle manufacturer headquartered in Germany with 149,475 employees worldwide and the profit before tax in the financial year 2022 was € 23.5 Data providers and consumers are the two fundamental users of a CDH dataset. The difference lies in when and where data transformation takes place.

Dashboards

Dashboards Analytics Metadata Data Warehouse

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

Analytics Vidhya

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

He’s a true expert in the field, having worked at Oracle, Scient, BearingPoint, and Booz Allen Hamilton, and on data-focused projects with companies like LMVH, Major League Baseball, Toyota, American Express, Freddie Mac, and many, many others. I recently had the opportunity to connect with Mohan at Snowflake Summit 2022 in Las Vegas.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

AWS Big Data

APRIL 12, 2023

We have been continually improving the Spark performance in each Amazon EMR release to further shorten job runtime and optimize users’ spending on their Amazon EMR big data workloads. release in January 2022, the optimized Spark runtime was 3.5 times faster than our first release of 2022, Amazon EMR 6.5. As of the Amazon EMR 6.5

Testing

Testing Big Data Metadata Optimization

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.

Data Lake

Data Lake Data Architecture Metadata Data Warehouse

Why Data Lineage is Key to the LIBOR Transition

Octopai

NOVEMBER 23, 2020

As you may have heard, the existing London Interbank Offered Rate (LIBOR) will be retired on January 1st, 2022. Naturally, this will produce a massive growth in data management needs. In fact, the LIBOR transition program marks one of the largest data transformation obstacles ever seen in financial services.

Metadata

Metadata Enterprise Business Intelligence Data Governance

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Data ingestion – Steps 1 and 2 use AWS DMS, which connects to the source database and moves full and incremental data (CDC) to Amazon S3 in Parquet format. Data transformation – Steps 3 and 4 represent an EMR Serverless Spark application (Amazon EMR 6.9 Let’s refer to this S3 bucket as the raw layer.

Data Lake

Data Lake Dashboards Metrics Metadata

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

There are countless examples of big data transforming many different industries. There is no disputing the fact that the collection and analysis of massive amounts of unstructured data has been a huge breakthrough. How does Data Virtualization complement Data Warehousing and SOA Architectures?

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Alation

JULY 18, 2022

Are your data users overwhelmed by silos and frustrated by untrusted data? That was the message — delivered a little more elegantly than that — at Databricks’ Data+AI Summit 2022. The Power of Partnership to Accelerate Data Transformation. Tell them to grab a catalog … and go jump in a lake.

ROI

ROI Metadata Data Lake Digital Transformation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To ingest the data, smava uses a set of popular third-party customer data platforms complemented by custom scripts. After the data lands in Amazon S3, smava uses the AWS Glue Data Catalog and crawlers to automatically catalog the available data, capture the metadata, and provide an interface that allows querying all data assets.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Data Leaders Brief

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

Webinars

Trending Sources

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Webinars

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Why Data Lineage is Key to the LIBOR Transition

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

Biggest Trends in Data Visualization Taking Shape in 2022

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

How smava makes loans transparent and affordable using Amazon Redshift Serverless

Stay Connected