
Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

Cloudera

The designer lets developers iteratively build processing logic and test it with as little overhead as possible. It plays nicely with existing CI/CD processes for promoting a data pipeline to production, and it provides monitoring, alerting, and troubleshooting for production data pipelines.

Introducing Cloudera DataFlow Designer: Self-service, No-Code Dataflow Design

Cloudera

Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. The Designer allows developers to change their processing logic on the fly while running test data through the flow and validating that the changes work as intended.

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

This person (or group of individuals) ensures that the theory behind data quality is communicated to the development team. 2 – Data profiling. Data profiling is an essential process in the DQM lifecycle: it verifies that the data contains no unintended errors and that each value corresponds to its appropriate designation.
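
As a rough sketch of what profiling can look like in practice (the file name and the "age" column here are hypothetical, not from the article), a few lines of pandas can surface null counts, type mismatches, and values that do not match their designation:

```python
import pandas as pd

# Hypothetical dataset; any tabular source profiles the same way.
df = pd.read_csv("customers.csv")

# Basic profile: per-column dtypes, null counts, and distinct counts.
profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nulls": df.isna().sum(),
    "null_pct": (df.isna().mean() * 100).round(2),
    "distinct": df.nunique(),
})
print(f"rows: {len(df)}")
print(profile)

# Spot values that do not match their intended designation,
# e.g. an 'age' column that should be a non-negative number.
if "age" in df.columns:
    age = pd.to_numeric(df["age"], errors="coerce")
    bad_age = df[age.isna() | (age < 0)]
    print(f"rows with invalid age: {len(bad_age)}")
```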

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

Developers can use Amazon Location Service's support for publishing device position updates to Amazon EventBridge to build a near-real-time data pipeline that stores the locations of tracked assets in Amazon Simple Storage Service (Amazon S3). Athena is then used to run geospatial queries on the location data stored in the S3 buckets.
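
As an illustrative sketch of the query side of such a pipeline (the database, table, column names, and S3 result bucket are assumptions, not taken from the article), a geospatial Athena query can be driven with boto3:

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Hypothetical table holding EventBridge-delivered position updates in S3;
# the column names (device_id, longitude, latitude) are assumptions.
QUERY = """
SELECT device_id, COUNT(*) AS fixes
FROM device_positions
WHERE ST_Distance(
        to_spherical_geography(ST_Point(longitude, latitude)),
        to_spherical_geography(ST_Point(-122.4194, 37.7749))
      ) < 1000  -- within 1 km of a reference point
GROUP BY device_id
"""

qid = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "location_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows[1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```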

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

More specifically, IDF has been integrated with Alation at the API level, which means that all generated pipeline code, metadata attributes, configuration files, and lineage are automatically synced (a huge time savings). Data consumers can better understand data transformations, checks, and normalization. Transparency is key.
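
As a purely illustrative sketch of what an API-level sync can look like (the endpoint, payload shape, and field names are hypothetical and are not the actual Alation or IDF API), generated pipeline metadata and lineage might be pushed to a catalog like this:

```python
import requests

CATALOG_URL = "https://catalog.example.com/api/lineage"  # hypothetical endpoint
TOKEN = "…"  # catalog API token

# Illustrative payload: one generated pipeline and its lineage edge.
payload = {
    "pipeline": "orders_daily_load",
    "generated_code_ref": "s3://idf-artifacts/orders_daily_load.sql",
    "attributes": {"owner": "data-eng", "checks": ["row_count", "not_null(order_id)"]},
    "lineage": [{"source": "raw.orders", "target": "curated.orders"}],
}

resp = requests.post(
    CATALOG_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print("synced:", resp.status_code)
```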

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

The following AWS services are used for data ingestion, processing, and load: Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks.
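
As a minimal sketch of driving such a flow programmatically (the flow name and its Salesforce-to-S3 setup are assumptions; the flow itself would already have been defined in AppFlow), boto3 can trigger an on-demand run:

```python
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

# Hypothetical flow created in AppFlow, e.g. Salesforce -> Amazon S3.
FLOW_NAME = "salesforce-accounts-to-s3"

# Trigger an on-demand run of the flow and report its execution ID.
run = appflow.start_flow(flowName=FLOW_NAME)
print("started execution:", run.get("executionId"))

# Inspect the flow's configuration (source connector and status).
desc = appflow.describe_flow(flowName=FLOW_NAME)
print("source:", desc["sourceFlowConfig"]["connectorType"])
print("status:", desc["flowStatus"])
```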

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Transforming data through stored procedures and curating datasets with materialized views to generate insights is a well-known pattern with relational databases.
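
As a minimal sketch of the streaming alternative (the table schema is hypothetical and the datagen source stands in for a real Kinesis or Kafka stream; none of this is taken from the article), a continuously updated aggregate in PyFlink plays the role the materialized view did in the legacy platform:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming TableEnvironment: continuous SQL replaces the stored-procedure /
# materialized-view pattern of the legacy relational platform.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical call-center event stream; 'datagen' stands in for Kinesis/Kafka.
t_env.execute_sql("""
    CREATE TABLE call_events (
        agent_id INT,
        handle_time_sec INT
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '10',
        'fields.agent_id.min' = '1',
        'fields.agent_id.max' = '5',
        'fields.handle_time_sec.min' = '30',
        'fields.handle_time_sec.max' = '600'
    )
""")

# Continuously maintained per-agent aggregate, updated as events arrive;
# prints a changelog until the job is cancelled.
result = t_env.execute_sql("""
    SELECT agent_id,
           COUNT(*) AS calls,
           AVG(handle_time_sec) AS avg_handle_time
    FROM call_events
    GROUP BY agent_id
""")
result.print()
```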