Data Transformation, Publishing and Testing

Cloudera DataFlow Designer: The Key to Agile Data Pipeline Development

Cloudera

MARCH 14, 2023

Allows them to iteratively develop processing logic and test with as little overhead as possible. Plays nice with existing CI/CD processes to promote a data pipeline to production. Provides monitoring, alerting, and troubleshooting for production data pipelines.

Testing

Testing Publishing Metadata Interactive

Introducing Cloudera DataFlow Designer: Self-service, No-Code Dataflow Design

Cloudera

DECEMBER 9, 2022

Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow. This allows developers to make changes to their processing logic on the fly while running some test data through their flow and validating that their changes work as intended.

Testing

Testing Cost-Benefit Interactive Visualization

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Trending Sources

Improve Business Agility by Hiring a DataOps Engineer

DataKitchen

DECEMBER 20, 2020

They give data scientists tools to instantiate development sandboxes on demand. They automate the data operations pipeline and create platforms used to test and monitor data from ingestion to published charts and graphs.

Data-driven

Data-driven Manufacturing Data Architecture Data Analytics

AzureML and CRISP-DM – a Framework to help the Business Intelligence professional move to AI

Jen Stirrup

SEPTEMBER 30, 2021

For example, data can be filtered so that the investigation can be focused more specifically. There are a number of Data Transformation modules which help with this area. That said, it’s often better to clean the data further upstream so it is done closer to the source rather than at the end of a spoke.

Business Intelligence

Business Intelligence Data mining Machine Learning Testing

End-to-end development lifecycle for data engineers to build a data integration pipeline using AWS Glue

AWS Big Data

JULY 26, 2023

To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?

Data Integration

Data Integration Snapshot Testing Visualization

Enable advanced search capabilities for Amazon Keyspaces data by integrating with Amazon OpenSearch Service

AWS Big Data

FEBRUARY 26, 2024

You simply configure your data sources to send information to OpenSearch Ingestion, which then automatically delivers the data to your specified destination. Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery. Choose the Test tab. For Method type, choose POST.

Dashboards

Dashboards Testing Metrics Optimization

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

Also known as data validation, integrity refers to the structural testing of data to ensure that the data complies with procedures. This means there are no unintended data errors, and it corresponds to its appropriate designation (e.g., Here, it all comes down to the data transformation error rate.

Data Quality

Data Quality Metrics Data-driven Management

Cloudera Data Engineering 2021 Year End Review

Cloudera

DECEMBER 21, 2021

This enabled new use-cases with customers that were using a mix of Spark and Hive to perform data transformations. Secondly, instead of being tied to the embedded Airflow within CDE, we wanted any customer using Airflow (even outside of CDE) to tap into the CDP platform, which is why we published our Cloudera provider package.

Snapshot

Snapshot Data-driven Optimization Management

Simplify Metrics on Apache Druid With Rill Data and Cloudera

Cloudera

JULY 21, 2022

Cloudera Data Warehouse). Efficient batch data processing. Complex data transformations. Triton Digital, for example, uses Rill to deploy self-serve reporting for hundreds of digital media publishers with little or no training. Apache Hive. Large-scale high throughput analytics. Joins and subqueries. Apache Druid.

Metrics

Metrics Slice and Dice Data Warehouse Dashboards

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

Developers can use the support in Amazon Location Service for publishing device position updates to Amazon EventBridge to build a near-real-time data pipeline that stores locations of tracked assets in Amazon Simple Storage Service (Amazon S3). You can test this solution yourself using the AWS Samples GitHub repository.

Analytics

Analytics IoT Metadata Internet of Things

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

This produces end-to-end lineage so business and technology users alike can understand the state of a data lake and/or lake house. They can better understand data transformations, checks, and normalization. They can better grasp the purpose and use for specific data (and improve the pipeline!). Transparency is key.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

Assessing and interviewing data engineers from a distance

Insight

APRIL 8, 2020

For example, they may give applicants access to an API and ask them to query data that satisfies some criteria, or they may share a large dataset and ask applicants to perform some sort of data transformation. Each submission is run through a series of tests to ensure that the desired output is produced.

Cost-Benefit

Cost-Benefit Software Data Warehouse Optimization

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

APRIL 25, 2023

The following AWS services are used for data ingestion, processing, and load: Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Marketo, Slack, and ServiceNow, and AWS services like Amazon S3 and Amazon Redshift, in just a few clicks.

Sales

Sales Visualization Software Marketing

AI, the Power of Knowledge and the Future Ahead: An Interview with Head of Ontotext’s R&I Milena Yankova

Ontotext

APRIL 4, 2019

Within a large enterprise, there is a huge amount of data accumulated over the years – many decisions have been made and different methods have been tested. Milena Yankova: What we did for the BBC in the previous Olympics was that we helped journalists publish their reports faster. I think artists can relax.

Recreation/Entertainment

Recreation/Entertainment Testing Enterprise Knowledge Discovery

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

AWS Big Data

APRIL 12, 2023

We also share a Spark benchmark solution that suits all Amazon EMR deployment options, so you can replicate the process in your environment for your own performance test cases. The solution uses the TPC-DS dataset and unmodified data schema and table relationships, but derives queries from TPC-DS to support the SparkSQL test cases.

Testing

Testing Big Data Metadata Optimization

Stream VPC Flow Logs to Datadog via Amazon Kinesis Data Firehose

AWS Big Data

JUNE 20, 2023

Kinesis Data Firehose is a fully managed service for delivering near-real-time streaming data to various destinations for storage and performing near-real-time analytics. You can perform analytics on VPC flow logs delivered from your VPC using the Kinesis Data Firehose integration with Datadog as a destination.

Dashboards

Dashboards Visualization Metrics Data Transformation

DataOps Observability: Taming the Chaos (Part 2)

DataKitchen

OCTOBER 25, 2022

The goal of DataOps Observability is to provide visibility of every journey that data takes from source to customer value across every tool, environment, data store, data and analytic team, and customer so that problems are detected, localized and raised immediately. That data then fills several database tables.

Testing

Testing Data-driven Visualization Dashboards

Use Snowflake with Amazon MWAA to orchestrate data pipelines

AWS Big Data

OCTOBER 31, 2023

Data is decompressed and stored in a different S3 bucket (transformed data can be stored in the same S3 bucket where data was ingested, but for simplicity, we’re using two separate S3 buckets). The transformed data is then made accessible to Snowflake for data analysis. Set the protocol to Email.

Data Processing

Data Processing Management Publishing Testing

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Using stored procedures for data transformations and materialized views to curate datasets and generate insights is a known pattern with relational databases.

Management

Management Metadata Analytics Dashboards

Migrate from Amazon Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics Studio

AWS Big Data

JUNE 29, 2023

In this post, we discuss why AWS recommends moving from Kinesis Data Analytics for SQL Applications to Amazon Kinesis Data Analytics for Apache Flink to take advantage of Apache Flink’s advanced streaming capabilities. View the stream data. Transform and enrich the data. Manipulate the data with Python.

Data Analytics

Data Analytics Analytics IoT Data Lake

Use AWS Glue DataBrew recipes in your AWS Glue Studio visual ETL jobs

AWS Big Data

JULY 27, 2023

DataBrew is a visual data preparation tool that enables you to clean and normalize data without writing any code. The over 200 transformations it provides are now available to be used in an AWS Glue Studio visual job. Now that you have addressed all data quality issues identified on the sample, publish the project as a recipe.

Visualization

Visualization Cost-Benefit Data Quality Interactive

How Tricentis unlocks insights across the software development lifecycle at speed and scale using Amazon Redshift

AWS Big Data

MARCH 3, 2023

Tricentis is the global leader in continuous testing for DevOps, cloud, and enterprise applications. Speed changes everything, and continuous testing across the entire CI/CD lifecycle is the key. Tricentis instills that confidence by providing software tools that enable Agile Continuous Testing (ACT) at scale.

Software

Software Data Lake Testing Cost-Benefit

How Alation’s Data Team Uses the Modern Data Stack to Power Insights

Alation

OCTOBER 27, 2022

Few actors in the modern data stack have inspired as much enthusiasm and fervent support as dbt. This data transformation tool enables data analysts and engineers to transform, test and document data in the cloud data warehouse. But what does this mean from a practitioner perspective?

Metrics

Metrics Dashboards Sales Reporting

Improve observability across Amazon MWAA tasks

AWS Big Data

FEBRUARY 6, 2023

For data pipeline orchestration, the Apache Airflow UI is a user-friendly tool that provides detailed views into your data pipeline. When it comes to pipeline health management, each service that your tasks are interacting with could be storing or publishing logs to different locations, such as an S3 bucket or Amazon CloudWatch logs.

Management

Management Interactive Metadata Publishing

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

For these workloads, data lake vendors usually recommend extracting data into flat files to be used solely for model training and testing purposes. This adds an additional ETL step, making the data even more stale. Data lakehouse was created to solve these problems. Data discoverability.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

Real-time analytics and BI: Combine data from existing sources with new data to unlock new, faster insights without the cost and complexity of duplicating and moving data across different environments. How you can get started today: Test out watsonx.ai and watsonx.data for yourself with our watsonx trial experience.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by customers of data warehouses (such as Amazon Redshift) who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

The Journey to DataOps Success: Key Takeaways from Transformation Trailblazers

DataKitchen

APRIL 26, 2021

GSK had been pursuing DataOps capabilities such as automation, containerization, automated testing and monitoring, and reusability, for several years. DataOps provides the “continuous delivery equivalent for Machine Learning” and enables teams to manage the complexities around continuous training, A/B testing, and deploying without downtime.

Measurement

Measurement Metrics Data-driven Testing

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Sales

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

Strategic Objective: Create a complete, user-friendly view of the data by preparing it for analysis. Requirement: Multi-Source Data Blending. Data from multiple sources is compiled and the output is a single view, metric, or visualization. Requirement: Data Transformation and Enrichment. Data can be enriched for analysis.

Analytics

Analytics Cost-Benefit Visualization Dashboards

Data Leaders Brief
