Cloudera Data Engineering 2021 Year End Review

Cloudera

New in 2021. Figure 2 – CDE product launch highlights in 2021. Early in 2021 we expanded our APIs to support pipelines using a new job type: Airflow. This enabled new use cases for customers who were using a mix of Spark and Hive to perform data transformations. Test Drive CDP Public Cloud.

What is a DataOps Engineer?

DataKitchen

Too many data organizations run data operations like a hundred-year-old car factory. While car companies lowered costs using mass production, companies in 2021 put data engineers and data scientists on the assembly line. That’s the state of data analytics today. … Their product is the data.

Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud

AWS Big Data

A modern data platform entails maintaining data across multiple layers, targeting diverse platform capabilities like high performance, ease of development, cost-effectiveness, and DataOps features such as CI/CD, lineage, and unit testing. dbt helps with this by handling the T in ETL (extract, transform, and load) processes.

Introducing Cloudera DataFlow Designer: Self-service, No-Code Dataflow Design

Cloudera

In 2021 we launched Cloudera DataFlow for the Public Cloud (CDF-PC), addressing the operational challenges that administrators face when running NiFi flows in production environments. Developers need to onboard new data sources, chain multiple data transformation steps together, and explore data as it travels through the flow.

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

You can also use the data transformation feature of Amazon Data Firehose to invoke a Lambda function that transforms incoming records in batches. This solution includes a Lambda function that continuously updates the Amazon Location tracker with simulated location data from fictitious journeys.
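As a rough illustration of the batch transformation step, here is a minimal Python sketch of a Firehose transformation Lambda. It follows the record contract Firehose uses for Lambda transformations: each input record carries a `recordId` and base64-encoded `data`, and each output record returns the same `recordId`, a `result` (`Ok`, `Dropped`, or `ProcessingFailed`), and base64-encoded `data`. The payload fields `DeviceId`, `SampleTime`, and `Position` and the flattening into newline-delimited JSON are assumptions for illustration, not the solution's actual code.

```python
import base64
import json


def lambda_handler(event, context):
    """Sketch of a Firehose data-transformation Lambda.

    Assumes (hypothetically) that each record is a JSON object with
    "DeviceId", "SampleTime" and "Position" ([longitude, latitude]).
    """
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            lon, lat = payload["Position"]
            flattened = {
                "device_id": payload["DeviceId"],
                "sample_time": payload["SampleTime"],
                "longitude": lon,
                "latitude": lat,
            }
            # Newline-delimited JSON keeps the delivered S3 objects easy to query.
            data = (json.dumps(flattened) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Keep the original payload and flag the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning `ProcessingFailed` with the untouched payload, rather than raising, lets one malformed record fail on its own instead of failing the whole batch.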

The 10 biggest issues IT faces today

CIO Business Intelligence

The No. 1 priority within the CIO function is cybersecurity strategy, up from the second spot in 2021. Angel-Johnson says she, too, has a heightened level of concern around security issues, and more specifically data protection. “I thought I was hired for digital transformation, but what is really needed is a data transformation,” she says.

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

Alternatively, you can use AWS Glue for Apache Spark, which provides built-in support for bucketing configurations during the data transformation process. AWS Glue allows you to define bucketing parameters, such as the number of buckets and the columns to bucket on, providing an optimized data layout for efficient querying with Athena.
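To make the idea of bucketing concrete, here is a minimal Python sketch of hash bucketing: each row is assigned to bucket `hash(column) mod num_buckets`, so every row with the same key lands in the same bucket and an equality filter on the bucketing column only has to read that one bucket. The CRC32 hash, the helper names, and the `sensor_id` column are hypothetical; Athena and Spark each apply their own hash functions, so these bucket numbers will not match the files they write.

```python
import zlib
from collections import defaultdict


def bucket_for(value, num_buckets):
    """Map a bucketing-column value to a bucket number.

    Illustrative only: CRC32 is not the hash Athena or Spark actually use.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucket_rows(rows, column, num_buckets):
    """Group rows (dicts) into buckets keyed by bucket number."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return dict(buckets)


def query_equals(buckets, column, value, num_buckets):
    """Answer `WHERE column = value` by reading only the matching bucket."""
    candidate = buckets.get(bucket_for(value, num_buckets), [])
    return [row for row in candidate if row[column] == value]


# Usage: 100 rows spread over 10 hypothetical sensor IDs and 4 buckets.
rows = [{"sensor_id": f"s{i % 10}", "reading": i} for i in range(100)]
buckets = bucket_rows(rows, "sensor_id", 4)
hits = query_equals(buckets, "sensor_id", "s3", 4)
```

The lookup for `s3` scans only the rows in one bucket instead of the whole table; in a real bucketed table the same effect shows up as fewer files read per query.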