Data Processing, Data Transformation, Reference and Testing

Data Processing

Data Transformation

Reference

Testing

End-to-end development lifecycle for data engineers to build a data integration pipeline using AWS Glue

AWS Big Data

JULY 26, 2023

To grow the power of data at scale for the long term, it’s highly recommended to design an end-to-end development lifecycle for your data integration pipelines. The following are common asks from our customers: Is it possible to develop and test AWS Glue data integration jobs on my local laptop?

Data Integration

Data Integration Snapshot Testing Visualization

Enable advanced search capabilities for Amazon Keyspaces data by integrating with Amazon OpenSearch Service

AWS Big Data

FEBRUARY 26, 2024

Additionally, you can configure OpenSearch Ingestion to apply data transformations before delivery. The content includes a reference architecture, a step-by-step guide on infrastructure setup, sample code for implementing the solution within a use case, and an AWS Cloud Development Kit (AWS CDK) application for deployment.

Dashboards

Dashboards Testing Metrics Optimization

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

datapine

SEPTEMBER 29, 2022

Reporting being part of an effective DQM, we will also go through some data quality metrics examples you can use to assess your efforts in the matter. But first, let’s define what data quality actually is. What is the definition of data quality? Why Do You Need Data Quality Management? date, month, and year).

Data Quality

Data Quality Metrics Data-driven Management

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Use Snowflake with Amazon MWAA to orchestrate data pipelines

AWS Big Data

OCTOBER 31, 2023

citibike-tripdata-destination-ACCOUNT_ID – The bucket used for storing the transformed dataset. When implementing the solution in this post, replace references to airflow-blog-bucket-ACCOUNT_ID and citibike-tripdata-destination-ACCOUNT_ID with the names of your own S3 buckets.

Data Processing

Data Processing Management Publishing Testing

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

AWS Big Data

OCTOBER 11, 2023

Traditionally, such a legacy call center analytics platform would be built on a relational database that stores data from streaming sources. Data transformations through stored procedures and use of materialized views to curate datasets and generate insights is a known pattern with relational databases.

Management

Management Metadata Analytics Dashboards

Data Integrity, the Basis for Reliable Insights

Sisense

AUGUST 28, 2020

Uncomfortable truth incoming: Most people in your organization don’t think about the quality of their data from intake to production of insights. However, as a data team member, you know how important data integrity (and a whole host of other aspects of data management) is. Means of ensuring data integrity.

Data Integration

Data Integration Testing Data Quality Data-driven

Migrate your existing SQL-based ETL workload to an AWS serverless ETL infrastructure using AWS Glue

AWS Big Data

JULY 31, 2023

Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS and use custom workflows to manage their ETL. AWS Glue is a serverless data integration and ETL service with the ability to scale on demand. Choose Confirm.

Sales

Sales Data Warehouse Visualization Testing

Gain insights from historical location data using Amazon Location Service and AWS analytics services

AWS Big Data

MARCH 13, 2024

You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches. You can test this solution yourself using the AWS Samples GitHub repository. This method uses GZIP compression to optimize storage consumption and query performance.

Analytics

Analytics IoT Metadata Internet of Things

Cross-account integration between SaaS platforms using Amazon AppFlow

AWS Big Data

APRIL 25, 2023

On many occasions, they need to apply business logic to the data received from the source SaaS platform before pushing it to the target SaaS platform. AnyCompany’s marketing team hosted an event at the Anaheim Convention Center, CA. However, for quick testing purposes, we demonstrate how to manually run the flow on demand.

Sales

Sales Visualization Software Marketing

How SafeGraph built a reliable, efficient, and user-friendly Apache Spark platform with Amazon EMR on Amazon EKS

AWS Big Data

FEBRUARY 21, 2023

We use Apache Spark as our main data processing engine and have over 1,000 Spark applications running over massive amounts of data every day. These Spark applications implement our business logic ranging from data transformation, machine learning (ML) model inference, to operational tasks. Their costs were climbing.

Cost-Benefit

Cost-Benefit Informatics Optimization Management

Automating the Automators: Shift Change in the Robot Factory

O'Reilly on Data

JANUARY 17, 2023

” I, thankfully, learned this early in my career, at a time when I could still refer to myself as a software developer. Upload your data, click through a workflow, walk away. If you’re a professional data scientist, you already have the knowledge and skills to test these models. It does not exist in the code.

Machine Learning

Machine Learning Predictive Modeling Software Modeling

The Modern Data Stack Explained: What The Future Holds

Alation

JANUARY 17, 2023

A modern data stack relies on cloud computing, whereas a legacy data stack stores data on servers instead of in the cloud. Modern data stacks provide access for more data professionals than a legacy data stack. An example of a data science tool is Dataiku. How Can I Build a Modern Data Stack?

Data Warehouse

Data Warehouse Cost-Benefit Data Transformation Data Science

The Rising Need for Data Governance in Healthcare

Alation

OCTOBER 28, 2021

Protect data at the source. Put data into action to optimize the patient experience and adapt to changing business models. What is Data Governance in Healthcare? Data governance in healthcare refers to how data is collected and used by hospitals, pharmaceutical companies, and other healthcare organizations and service providers.

Data Governance

Data Governance Measurement Metrics Modeling

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.

Data Warehouse

Data Warehouse Reporting Data Transformation Sales

What Is Embedded Analytics?

Jet Global

MAY 1, 2023

that gathers data from many sources. Strategic Objective Create a complete, user-friendly view of the data by preparing it for analysis. Requirement Multi-Source Data Blending Data from multiple sources is compiled and the output is a single view, metric, or visualization. Ask your vendors for references.

Analytics

Analytics Cost-Benefit Visualization Dashboards

Data Leaders Brief

End-to-end development lifecycle for data engineers to build a data integration pipeline using AWS Glue

Enable advanced search capabilities for Amazon Keyspaces data by integrating with Amazon OpenSearch Service

Webinars

Trending Sources

The Ultimate Guide to Modern Data Quality Management (DQM) For An Effective Data Quality Control Driven by The Right Metrics

Webinars

Use Snowflake with Amazon MWAA to orchestrate data pipelines

Modernize a legacy real-time analytics application with Amazon Managed Service for Apache Flink

Data Integrity, the Basis for Reliable Insights

Migrate your existing SQL-based ETL workload to an AWS serverless ETL infrastructure using AWS Glue

Gain insights from historical location data using Amazon Location Service and AWS analytics services

Cross-account integration between SaaS platforms using Amazon AppFlow

How SafeGraph built a reliable, efficient, and user-friendly Apache Spark platform with Amazon EMR on Amazon EKS

Automating the Automators: Shift Change in the Robot Factory

The Modern Data Stack Explained: What The Future Holds

The Rising Need for Data Governance in Healthcare

What is Data Mapping?

What Is Embedded Analytics?

Stay Connected