AWS Big Data Blog

Connect your data for faster decisions with AWS

The most impactful data-driven insights come from connecting the dots between all your data sources: across departments, services, on-premises tools, and third-party applications. But connecting data typically requires complex extract, transform, and load (ETL) pipelines that can take hours or days. That is too slow for timely decision-making. ETL needs to be easier and, in some cases, eliminated.

AWS is investing in three ways to address this. First, for common use cases where ETL is repeated with little value-add, we’re integrating services to decrease or eliminate the need for ETL. Second, organizations still need transformations like cleansing, deduplication, and combining datasets for analysis and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation. Third, AWS continues adding support for more data sources, including connections to software as a service (SaaS) applications, on-premises applications, and other clouds, so organizations can act on their data.

In this post, we discuss how we’re delivering on these investments with a number of data integration innovations that span AWS databases, analytics, business intelligence (BI), and ML services.

Amazon Aurora MySQL zero-ETL integration with Amazon Redshift is now generally available

In June 2023, we announced the public preview of Amazon Aurora MySQL-Compatible Edition zero-ETL integration with Amazon Redshift. We are thrilled to announce that this zero-ETL integration is now generally available. Amazon Aurora MySQL zero-ETL integration with Amazon Redshift processes over 1 million transactions per minute, enabling near-real-time analytics. Within seconds of new data coming into Amazon Aurora MySQL, the data is replicated to Amazon Redshift. Updates in Amazon Aurora MySQL are automatically and continuously propagated to Amazon Redshift. Customers and partners can derive tremendous time savings by reducing traditional ETL bottlenecks. They can now analyze business metrics in near-real time and make data-driven decisions faster than ever before.

In the retail industry, for example, Infosys wanted to gain faster insights about their business, such as best-selling products and high-revenue stores, based on transactions in a store management system. They used Amazon Aurora MySQL zero-ETL integration with Amazon Redshift to achieve this. With this integration, Infosys replicated Aurora data to Amazon Redshift and created Amazon QuickSight dashboards for product managers and channel leaders in just a few seconds, instead of several hours. Now, as part of Infosys Cobalt and Infosys Topaz blueprints, enterprises can have near real-time analytics on transactional data, which can help them make informed decisions related to store management.

– Sunil Senan, SVP and Global Head of Data, Analytics, and AI, Infosys
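To make the zero-ETL setup more concrete, here is a minimal Python sketch of how an integration request might be assembled programmatically. It assumes the boto3 RDS client's create_integration operation and its SourceArn, TargetArn, and IntegrationName parameters. The helper names, ARNs, and integration name are hypothetical, and prerequisites such as cluster parameter settings and Redshift authorization are not shown.

```python
def build_integration_request(source_cluster_arn: str,
                              target_namespace_arn: str,
                              name: str) -> dict:
    """Build keyword arguments for an RDS CreateIntegration call.

    All three values are placeholders supplied by the caller; this helper
    only checks that they are non-empty and maps them to parameter names.
    """
    fields = {
        "source_cluster_arn": source_cluster_arn,
        "target_namespace_arn": target_namespace_arn,
        "name": name,
    }
    for label, value in fields.items():
        if not value or not value.strip():
            raise ValueError(f"{label} must be a non-empty string")
    return {
        "SourceArn": source_cluster_arn,     # Aurora MySQL cluster
        "TargetArn": target_namespace_arn,   # Amazon Redshift namespace
        "IntegrationName": name,
    }


def create_zero_etl_integration(rds_client, request: dict) -> str:
    """Send the request using a caller-supplied boto3 RDS client.

    Assumes the response contains an 'IntegrationArn' field.
    """
    response = rds_client.create_integration(**request)
    return response["IntegrationArn"]
```

With AWS credentials configured, the request could be passed to create_zero_etl_integration(boto3.client("rds"), request). Keeping the request builder separate from the API call lets you inspect or log the parameters before anything is created.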

Amazon SageMaker Canvas integration with Amazon QuickSight

We are empowering business analysts to create predictive, interactive dashboards by connecting Amazon SageMaker Canvas, our no-code ML service, with Amazon QuickSight, our BI service. Business analysts use SageMaker Canvas to build ML models and generate predictions without needing to write code. They can then seamlessly integrate these predictions in QuickSight to create interactive dashboards that can be shared across their organization. This makes predictive insights available to more people in the organization for better decision-making.

Moreover, we have enabled deep, bidirectional integration between SageMaker Canvas and QuickSight. Business analysts can send ML models from SageMaker Canvas to QuickSight and run predictions from within QuickSight. Analysts can now also directly send data from QuickSight to SageMaker Canvas with just a few clicks to rapidly build ML models using a simple point-and-click interface, without needing to create or maintain complex data pipelines between the two services. This integration empowers users to go from data to predictions and visualizations faster than ever.

Connecting to SaaS applications

AWS services already connect to hundreds of AWS and third-party data sources. Data engineers can use services such as Amazon AppFlow and AWS Glue to make data quickly accessible from diverse sources. This enables organizations to derive unified insights across siloed datasets. We recently added new Amazon AppFlow and AWS Glue integrations to our existing portfolio.

Amazon AppFlow now supports concurrent processing for data transfers from SAP applications

Amazon AppFlow, a fully managed integration service that helps you securely transfer data between AWS services and SaaS applications, now supports concurrent processing and configurable page sizes for faster data transfers from SAP. This reduces the time taken to move SAP data into AWS data and artificial intelligence (AI) services.
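As an illustration of the two settings described above, the following Python sketch builds the SAP OData source properties for a flow definition. It assumes the field names objectPath, parallelismConfig.maxParallelism, and paginationConfig.maxPageSize as we understand them from the Amazon AppFlow CreateFlow API. Verify them against the current API reference before use. The helper name and the object path are hypothetical.

```python
from typing import Optional


def sap_odata_source_properties(object_path: str,
                                max_parallelism: Optional[int] = None,
                                max_page_size: Optional[int] = None) -> dict:
    """Build the SAP OData source properties for an AppFlow flow.

    max_parallelism: number of concurrent processes for the transfer.
    max_page_size:   number of records requested per page.
    Both are optional; omitted settings are left to the service defaults.
    """
    if not object_path:
        raise ValueError("object_path must be a non-empty string")
    props = {"objectPath": object_path}
    if max_parallelism is not None:
        if max_parallelism < 1:
            raise ValueError("max_parallelism must be at least 1")
        props["parallelismConfig"] = {"maxParallelism": max_parallelism}
    if max_page_size is not None:
        if max_page_size < 1:
            raise ValueError("max_page_size must be at least 1")
        props["paginationConfig"] = {"maxPageSize": max_page_size}
    # Assumed to be nested under the flow's source connector properties.
    return {"SAPOData": props}
```

The returned dictionary is intended to be placed in the source connector properties of a flow definition. Suitable values for concurrency and page size depend on what the SAP system can sustain, so treat any numbers as tuning parameters rather than recommendations.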

Google BigQuery connectivity for AWS Glue for Apache Spark is now generally available

AWS Glue for Apache Spark has added native connectivity to Google BigQuery, enabling reading and writing of BigQuery data directly without the need to install or manage libraries. You can now add BigQuery as a source or target in AWS Glue Studio’s visual interface or directly in an AWS Glue ETL script.
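For the script-based route, a read from BigQuery might look like the sketch below. It assumes the "bigquery" connection type and the connectionName, parentProject, sourceType, and table options as we understand them from the AWS Glue documentation, with the table given in dataset.table form. The helper names and all values are hypothetical, and the GlueContext is passed in by the caller, so the snippet itself needs no Glue libraries.

```python
def bigquery_read_options(connection_name: str,
                          parent_project: str,
                          table: str) -> dict:
    """Build connection options for reading one BigQuery table.

    connection_name: name of an existing AWS Glue connection (placeholder).
    parent_project:  Google Cloud project that owns the dataset.
    table:           table to read, written as 'dataset.table'.
    """
    dataset, _, table_name = table.partition(".")
    if not dataset or not table_name:
        raise ValueError("table must be in the form 'dataset.table'")
    return {
        "connectionName": connection_name,
        "parentProject": parent_project,
        "sourceType": "table",
        "table": table,
    }


def read_bigquery_table(glue_context, options: dict):
    """Read the table into a DynamicFrame with a caller-supplied GlueContext."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="bigquery",
        connection_options=options,
    )
```

Inside an AWS Glue job, read_bigquery_table(glueContext, options) would return a DynamicFrame that can be transformed and written to a target such as Amazon S3 or Amazon Redshift.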

Summary

The data integration innovations we have highlighted show our commitment to empowering organizations to easily connect their data. Whether it’s achieving near-real-time insights, democratizing predictive analytics, or connecting diverse data sources, we are focused on helping you derive more value from your data. With the new capabilities of Amazon Aurora MySQL, Amazon Redshift, SageMaker Canvas, QuickSight, Amazon AppFlow, and AWS Glue, data engineers and business analysts can break down data silos to uncover insights.

Visit Data integration with AWS to learn more.


About the authors

Rahul Pathak is VP of Relational Database Engines, leading Amazon Aurora, Amazon Redshift, and Amazon QLDB. Prior to his current role, he was VP of Analytics at AWS, where he worked across the entire AWS database portfolio. He has co-founded two companies, one focused on digital media analytics and the other on IP-geolocation.

G2 Krishnamoorthy is VP of Analytics, leading AWS data lake services, data integration, Amazon OpenSearch Service, and Amazon QuickSight. Prior to his current role, G2 built and ran the Analytics and ML Platform at Facebook/Meta, and built various parts of the SQL Server database, Azure Analytics, and Azure ML at Microsoft.