Data Integration, Data Lake, Data Warehouse and Interactive

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer, enables you to build data integration pipelines using natural language.

Data Integration

Data Integration Data Lake Data Warehouse Software

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.
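The hourly incremental update described above can be pictured as an upsert driven by a watermark. The following is a minimal sketch in plain Python, not the Apache Hudi or AWS Glue API; the function name `incremental_upsert`, the `id` and `updated_at` fields, and the sample records are all hypothetical and chosen only for illustration.

```python
from datetime import datetime


def incremental_upsert(target, source_rows, last_sync, key="id", ts="updated_at"):
    """Merge rows changed after last_sync into target and return the new watermark.

    target is a dict keyed by record key (a stand-in for the data lake table).
    """
    new_watermark = last_sync
    for row in source_rows:
        if row[ts] > last_sync:
            # Insert a new record or overwrite the existing one with the same key.
            target[row[key]] = row
            new_watermark = max(new_watermark, row[ts])
    return new_watermark


# Hypothetical table state and change feed.
lake = {1: {"id": 1, "status": "new", "updated_at": datetime(2023, 2, 22, 8, 0)}}
changes = [
    {"id": 1, "status": "shipped", "updated_at": datetime(2023, 2, 22, 9, 30)},
    {"id": 2, "status": "new", "updated_at": datetime(2023, 2, 22, 9, 45)},
    {"id": 3, "status": "old", "updated_at": datetime(2023, 2, 22, 7, 0)},
]

watermark = incremental_upsert(lake, changes, last_sync=datetime(2023, 2, 22, 9, 0))
```

Only rows newer than the previous watermark are applied, so each run touches the changed records rather than reloading the full table. The returned watermark would be stored and passed as `last_sync` on the next run.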

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

What is a Data Pipeline?

Jet Global

MAY 9, 2024

The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
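To make the components above concrete, here is a small sketch of a pipeline with a source step followed by cleansing, filtering, standardization, and aggregation. It is an illustration under stated assumptions: the function names, the `region` and `amount` fields, and the in-memory sample rows are hypothetical and stand in for a real data store.

```python
from collections import defaultdict


def extract():
    """Data source: a hard-coded list standing in for a database, file, or API."""
    return [
        {"region": " east ", "amount": "10.5"},
        {"region": "West", "amount": "4"},
        {"region": "east", "amount": None},
        {"region": "EAST", "amount": "2.5"},
    ]


def cleanse(rows):
    """Filter out rows without an amount and standardize the remaining fields."""
    for row in rows:
        if row["amount"] is None:
            continue
        yield {"region": row["region"].strip().lower(), "amount": float(row["amount"])}


def aggregate(rows):
    """Sum amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline():
    return aggregate(cleanse(extract()))


result = run_pipeline()
```

Each stage takes the previous stage's output as input, so a stage can be swapped or tested on its own without changing the others.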

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture. 4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context. 1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

This integration expands the possibilities for AWS analytics and machine learning (ML) solutions, making the data warehouse accessible to a broader range of applications. These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime. options(**read_config).option("query", cast("string")).dropDuplicates())

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Customer 360 (C360) provides a complete and unified view of a customer’s interactions and behavior across all touchpoints and channels. This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. Then, you transform this data into a concise format.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

What is less frequently mentioned is that during this same time we have also seen a rapid increase of cloud services where data needs to be delivered (data lakes, lakehouses, cloud warehouses, cloud streaming systems, cloud business processes, etc.).

Enterprise

Enterprise Data Lake Data Collection Data-driven

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Data warehouses play a vital role in healthcare decision-making and serve as a repository of historical data. A healthcare data warehouse can be a single source of truth for clinical quality control systems. What is a dimensional data model?

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. When running properly, they provide timely and trustworthy information.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

AWS Big Data

JUNE 28, 2023

In case the data sources change, data engineers have to manually make changes in their code and deploy it again. Furthermore, the time required to build or change pipelines makes the data unfit for near-real-time use cases such as detecting fraudulent transactions, placing online ads, and tracking passenger train schedules.

Analytics

Analytics Data Warehouse Data Lake Data-driven

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. Mohit Saxena is a Senior Software Development Manager on the AWS Glue team.

Metrics

Metrics Visualization Dashboards Interactive

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

AWS Big Data

FEBRUARY 13, 2024

Monitoring data pipelines in real time is critical for catching issues early and minimizing disruptions. AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics, which provide valuable insights into your data integration pipelines built on AWS Glue.

Metrics

Metrics Dashboards Visualization Key Performance Indicator

The Data Journey: From Raw Data to Insights

Sisense

JULY 22, 2020

The trend has been towards using cloud-based applications and tools for different functions, such as Salesforce for sales, Marketo for marketing automation, and large-scale data storage like AWS or data lakes such as Amazon S3, Hadoop and Microsoft Azure. Sisense provides instant access to your cloud data warehouses.

Slice and Dice

Slice and Dice Digital Transformation Data Warehouse Data Lake

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

What is less frequently mentioned is that during this same time we have also seen a rapid increase of cloud services where data needs to be delivered (data lakes, lakehouses, cloud warehouses, cloud streaming systems, cloud business processes, etc.).

Enterprise

Enterprise Data Lake Data Collection Data-driven

Accelerate Amazon Redshift secure data use with Satori – Part 1

AWS Big Data

SEPTEMBER 21, 2023

Satori integrates natively with both Amazon Redshift provisioned clusters and Amazon Redshift Serverless for easy setup of your Amazon Redshift data warehouse in the secure Satori portal. In part 2, we will explore how to set up self-service data access with Satori to data stored in Amazon Redshift.

Data Warehouse

Data Warehouse Interactive Data Architecture Data Lake

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. Brian Ross is a Senior Software Development Manager at AWS.

Data Quality

Data Quality Statistics Data Lake Visualization

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Optimization

Optimization Forecasting Data Lake Metadata

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

Data mapping is essential for integration, migration, and transformation of different data sets; it allows you to improve your data quality by preventing duplications and redundancies in your data fields. Data mapping helps standardize, visualize, and understand data across different systems and applications.
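As a rough illustration of the idea above, the sketch below maps differently named source fields onto one standard schema so that records from two systems line up. The field names, the `FIELD_MAP` table, and the sample records are hypothetical and not taken from any particular product.

```python
# Hypothetical mapping from source field names to one standard target schema.
FIELD_MAP = {
    "cust_name": "customer_name",
    "CustomerName": "customer_name",
    "email_addr": "email",
    "Email": "email",
}


def map_record(record, field_map=FIELD_MAP):
    """Rename source fields to their standard names; unmapped fields pass through."""
    mapped = {}
    for field, value in record.items():
        target = field_map.get(field, field)
        # Keep the first non-empty value so two source fields that map to the
        # same target do not produce a redundant or conflicting entry.
        if target not in mapped or mapped[target] in (None, ""):
            mapped[target] = value
    return mapped


crm = {"cust_name": "Ada Lovelace", "email_addr": "ada@example.com"}
billing = {"CustomerName": "Ada Lovelace", "Email": "ada@example.com", "plan": "pro"}

crm_std = map_record(crm)
billing_std = map_record(billing)
```

After mapping, both records expose `customer_name` and `email`, which is what makes later deduplication or joins across the two systems straightforward.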

Data Warehouse

Data Warehouse Reporting Data Transformation Sales

TIBCO JasperSoft for BI and Reporting

BizAcuity

AUGUST 1, 2022

TIBCO Jaspersoft offers a complete BI suite that includes reporting, online analytical processing (OLAP), visual analytics, and data integration. The web-scale platform enables users to share interactive dashboards and data from a single page with individuals across the enterprise.

Reporting

Reporting OLAP Online Analytical Processing Dashboards

5 Trends in Financial Services That Will Change How You Think about Your Data

Data Virtualization

MARCH 23, 2023

Innovative new technologies are redefining the sector, shaping the services that financial organizations offer, the ways in which they interact with consumers, and the ways in which they apply.

Interactive

Interactive Data Integration Technology Management

Dimensional modeling in Amazon Redshift

AWS Big Data

JULY 19, 2023

Amazon Redshift is a fully managed and petabyte-scale cloud data warehouse that is used by tens of thousands of customers to process exabytes of data every day to power their analytics workload. You can structure your data, measure business processes, and get valuable insights quickly by using a dimensional model.
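A dimensional model is commonly laid out as a fact table of measurements surrounded by dimension tables that describe them. The sketch below shows that shape using Python's built-in `sqlite3` module rather than Amazon Redshift, so that it runs anywhere; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the who, what, and when of each measurement.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT, year INTEGER);

-- The fact table stores the measurements plus keys into each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_date VALUES (20230719, '2023-07-19', 2023), (20230720, '2023-07-20', 2023);
INSERT INTO fact_sales VALUES
    (1, 20230719, 2, 20.0),
    (1, 20230720, 1, 10.0),
    (2, 20230719, 3, 45.0);
""")


def sales_by_category(connection):
    """Join the fact table to a dimension and total the sales amount per category."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


totals = sales_by_category(conn)
```

The same query pattern, a fact table joined to one or more dimensions and grouped by a dimension attribute, is how a business process such as sales is measured by product, date, or any other dimension.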

Modeling

Modeling Sales Data Warehouse Snapshot

Cross-Functional Trade Surveillance

Cloudera

MAY 16, 2018

However, in this case, that output is ingested into a data lake. Instead of each group’s tools acting on the output in isolation, they leverage a common visual analytics platform that is native to the lake and uses all of the data without moving it to a separate server. Going Forward: Improved Economics.

Data Lake

Data Lake Risk Visualization Unstructured Data

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Graphs reconcile such data continuously crawled from diverse sources to support interactive queries and provide a graphic representation or model of the elements within a supply chain, aiding in pathfinding and the ability to semantically enrich complex machine learning (ML) algorithms and decision making.

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

The longer answer is that in the context of machine learning use cases, strong assumptions about data integrity lead to brittle solutions overall. Most of the data management moved to back-end servers, e.g., databases. So we had three tiers providing a separation of concerns: presentation, logic, data.

Data Governance

Data Governance Machine Learning Metadata Big Data

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Architecture for data democratization: Data democratization requires a move away from traditional “data at rest” architecture, which is meant for storing static data. Traditionally, data was seen as information to be put on reserve, only called upon during customer interactions or executing a program.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

How Data Governance Supports Analytics

Alation

JANUARY 27, 2022

Creating a single view of any data, however, requires the integration of data from disparate sources. Data integration is valuable for businesses of all sizes due to the many benefits of analyzing data from different sources. But data integration is not trivial. Establishes Trust in Data.

Data Governance

Data Governance Analytics Cost-Benefit Data-driven
