Analytics, Data Integration, Data Lake and Interactive

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

With the rapid growth of technology, more and more data volume is coming in many different formats—structured, semi-structured, and unstructured. Data analytics on operational data at near-real time is becoming a common need. a new version of AWS Glue that accelerates data integration workloads in AWS.

Data Lake

Data Lake Visualization Dashboards Insurance

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

AWS Big Data

JANUARY 26, 2023

AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. AWS Glue provides an extensible architecture that enables users with different data processing use cases. as follows: # Use Glue version 3.0

Data Lake

Data Lake Big Data Software Interactive

Webinars

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

AWS Big Data

JUNE 28, 2023

In case the data sources change, data engineers have to manually make changes in their code and deploy it again. Furthermore, the time required to build or change pipelines makes the data unfit for near-real-time use cases such as detecting fraudulent transactions, placing online ads, and tracking passenger train schedules.

Analytics

Analytics Data Warehouse Data Lake Data-driven

Building Best-in-Class Enterprise Analytics

Speaker: Anthony Roach, Director of Product Management at Tableau Software, and Jeremiah Morrow, Partner Solution Marketing Director at Dremio

Tableau works with Strategic Partners like Dremio to build data integrations that bring the two technologies together, creating a seamless and efficient customer experience. As a result of a strategic partnership, Tableau and Dremio have built a native integration that goes well beyond a traditional connector.

Analytics

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

AWS Big Data

NOVEMBER 10, 2023

Amazon Redshift offers seamless integration with Apache Spark, allowing you to easily access your Redshift data on both Amazon Redshift provisioned clusters and Amazon Redshift Serverless. These tables are then joined with tables from the Enterprise Data Lake (EDL) at runtime. options(**read_config).option("query",

Data Processing

Data Processing Data Lake Data Warehouse Optimization

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

Introducing Amazon Q data integration in AWS Glue

AWS Big Data

APRIL 30, 2024

Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer , enables you to build data integration pipelines using natural language.

Data Integration

Data Integration Data Lake Data Warehouse Software

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. The sample dashboard showed metrics over time, top errors, and comparative job analytics.

Metrics

Metrics Visualization Dashboards Interactive

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

AWS Big Data

NOVEMBER 20, 2023

For any modern data-driven company, having smooth data integration pipelines is crucial. These pipelines pull data from various sources, transform it, and load it into destination systems for analytics and reporting. When running properly, it provides timely and trustworthy information.

Metrics

Metrics Data Lake Cost-Benefit Dashboards

Machine Learning and AI Underpin Predictive Analytics to Achieve Clinical Breakthroughs

Cloudera

JULY 18, 2018

Despite advances made in EHRs of late, they, unfortunately, do not provide advanced analytics or intelligent search for that matter. Providers trying to gain insight are challenged because up to 80% of all healthcare data is unstructured and combines EHR, clinical notes, genomics data, and more.

Machine Learning

Machine Learning Predictive Analytics Analytics Prescriptive Analytics

Moving Enterprise Data From Anywhere to Any System Made Easy

CIO Business Intelligence

JULY 13, 2022

Over the last decade, we have often heard about the proliferation of data creating sources (mobile applications, laptops, sensors, enterprise apps) in heterogeneous environments (cloud, on-prem, edge) resulting in the exponential growth of data being created.

Enterprise

Enterprise Data Lake Data Collection Data-driven

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. Brian Ross is a Senior Software Development Manager at AWS.

Data Quality

Data Quality Statistics Data Lake Visualization

Automate the archive and purge data process for Amazon RDS for PostgreSQL using pg_partman, Amazon S3, and AWS Glue

AWS Big Data

AUGUST 22, 2023

This post proposes an automated solution by using AWS Glue for automating the PostgreSQL data archiving and restoration process, thereby streamlining the entire procedure. Vivek Shrivastava is a Principal Data Architect, Data Lake in AWS Professional Services. He is a big data enthusiast and holds 14 AWS Certifications.

Data Processing

Data Processing Testing Data Lake Data Integration

5 Trends in Financial Services That Will Change How You Think about Your Data

Data Virtualization

MARCH 23, 2023

Innovative new technologies are redefining the sector, shaping the services that financial organizations offer, the ways in which they interact with consumers, and the ways in which they apply.

Interactive

Interactive Data Integration Technology Management

5 Trends in Financial Services That Will Change How You Think about Your Data

Data Virtualization

MARCH 23, 2023

Innovative new technologies are redefining the sector, shaping the services that financial organizations offer, the ways in which they interact with consumers, and the ways in which they apply.

Interactive

Interactive Data Integration Technology Management

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

AWS Big Data

JUNE 26, 2023

Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide outstanding customer experience. Typically, companies ingest data from multiple sources into their data lake to derive valuable insights from the data. Jupyter notebooks are web-based interactive platforms.

Insurance

Insurance Visualization Data Lake Metrics

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can use it for analytics, ML, and application development. Then we chose Amazon Athena as our query service.

Optimization

Optimization Forecasting Data Lake Metadata

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue , Apache Hudi , and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

AWS Big Data

MAY 9, 2023

Hundreds of thousands of customers use AWS Glue , a serverless data integration service, to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue for Apache Spark jobs work with your code and configuration of the number of data processing units (DPU).

Data Lake

Data Lake Cost-Benefit Data Integration Data Transformation

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Gartner predicts that graph technologies will be used in 80% of data and analytics innovations by 2025, up from 10% in 2021. Graphs boost knowledge discovery and efficient data-driven analytics to understand a company’s relationship with customers and personalize marketing, products, and services.

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Customer 360 (C360) provides a complete and unified view of a customer’s interactions and behavior across all touchpoints and channels. This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

The longer answer is that in the context of machine learning use cases, strong assumptions about data integrity lead to brittle solutions overall. Data coming from machines tends to land (aka, data at rest ) in durable stores such as Amazon S3, then gets consumed by Hadoop, Spark, etc. Newer work in machine learning (e.g.,

Data Governance

Data Governance Machine Learning Metadata Big Data

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Loading complex multi-point datasets into a dimensional model, identifying issues, and validating data integrity of the aggregated and merged data points are the biggest challenges that clinical quality management systems face. It is a data modeling methodology designed for large-scale data warehouse platforms.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

Nexthink scales to trillions of events per day with Amazon MSK

AWS Big Data

MARCH 29, 2024

The Nexthink Infinity platform combines analytics, monitoring, automation, and more to manage the employee digital experience. By collecting device and application events, processing them in real time, and storing them, our platform analyzes data to solve problems and boost experiences for over 15 million employees across five continents.

Cost-Benefit

Cost-Benefit Data-driven Metrics Management

Dimensional modeling in Amazon Redshift

AWS Big Data

JULY 19, 2023

Amazon Redshift is a fully managed and petabyte-scale cloud data warehouse that is used by tens of thousands of customers to process exabytes of data every day to power their analytics workload. You can structure your data, measure business processes, and get valuable insights quickly can be done by using a dimensional model.

Modeling

Modeling Sales Data Warehouse Snapshot

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. For Report path prefix , enter cur-data/account-cur-daily.

Reporting

Reporting Data Lake Management Optimization

The Data Journey: From Raw Data to Insights

Sisense

JULY 22, 2020

At Sisense, we’re dedicated to making this complex task simple, putting power in the hands of the builders of business data and strategy, and providing insights for everyone. The launch of the Google Sheets analytics template illustrates this. Understanding how data becomes insights. Connect tables.

Slice and Dice

Slice and Dice Digital Transformation Data Warehouse Data Lake

TIBCO JasperSoft for BI and Reporting

BizAcuity

AUGUST 1, 2022

TIBCO Jaspersoft offers a complete BI suite that includes reporting, online analytical processing (OLAP), visual analytics , and data integration. The web-scale platform enables users to share interactive dashboards and data from a single page with individuals across the enterprise. Data Security.

Reporting

Reporting OLAP Online Analytical Processing Dashboards

Cross-Functional Trade Surveillance

Cloudera

MAY 16, 2018

All three cases require a “big picture” approach that incorporates new and alternative data sources and cross-functional collaboration throughout the organization not only to identify illegal activities, rogue traders, or personal misconduct but also to provide evidential material that demonstrates a deep understanding of the intent.

Data Lake

Data Lake Risk Visualization Unstructured Data

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

Organizations across the world are increasingly relying on streaming data, and there is a growing need for real-time data analytics, considering the growing velocity and volume of data being collected. The following diagram illustrates our solution architecture. Subramanya Vajiraya is a Sr.

Management

Management Metadata Testing Internet of Things

Improving Multi-tenancy with Virtual Private Clusters

Cloudera

JUNE 6, 2019

When a mix of batch, interactive, and data serving workloads are added to the mix, the problem becomes nearly intractable. While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or ‘split-brain’ data lake. Cloudera Manager (CM) 6.2

Metadata

Metadata Data Lake Strategy Optimization

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

Join the AWS Analytics team at AWS re:Invent this year, where new ideas and exciting innovations come together. For those in the data world, this post provides a curated guide for all analytics sessions that you can use to quickly schedule and build your itinerary. A shapeshifting guardian and protector of data like Data Lynx?

Analytics

Analytics Data Lake Data Warehouse Data-driven

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

It enables data engineers, data scientists, and analytics engineers to define the business logic with SQL select statements and eliminates the need to write boilerplate data manipulation language (DML) and data definition language (DDL) expressions.

Data Lake

Data Lake Management Metrics Data Warehouse

Connect your data for faster decisions with AWS

AWS Big Data

NOVEMBER 7, 2023

Third, AWS continues adding support for more data sources including connections to software as a service (SaaS) applications, on-premises applications, and other clouds so organizations can act on their data. We are thrilled to announce that this zero-ETL integration is now generally available.

Dashboards

Dashboards Data-driven Data Integration Data Lake

Accelerate Amazon Redshift secure data use with Satori – Part 1

AWS Big Data

SEPTEMBER 21, 2023

Satori accelerates implementing data security controls on datawarehouses like Amazon Redshift, is straightforward to integrate, and doesn’t require any changes to your Amazon Redshift data, schema, or how your users interact with data. To learn more, start a free trial or request a demo meeting.

Data Warehouse

Data Warehouse Interactive Data Architecture Data Lake

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF) , the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP) , as a Data integration and Democratization fabric. Data and Metadata: Data inputs and data outputs produced based on the application logic.

Metadata

Metadata Cost-Benefit Enterprise Interactive

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

erwin

AUGUST 31, 2023

Reduced Data Redundancy : By eliminating data duplication, it optimizes storage and enhances data quality, reducing errors and discrepancies. Efficient Development : Accurate data models expedite database development, leading to efficient data integration, migration, and application development.

Data-driven

Data-driven Modeling Enterprise Structured Data

Data democratization: How data architecture can drive business decisions and AI initiatives

IBM Big Data Hub

AUGUST 4, 2023

Architecture for data democratization Data democratization requires a move away from traditional “data at rest” architecture, which is meant for storing static data. Traditionally, data was seen as information to be put on reserve, only called upon during customer interactions or executing a program.

Data Architecture

Data Architecture Data Lake Machine Learning Data Governance

How Data Governance Supports Analytics

Alation

JANUARY 27, 2022

How do businesses transform raw data into competitive insights? Data analytics. Modern businesses are increasingly leveraging analytics for a range of use cases. Analytics can help a business improve customer relationships, optimize advertising campaigns, develop new products, and much more. What is Data Analytics?

Data Governance

Data Governance Analytics Cost-Benefit Data-driven

Process price transparency data using AWS Glue

AWS Big Data

MAY 4, 2023

AWS Glue is a serverless data integration service that makes it straightforward to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. By repartition data we will avoid writing too many small files. write.format('parquet').mode("overwrite").partitionBy('billing_code').bucketBy(2,

Insurance

Insurance Publishing Cost-Benefit Data Lake

What is a Data Pipeline?

Jet Global

MAY 9, 2024

A data pipeline is a series of processes that move raw data from one or more sources to one or more destinations, often transforming and processing the data along the way. Data pipelines support data science and business intelligence projects by providing data engineers with high-quality, consistent, and easily accessible data.

Data Lake

Data Lake Data Warehouse Business Intelligence Machine Learning

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

Data mapping is essential for integration, migration, and transformation of different data sets; it allows you to improve your data quality by preventing duplications and redundancies in your data fields. Data mapping helps standardize, visualize, and understand data across different systems and applications.

Data Warehouse

Data Warehouse Reporting Data Transformation Sales

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Webinars

Trending Sources

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

Webinars

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

Building Best-in-Class Enterprise Analytics

Data governance in the age of generative AI

Simplifying data processing at Capitec with Amazon Redshift integration for Apache Spark

Moving Enterprise Data From Anywhere to Any System Made Easy

Introducing Amazon Q data integration in AWS Glue

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics

Machine Learning and AI Underpin Predictive Analytics to Achieve Clinical Breakthroughs

Moving Enterprise Data From Anywhere to Any System Made Easy

AWS Glue Data Quality is Generally Available

Automate the archive and purge data process for Amazon RDS for PostgreSQL using pg_partman, Amazon S3, and AWS Glue

5 Trends in Financial Services That Will Change How You Think about Your Data

5 Trends in Financial Services That Will Change How You Think about Your Data

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

Scale your AWS Glue for Apache Spark jobs with new larger worker types G.4X and G.8X

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Create an end-to-end data strategy for Customer 360 on AWS

Themes and Conferences per Pacoid, Episode 8

A hybrid approach in healthcare data warehousing with Amazon Redshift

Nexthink scales to trillions of events per day with Amazon MSK

Dimensional modeling in Amazon Redshift

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

The Data Journey: From Raw Data to Insights

TIBCO JasperSoft for BI and Reporting

Cross-Functional Trade Surveillance

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

Improving Multi-tenancy with Virtual Private Clusters

Your guide to AWS Analytics at AWS re:Invent 2023

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Connect your data for faster decisions with AWS

Accelerate Amazon Redshift secure data use with Satori – Part 1

How Cloudera Data Flow Enables Successful Data Mesh Architectures

The Enduring Significance of Data Modeling in the Modern Data-Driven Enterprise

Data democratization: How data architecture can drive business decisions and AI initiatives

How Data Governance Supports Analytics

Process price transparency data using AWS Glue

What is a Data Pipeline?

What is Data Mapping?

Stay Connected