Data Integration, Data Lake and Management

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Apache Iceberg is an Apache-licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to structure it first, and then run different types of analytics for better business insights. We will use AWS Region us-east-1.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Salesforce debuts Zero Copy Partner Network to ease data integration

CIO Business Intelligence

APRIL 25, 2024

“It’s a much more seamless process for customers than having to purchase a third-party reverse ETL tool or manage some sort of pipeline back into Salesforce.” For instance, a Data Cloud-triggered flow could update an account manager in Slack when shipments in an external data lake are marked as delayed.

Data Integration

Data Integration Data Lake Metadata Data Warehouse

Load data incrementally from transactional data lakes to data warehouses

AWS Big Data

OCTOBER 19, 2023

Data lakes and data warehouses are two of the most important data storage and management technologies in a modern data architecture. Data lakes store all of an organization’s data, regardless of its format or structure.

Data Lake

Data Lake Data Warehouse Visualization Snapshot

Introducing Precisely for Data Integrity

David Menninger's Analyst Perspectives

JANUARY 25, 2021

Data is becoming more valuable and more important to organizations. At the same time, organizations have become more disciplined about the data on which they rely to ensure it is robust, accurate and governed properly.

Data Integration

Data Integration Data Processing Data Lake IT

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Talend Data Fabric Simplifies Data Life Cycle Management

David Menninger's Analyst Perspectives

NOVEMBER 16, 2021

Talend is a data integration and management software company that offers applications for cloud computing, big data integration, application integration, data quality and master data management.

Management

Management Data Warehouse Data Quality Data Integration

The Key Components of a Successful Data Lake Strategy

Data Virtualization

MARCH 16, 2023

Reading Time: 6 minutes. Data lakes, by combining the flexibility of object storage with the scalability and agility of cloud platforms, are becoming an increasingly popular choice as an enterprise data repository. Whether you are on Amazon Web Services (AWS) and leverage AWS S3...

Data Lake

Data Lake Strategy Data Integration Enterprise

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

Is Data Virtualization the Secret Behind Operationalizing Data Lakes?

Data Virtualization

NOVEMBER 3, 2022

In attempts to overcome their big data challenges, organizations are exploring data lakes as repositories where huge volumes and varieties of...

Data Lake

Data Lake Big Data Data Integration Management

The Data Lakehouse: Blending Data Warehouses and Data Lakes

Data Virtualization

APRIL 21, 2022

Reading Time: 3 minutes. First we had data warehouses, then came data lakes, and now the new kid on the block is the data lakehouse. But what is a data lakehouse and why should we develop one? In a way, the name describes what...

Data Lake

Data Lake Data Warehouse Data Integration Management

Modern Data Architecture: Data Warehousing, Data Lakes, and Data Mesh Explained

Data Virtualization

OCTOBER 5, 2022

For this reason, organizations must periodically revisit their data architectures, to ensure that they are aligned with current business goals.

Data Lake

Data Lake Data Architecture Data Integration Management

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

AWS Big Data

AUGUST 3, 2023

Data analytics on operational data at near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to have better scalability and performance.

Data Lake

Data Lake Visualization Dashboards Insurance

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue, Apache Hudi, and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Data Management Predictions for 2024: Five Trends

Data Virtualization

MARCH 7, 2024

Reading Time: 3 minutes. As we move deeper into 2024, it is imperative for data management leaders to look in their rear-view mirrors to assess and, if needed, refine their data management strategies. One thing is clear: if data-centric organizations want to succeed in...

Management

Management Data Integration Strategy Data Lake

Data Management on Display at Informatica World 2019

David Menninger's Analyst Perspectives

JUNE 12, 2019

Under that focus, Informatica's conference emphasized capabilities across six areas (all strong areas for Informatica): data integration, data management, data quality & governance, Master Data Management (MDM), data cataloging, and data security.

Management

Management Data Quality Data Integration Data Lake

Data Management Predictions for 2024: Five Trends

Data Virtualization

JANUARY 25, 2024

Reading Time: 3 minutes. As we head into 2024, it is imperative for data management leaders to look in their rear-view mirrors to assess and, if needed, refine their data management strategies.

Management

Management Data Integration Strategy Data Lake

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

We have seen strong customer demand to expand its scope to cloud-based data lakes because data lakes are increasingly the enterprise solution for large-scale data initiatives due to their power and capabilities. Let’s say that this company is located in Europe and the data product must comply with the GDPR.

Data Lake

Data Lake Management Metrics Data Warehouse

Data Management Requirements for the Enterprise Data Lake

In(tegrate) the Clouds

MAY 1, 2016

SnapLogic published Eight Data Management Requirements for the Enterprise Data Lake. They include Storage and Data Formats, and Ingest and Delivery. The company also recently hosted a webinar on Democratizing the Data Lake with Constellation Research and published two whitepapers from Mark Madsen.

Data Lake

Data Lake Enterprise Management Metadata

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

Data ecosystems have become jungles and, in spite of all the technology, data teams are struggling to create a modern data experience. Data Lakes, Data Catalogs, and Findability: Organizations approach data lakes as cheap storage. This results in a huge findability challenge.

Metadata

Metadata Data Lake Data Warehouse Data Quality

The Data Lakehouse Myth

Data Virtualization

FEBRUARY 22, 2023

Reading Time: 2 minutes. The data lakehouse attempts to combine the best parts of the data warehouse with the best parts of data lakes while avoiding all of the problems inherent in both. However, the data lakehouse is not the last word in data.

Data Lake

Data Lake Data Warehouse Data Integration Management

Migrate data from Azure Blob Storage to Amazon S3 using AWS Glue

AWS Big Data

OCTOBER 20, 2023

Today, we are pleased to announce new AWS Glue connectors for Azure Blob Storage and Azure Data Lake Storage that allow you to move data bi-directionally between Azure Blob Storage, Azure Data Lake Storage, and Amazon Simple Storage Service (Amazon S3). option("header","true").load("wasbs://yourblob@youraccountname.blob.core.windows.net/loadingtest-input/100mb")

Data Lake

Data Lake Big Data Consulting Data Warehouse

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started

AWS Big Data

JANUARY 26, 2023

AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. AWS Glue provides an extensible architecture that enables users with different data processing use cases.

Data Lake

Data Lake Big Data Software Interactive

How GamesKraft uses Amazon Redshift data sharing to support growing analytics workloads

AWS Big Data

NOVEMBER 13, 2023

Amazon Redshift is a fully managed data warehousing service that offers both provisioned and serverless options, making it more efficient to run and scale analytics without having to manage your data warehouse. These upstream data sources constitute the data producer components.

Data Warehouse

Data Warehouse Data Lake Analytics Data Science

Stitch Fix seamless migration: Transitioning from self-managed Kafka to Amazon MSK

AWS Big Data

SEPTEMBER 22, 2023

At Stitch Fix, we have been powered by data science since our founding and rely on many modern data lake and data processing technologies. In our infrastructure, Apache Kafka has emerged as a powerful tool for managing event streams and facilitating real-time data processing.

Management

Management Metrics Cost-Benefit Data Lake

Detect, mask, and redact PII data using AWS Glue before loading into Amazon OpenSearch Service

AWS Big Data

JANUARY 12, 2024

There are many reasons for customers to migrate to AWS, but one of the main reasons is the ability to use fully managed services rather than spending time maintaining infrastructure, patching, monitoring, backups, and more. Amazon AppFlow can be used to transfer data from different SaaS applications to a data lake.

Data Lake

Data Lake Cost-Benefit Visualization Structured Data

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers handle complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Data Lake

Data Lake Snapshot Metadata Optimization

Data Management Challenges Solved – The Denodo Platform on Alibaba Cloud, Coming to a Data Center Near You

Data Virtualization

JUNE 29, 2023

However, the pain is real when it comes to data integration and data management, but today’s enterprise architects are racing to build modern data infrastructures using data fabric...

Management

Management Data Integration Enterprise Data Lake

Fire Your Super-Smart Data Consultants with DataOps

DataKitchen

JANUARY 25, 2022

When internal resources fall short, companies outsource data engineering and analytics. There’s no shortage of consultants who will promise to manage the end-to-end lifecycle of data from integration to transformation to visualization. The challenge is that data engineering and analytics are incredibly complex.

Consulting

Consulting Testing Data Lake Data Quality

Modernizing Data Analytics Architecture with the Denodo Platform on Azure

Data Virtualization

JANUARY 19, 2023

Reading Time: 2 minutes. Today, many businesses are modernizing their on-premises data warehouses or cloud-based data lakes using Microsoft Azure Synapse Analytics. Unfortunately, with data spread...

Data Analytics

Data Analytics Data Lake Data Warehouse Analytics

Your guide to AWS Analytics at AWS re:Invent 2023

AWS Big Data

NOVEMBER 13, 2023

11:30 AM – 12:30 PM (PDT) Caesars Forum ANT318 | Accelerate innovation with end-to-end serverless data architecture. 4:30 PM – 5:30 PM (PDT) Wynn ANT207 | Understand your data with business context. 1:00 PM – 2:00 PM (PDT) Venetian ANT201 | Accelerate innovation with real-time data.

Analytics

Analytics Data Lake Data Warehouse Data-driven

Data Fabric Approach for Effective Data Management

Data Virtualization

MARCH 31, 2022

Reading Time: 4 minutes. A discussion on All Things Data with Katrina Briedis, Senior Product Marketing Manager (APAC) at Denodo, with a special focus on Data Fabric Approach for Effective Data Management. Listen to “Is Data Fabric the Ideal Approach for Effective Data Management?”

Management

Management Data Integration Marketing Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics: Part 2

AWS Big Data

FEBRUARY 13, 2024

Monitoring data pipelines in real time is critical for catching issues early and minimizing disruptions. AWS Glue has made this more straightforward with the launch of AWS Glue job observability metrics, which provide valuable insights into your data integration pipelines built on AWS Glue.

Metrics

Metrics Dashboards Visualization Key Performance Indicator

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

AWS Big Data

DECEMBER 21, 2023

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyse data. AWS Glue provides both visual and code-based interfaces to make data integration effortless. Attach the AWS managed policy AWSGlueServiceRole.

Analytics

Analytics IT Data Lake Visualization

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

AWS Big Data

DECEMBER 15, 2023

Even after identification, it’s cumbersome to implement redaction, masking, or encryption of sensitive data at scale. In this post, we provide an automated solution to detect PII data in Amazon Redshift using AWS Glue. For our solution, we use Amazon Redshift to store the data.

Data Lake

Data Lake Data Warehouse Big Data Structured Data

Snowflake: Data Ingestion Using Snowpipe and AWS Glue

BizAcuity

NOVEMBER 22, 2022

Snowflake is a cloud-native platform that eliminates the need for separate data warehouses, data lakes, and data marts, allowing secure data sharing across the organization. For this reason, Snowflake is often the cloud-native data warehouse of choice. This makes the data available sooner.

Data Warehouse

Data Warehouse Cost-Benefit Data Lake Internet of Things

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Compose your ETL jobs for MongoDB Atlas with AWS Glue

AWS Big Data

MAY 3, 2023

In today’s data-driven business environment, organizations face the challenge of efficiently preparing and transforming large amounts of data for analytics and data science purposes. Businesses need to build data warehouses and data lakes based on operational data.

Data Lake

Data Lake Data Warehouse Data-driven Optimization

Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics

AWS Big Data

MARCH 27, 2024

They can then use the result of their analysis to understand a patient’s health status, treatment history, and past or upcoming doctor consultations to make more informed decisions, streamline the claim management process, and improve operational outcomes. The CloudFormation stack also deploys a provisioned Redshift cluster.

Data Analytics

Data Analytics Analytics Data Warehouse Data Lake

A Closer Look at The Next Phase of Cloudera’s Hybrid Data Lakehouse

Cloudera

MARCH 5, 2024

This marks a significant milestone for the platform: according to IDC, today about half of the world’s enterprise production data under management is on-prem. The platform is ready to address the complexities of managing highly sensitive, yet critical, company data while still extracting the most value from its use.

Snapshot

Snapshot Data Lake Enterprise Data Governance
