Data Integration, Data Lake, Management and Metadata

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

licensed, 100% open-source data table format that helps simplify data processing on large datasets stored in data lakes. Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Salesforce debuts Zero Copy Partner Network to ease data integration

CIO Business Intelligence

APRIL 25, 2024

“The challenge that a lot of our customers have is that requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.

Data Integration

Data Integration Data Lake Metadata Data Warehouse

Webinars

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

How To Get Promoted In Product Management

MORE WEBINARS

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift is a fast, fully managed petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Amazon Redshift also supports querying nested data with complex data types such as struct, array, and map.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

The Data Lakehouse: Blending Data Warehouses and Data Lakes

Data Virtualization

APRIL 21, 2022

Reading Time: 3 minutes First we had data warehouses, then came data lakes, and now the new kid on the block is the data lakehouse. But what is a data lakehouse and why should we develop one? In a way, the name describes what.

Data Lake

Data Lake Data Warehouse Data Integration Management

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

AWS Big Data

FEBRUARY 6, 2023

For data lake customers who need to discover petabytes of data, AWS Glue crawlers are a popular way to discover and catalog data in the background. This allows users to search and find relevant data from multiple data sources. Choose the table to view the schema and other metadata.

Metadata

Metadata Data Lake Machine Learning Management

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

Data ecosystems have become jungles and in spite of all the technology, data teams are struggling to create a modern data experience. In most enterprises data teams lack a data map and data asset inventory and are often unaware of data that exists across the organization, its associated profile, quality and associated metadata.

Metadata

Metadata Data Lake Data Warehouse Data Quality

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog.

Metadata

Metadata Data Lake Machine Learning Big Data

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue , Apache Hudi , and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Data Lake

Data Lake Snapshot Metadata Optimization

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

Optimization

Optimization Forecasting Data Lake Metadata

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

So if you’re going to move from your data from on-premise legacy data stores and warehouse systems to the cloud, you should do it right the first time. With all these diverse metadata sources, it is difficult to understand the complicated web they form much less get a simple visual flow of data lineage and impact analysis.

Data Governance

Data Governance Metadata Testing Data Lake

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

AWS provides different services for building data ingestion pipelines: AWS Glue is a serverless data integration service that ingests data in batches from on-premises databases and data stores in the cloud. Streaming data with low latency needs is stored in Amazon Kinesis Data Streams for real-time consumption.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Unlocking the value of data as your differentiator

AWS Big Data

NOVEMBER 29, 2023

These tools also need to be intelligent to remove the heavy lifting from data management. Comprehensive First, you need a comprehensive set of data services so you can get the price/performance, speed, flexibility, and capabilities for any use case. Customers have created hundreds of thousands of data lakes on Amazon S3.

Data Warehouse

Data Warehouse Data Lake Data Integration Dashboards

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Next generation of big data platforms and long running batch jobs operated by a central team of data engineers have often led to data lake swamps. Enhance trust in data by getting visibility into where data came from, how it has changed, and who is using it. Reduce data duplication and fragmentation.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

This would be straightforward task were it not for the fact that, during the digital-era, there has been an explosion of data – collected and stored everywhere – much of it poorly governed, ill-understood, and irrelevant. Further, data management activities don’t end once the AI model has been developed.

Data Governance

Data Governance IT Risk Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.

Metrics

Metrics Visualization Dashboards Interactive

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms , including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.

Data Governance

Data Governance Risk Metadata Management

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Improved trust and confidence in data.

Metadata

Metadata Data Quality Data-driven Data Governance

What is an Information Steward, and Why You Should Care

Grooper

MARCH 5, 2020

If your organization has any kind of data and analytics initiative, then chances are you have people – maybe even an entire department dedicated to managing and integrating data for (and between) software applications to achieve some sort of business outcome. Haven’t heard the term “information steward” before?

Data Lake

Data Lake Metadata Data Quality Software

My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023

Data Virtualization

MARCH 28, 2024

As noted in the Gartner Hype Cycle for Finance Data and Analytics Governance, 2023, “Through. The post My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023 appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Finance

Finance Digital Transformation Analytics Data Integration

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Loading complex multi-point datasets into a dimensional model, identifying issues, and validating data integrity of the aggregated and merged data points are the biggest challenges that clinical quality management systems face. With hybrid data warehousing, scalability is multiplied.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Enterprise

Enterprise Management Strategy Data Lake

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Management

Management Enterprise Strategy IT

Denodo Joins Forces with Presto

Data Virtualization

JUNE 22, 2023

The Denodo Platform is a logical data management platform, powered by. The post Denodo Joins Forces with Presto appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Data Integration

Data Integration Management Data Lake Metadata

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

Reading Time: 11 minutes The post Data Strategies for Getting Greater Business Value from Distributed Data appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Data Strategy

Data Strategy Strategy Data Integration Management

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Management

Management Enterprise Strategy IT

The Data Warehouse is Dead, Long Live the Data Warehouse, Part I

Data Virtualization

OCTOBER 18, 2022

The post The Data Warehouse is Dead, Long Live the Data Warehouse, Part I appeared first on Data Virtualization blog - Data Integration and Modern Data Management Articles, Analysis and Information. In times of potentially troublesome change, the apparent paradox and inner poetry of these.

Data Warehouse

Data Warehouse ROI Data Integration Internet of Things

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

AWS Big Data

FEBRUARY 2, 2023

Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. AWS Billing and Cost Management – Create Cost and Usage Reports.

Reporting

Reporting Data Lake Management Optimization

Strategically Approaching Graph Technologies

Ontotext

FEBRUARY 26, 2024

Firstly, on the data maturity spectrum, the vast majority of organizations I’ve spoken with are stuck in the information stage. They have massive amounts of data they’re collecting and storing in their relational databases, document stores, data lakes, and data warehouses. Let’s summarize very quickly.

Technology

Technology Cost-Benefit Data-driven Metadata

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Andrew White

OCTOBER 22, 2021

This time, since we were virtual I only managed to close out the week with the 1-1 summary. Lakehouse (data warehouse and data lake working together) 8. Data Literacy, training, coordination, collaboration 8. Data Management Infrastructure/Data Fabric 5. Data Integration tactics 4.

IT

IT Data Lake Strategy Data Science

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

One notable trend in the streaming solutions market is the widespread use of Apache Kafka for data ingestion and Apache Spark for streaming processing across industries. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed Apache Kafka service that offers a seamless way to ingest and process streaming data.

Management

Management Metadata Testing Internet of Things

Choosing a Data Catalog: Data Map or Data Delivery App?

Data Virtualization

NOVEMBER 17, 2022

The idea seems, on the face of it, easy to understand: a data catalog is simply a centralized inventory of the data assets within an organization. Data catalogs also seek to be the. The post Choosing a Data Catalog: Data Map or Data Delivery App?

Data Integration

Data Integration Management Data Lake IT

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. Granted, I’m no expert in DG.

Data Governance

Data Governance Machine Learning Metadata Big Data

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Rich metadata and semantic modeling continue to drive the matching of 50K training materials to specific curricula, leading new, data-driven, audience-based marketing efforts that demonstrate how the recommender service is achieving increased engagement and performance from over 2.3 million users.

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complimentary.

Management

Management Metadata Data Architecture Data Lake

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

Data management platform definition A data management platform (DMP) is a suite of tools that helps organizations to collect and manage data from a wide array of first-, second-, and third-party sources and to create reports and build customer profiles as part of targeted personalization campaigns.

Management

Management Advertising Data Lake Sales

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

A data management platform (DMP) is a group of tools designed to help organizations collect and manage data from a wide array of sources and to create reports that help explain what is happening in those data streams. Deploying a DMP can be a great way for companies to navigate a business world dominated by data.

Management

Management Advertising Data Lake Sales

Data Mesh vs. Data Fabric: A Love Story

Alation

JANUARY 13, 2022

Gartner calls data fabric the Future of Data Management 1. Thoughtworks says data mesh is key to moving beyond a monolithic data lake. Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary. Gartner on Data Fabric.

Data Lake

Data Lake Metadata Data-driven Data Governance

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. We chatted about industry trends, why decentralization has become a hot topic in the data world, and how metadata drives many data-centric use cases.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

Salesforce debuts Zero Copy Partner Network to ease data integration

Webinars

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

The Data Lakehouse: Blending Data Warehouses and Data Lakes

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

How Knowledge Graphs Power Data Mesh and Data Fabric

How Cargotec uses metadata replication to enable cross-account data sharing

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Introducing Apache Hudi support with AWS Glue crawlers

Data governance in the age of generative AI

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Doing Cloud Migration and Data Governance Right the First Time

Create an end-to-end data strategy for Customer 360 on AWS

Unlocking the value of data as your differentiator

Data architecture strategy for data quality

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

Five benefits of a data catalog

What is an Information Steward, and Why You Should Care

My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023

A hybrid approach in healthcare data warehousing with Amazon Redshift

Data Management Challenges for the Modern Enterprise

Data Management Challenges for the Modern Enterprise

Denodo Joins Forces with Presto

Data Strategies for Getting Greater Business Value from Distributed Data

Data Management Challenges for the Modern Enterprise

The Data Warehouse is Dead, Long Live the Data Warehouse, Part I

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena

Strategically Approaching Graph Technologies

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

Choosing a Data Catalog: Data Map or Data Delivery App?

Themes and Conferences per Pacoid, Episode 8

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Augmented data management: Data fabric versus data mesh

Top 15 data management platforms available today

Top 15 data management platforms

Data Mesh vs. Data Fabric: A Love Story

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Stay Connected