Analytics, Data Integration, Data Lake and Metadata

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

AWS Big Data

JULY 29, 2024

In the era of big data, data lakes have emerged as a cornerstone for storing vast amounts of raw data in its native format. They support structured, semi-structured, and unstructured data, offering a flexible and scalable environment for data ingestion from multiple sources.

Metadata

Metadata Snapshot Data Lake Metrics

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

They understand that a one-size-fits-all approach no longer works, and recognize the value in adopting scalable, flexible tools and open data formats to support interoperability in a modern data architecture to accelerate the delivery of new solutions.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Webinars

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Improving the Accuracy of Generative AI Systems: A Structured Approach

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Marketing Operations in 2025: A New Framework for Success

MORE WEBINARS

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

AWS Big Data

SEPTEMBER 10, 2024

We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. In such scenarios, data engineers face challenges in connecting and extracting data from storage containers on Microsoft Azure.

Data Lake

Data Lake Metadata Management Software

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. To address this challenge, organizations can deploy a data mesh using AWS Lake Formation that connects the multiple EMR clusters. An entity can act both as a producer of data assets and as a consumer of data assets.

Data Lake

Data Lake Metadata Data Warehouse Data Processing

The Data Lakehouse: Blending Data Warehouses and Data Lakes

Data Virtualization

APRIL 21, 2022

Reading Time: 3 minutes First we had data warehouses, then came data lakes, and now the new kid on the block is the data lakehouse. But what is a data lakehouse and why should we develop one? In a way, the name describes what.

Data Lake

Data Lake Data Warehouse Data Integration Management

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

In addition to using native managed AWS services that BMS didn’t need to worry about upgrading, BMS was looking to offer an ETL service to non-technical business users that could visually compose data transformation workflows and seamlessly run them on the AWS Glue Apache Spark-based serverless data integration engine.

Metadata

Metadata Data Lake Visualization Data Transformation

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

In most enterprises data teams lack a data map and data asset inventory and are often unaware of data that exists across the organization, its associated profile, quality and associated metadata. Teams can’t access data to build their business use cases. For example, a product data tag is basic metadata.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

AWS Big Data

JUNE 25, 2024

In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights. It provides secure, real-time access to Redshift data without copying, keeping enterprise data in place.

Data Lake

Data Lake Cost-Benefit Data-driven Data Warehouse

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction).

Data Governance

Data Governance Unstructured Data Metadata Data Lake

My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023

Data Virtualization

MARCH 28, 2024

As noted in the Gartner Hype Cycle for Finance Data and Analytics Governance, 2023, “Through. The post My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023 appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Finance

Finance Digital Transformation Analytics Data Integration

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Next generation of big data platforms and long running batch jobs operated by a central team of data engineers have often led to data lake swamps. Monitor and identify data quality issues closer to the source to mitigate the potential impact on downstream processes or workloads.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Technical metadata to describe schemas, indexes and other database objects.

Metadata

Metadata Data Quality Data-driven Data Governance

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Tens of thousands of customers today use Amazon Redshift to analyze exabytes of data and run analytical queries, making it the most widely used cloud data warehouse. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. With these insights, teams have the visibility to make data integration pipelines more efficient. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog.

Metrics

Metrics Visualization Dashboards Interactive

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

AWS Big Data

AUGUST 15, 2024

Unlocking the true value of data often gets impeded by siloed information. Traditional data management—wherein each business unit ingests raw data in separate data lakes or warehouses—hinders visibility and cross-functional analysis. Amazon DataZone natively supports data sharing for Amazon Redshift data assets.

Data Lake

Data Lake Data Warehouse Data Governance Publishing

Navigating the New Data Landscape: Trends and Opportunities

Data Virtualization

JUNE 19, 2024

Reading Time: 5 minutes The data landscape has evolved and become more complex as organizations recognize the need to leverage data and analytics. Generative artificial intelligence has further put pressure on organizations to manage this complexity. At TDWI, we see companies collecting traditional structured.

Data Integration

Data Integration Management Analytics Data Architecture

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Andrew White

OCTOBER 22, 2021

Analytics Tactics (known outcome/known data/BI/analytics v unknown outcome/unknown data/data science/ML) 11. Data Hub Strategy 10. Lakehouse (data warehouse and data lake working together) 8. Data Literacy, training, coordination, collaboration 8. Data Integration tactics 4.

IT

IT Data Lake Strategy Data Science

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This data needs to be ingested into a data lake, transformed, and made available for analytics, machine learning (ML), and visualization. For this, Cargotec built an Amazon Simple Storage Service (Amazon S3) data lake and cataloged the data assets in AWS Glue Data Catalog.

Metadata

Metadata Data Lake Machine Learning Big Data

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

AWS Big Data

FEBRUARY 6, 2023

For data lake customers who need to discover petabytes of data, AWS Glue crawlers are a popular way to discover and catalog data in the background. This allows users to search and find relevant data from multiple data sources. In the crawler setup, you can select MongoDB as a data source.

Metadata

Metadata Data Lake Machine Learning Big Data

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Data Quality

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

Reading Time: 11 minutes The post Data Strategies for Getting Greater Business Value from Distributed Data appeared first on Data Management Blog - Data Integration and Modern Data Management Articles, Analysis and Information.

Data Strategy

Data Strategy Strategy Data Integration Management

The Data Warehouse is Dead, Long Live the Data Warehouse, Part I

Data Virtualization

OCTOBER 18, 2022

The post The Data Warehouse is Dead, Long Live the Data Warehouse, Part I appeared first on Data Virtualization blog - Data Integration and Modern Data Management Articles, Analysis and Information. In times of potentially troublesome change, the apparent paradox and inner poetry of these.

Data Warehouse

Data Warehouse ROI Data Integration Internet of Things

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Data Lake

Data Lake Snapshot Metadata Optimization

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

This cloud service was a significant leap from the traditional data warehousing solutions, which were expensive, not elastic, and required significant expertise to tune and operate. Amazon Redshift Serverless, generally available since 2021, allows you to run and scale analytics without having to provision and manage the data warehouse.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

AWS Big Data

FEBRUARY 1, 2023

With data volumes exhibiting a double-digit percentage growth rate year on year and the COVID pandemic disrupting global logistics in 2021, it became more critical to scale and generate near-real-time data. You can use it for analytics, ML, and application development. Then we chose Amazon Athena as our query service.

Optimization

Optimization Forecasting Data Lake Metadata

Choosing a Data Catalog: Data Map or Data Delivery App?

Data Virtualization

NOVEMBER 17, 2022

The idea seems, on the face of it, easy to understand: a data catalog is simply a centralized inventory of the data assets within an organization. Data catalogs also seek to be the. The post Choosing a Data Catalog: Data Map or Data Delivery App?

Data Integration

Data Integration Management Data Lake IT

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Gartner predicts that graph technologies will be used in 80% of data and analytics innovations by 2025, up from 10% in 2021. Graphs boost knowledge discovery and efficient data-driven analytics to understand a company’s relationship with customers and personalize marketing, products, and services. million users.

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. in lieu of simply landing in a data lake.

Data Governance

Data Governance Machine Learning Metadata Data Science

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue , Apache Hudi , and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Data Warehouse

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes. This approach breaks down data silos, allowing for new opportunities to shape data governance, data integration, single customer views and trustworthy AI implementations among other common industry use cases.

Management

Management Metadata Data Architecture Data Lake

Unlocking the value of data as your differentiator

AWS Big Data

NOVEMBER 29, 2023

You also need services to store data for analysis and machine learning (ML) like Amazon Simple Storage Service (Amazon S3). Customers have created hundreds of thousands of data lakes on Amazon S3. Amazon DataZone uses ML to automatically add metadata to your data catalog, making all of your data more discoverable.

Data Warehouse

Data Warehouse Data Lake Dashboards Data Integration

Data Mesh vs. Data Fabric: A Love Story

Alation

JANUARY 13, 2022

Thoughtworks says data mesh is key to moving beyond a monolithic data lake. Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary. Thoughtworks says data mesh is key to moving beyond a monolithic data lake 2. Gartner on Data Fabric.

Data Lake

Data Lake Metadata Data-driven Data Governance

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

Figure 1: Apache Iceberg fits the next generation data architecture by abstracting storage layer from analytics layer while introducing net new capabilities like time-travel and partition evolution. #1: 1: Multi-function analytics . 3: Open Performance.

Metadata

Metadata Data Architecture Machine Learning Cost-Benefit

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

In this post, we discuss how you can use purpose-built AWS services to create an end-to-end data strategy for C360 to unify and govern customer data that address these challenges. We recommend building your data strategy around five pillars of C360, as shown in the following figure.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh. Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. This led me to Sanjeev Mohan.

Metadata

Metadata Data Warehouse Data Quality Data Lake

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Enterprise

Enterprise Management Strategy Data Lake

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Loading complex multi-point datasets into a dimensional model, identifying issues, and validating data integrity of the aggregated and merged data points are the biggest challenges that clinical quality management systems face. It is a data modeling methodology designed for large-scale data warehouse platforms.

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Metadata

What is an Information Steward, and Why You Should Care

Grooper

MARCH 5, 2020

Information stewards are the critical link for organizations committed to innovation and maximizing the effective use of data. This article is will help you understand the critical role of information stewardship as it relates to data and analytics. Haven’t heard the term “information steward” before?

Data Lake

Data Lake Metadata Data Quality Software

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JUNE 10, 2024

In this blog post, we dive into different data aspects and how Cloudinary breaks the two concerns of vendor locking and cost efficient data analytics by using Apache Iceberg, Amazon Simple Storage Service (Amazon S3 ), Amazon Athena , Amazon EMR , and AWS Glue.

Data Lake

Data Lake Metadata Snapshot Analytics

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Management

Management Enterprise Strategy Data Lake

How data stores and governance impact your AI initiatives

IBM Big Data Hub

OCTOBER 12, 2023

Once trained and deployed, models also need reliable access to historical and real-time data to generate content, make recommendations, detect errors, send proactive alerts, etc. Among the tasks necessary for internal and external compliance is the ability to report on the metadata of an AI model. Learn more about IBM watsonx 1.

Cost-Benefit

Cost-Benefit Metadata Data Governance Modeling

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

Data is the fuel of the digital economy, so data-centric organizations have a distinct advantage. To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is.

Management

Management Enterprise Strategy Data Lake

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

AWS Big Data

JUNE 12, 2023

Organizations across the world are increasingly relying on streaming data, and there is a growing need for real-time data analytics, considering the growing velocity and volume of data being collected. For more information about checkpointing, see the appendix at the end of this post. Subramanya Vajiraya is a Sr.

Management

Management Metadata Testing Internet of Things

Monitoring Apache Iceberg metadata layer using AWS Lambda, AWS Glue, and AWS CloudWatch

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Webinars

Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

The Data Lakehouse: Blending Data Warehouses and Data Lakes

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

How Knowledge Graphs Power Data Mesh and Data Fabric

Access Amazon Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

Data governance in the age of generative AI

My Understanding of the Gartner® Hype Cycle™ for Finance Data and Analytics Governance, 2023

Data architecture strategy for data quality

Five benefits of a data catalog

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Seamless integration of data lake and data warehouse using Amazon Redshift Spectrum and Amazon DataZone

Navigating the New Data Landscape: Trends and Opportunities

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

How Cargotec uses metadata replication to enable cross-account data sharing

Introducing MongoDB Atlas metadata collection with AWS Glue crawlers

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

Data Strategies for Getting Greater Business Value from Distributed Data

The Data Warehouse is Dead, Long Live the Data Warehouse, Part I

Introducing Apache Hudi support with AWS Glue crawlers

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

How Amazon Devices scaled and optimized real-time demand and supply forecasts using serverless analytics

Choosing a Data Catalog: Data Map or Data Delivery App?

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Themes and Conferences per Pacoid, Episode 8

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

Augmented data management: Data fabric versus data mesh

Unlocking the value of data as your differentiator

Data Mesh vs. Data Fabric: A Love Story

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Create an end-to-end data strategy for Customer 360 on AWS

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Data Management Challenges for the Modern Enterprise

A hybrid approach in healthcare data warehousing with Amazon Redshift

What is an Information Steward, and Why You Should Care

How Cloudinary transformed their petabyte scale streaming data lake with Apache Iceberg and AWS Analytics

Data Management Challenges for the Modern Enterprise

How data stores and governance impact your AI initiatives

Data Management Challenges for the Modern Enterprise

AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry

Stay Connected