Data Lake, Enterprise and Metadata

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Businesses are constantly evolving, and data leaders are challenged every day to meet new requirements. For many enterprises and large organizations, it is not feasible to have one processing engine or tool to deal with the various business requirements. This post is co-written with Andries Engelbrecht and Scott Teal from Snowflake.

Data Lake

Data Lake Snapshot Metadata Data Architecture

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

AWS Big Data

MARCH 4, 2024

As enterprises collect increasing amounts of data from various sources, the structure and organization of that data often need to change over time to meet evolving analytical needs. This is critical for fast-moving enterprises to augment data structures to support new use cases.

Snapshot

Snapshot Data Lake Metadata Recreation/Entertainment

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Salesforce debuts Zero Copy Partner Network to ease data integration

CIO Business Intelligence

APRIL 25, 2024

“The challenge that a lot of our customers have is that requires you to copy that data, store it in Salesforce; you have to create a place to store it; you have to create an object or field in which to store it; and then you have to maintain that pipeline of data synchronization and make sure that data is updated,” Carlson said.

Data Integration

Data Integration Data Lake Metadata Data Warehouse

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

Data ecosystems have become jungles and in spite of all the technology, data teams are struggling to create a modern data experience. Drowning in Data, Thirsting for Context We’ve heard the saying, “Data, data everywhere. ” As more data accumulates, context gets diluted and lost.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

Integrating Data Governance and Enterprise Architecture

erwin

SEPTEMBER 3, 2020

Why should you integrate data governance (DG) and enterprise architecture (EA)? Two of the biggest challenges in creating a successful enterprise architecture initiative are: collecting accurate information on application ecosystems and maintaining the information as application ecosystems change.

Data Governance

Data Governance Enterprise Risk Data Lake

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

The sheer scale of data being captured by the modern enterprise has necessitated a monumental shift in how that data is stored. What was at first a data stream has morphed into a data river as enterprise businesses are harvesting reams of data from every conceivable input across every conceivable business function.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. EDLS job steps and metadata Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework.

Metadata

Metadata Data Lake Visualization Data Transformation

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

Storage-centric approach In the storage-centric approach, people try to address data silos by throwing everything in a data lake or a data warehouse. But, although, this helps somewhat in terms of architecture, soon these data lakes become unwieldy. Another complication was the existing data silos.

Metadata

Metadata Data Lake Data-driven Enterprise

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. With this massive data growth, data proliferation across your data stores, data warehouse, and data lakes can become equally challenging.

Data Lake

Data Lake Analytics Dashboards Metrics

Where Do Data Catalogs Fit in Metadata Management?

Alation

FEBRUARY 13, 2020

In an earlier blog, I defined a data catalog as “a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate fitness data for intended uses.”.

Metadata

Metadata Management Data Lake Data Governance

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. But first, let’s define the data mesh design pattern. The past decades of enterprise data platform architectures can be summarized in 69 words.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

Implementing the right data strategy spurs innovation and outstanding business outcomes by recognizing data as a critical asset that provides insights for better and more informed decision-making. By taking advantage of data, enterprises can shape business decisions, minimize risk for stakeholders, and gain competitive advantage.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

AWS Big Data

JANUARY 17, 2024

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.

Data Lake

Data Lake Snapshot Big Data Data-driven

Informatica’s new data management clouds target health, finance services

CIO Business Intelligence

MAY 24, 2022

Just after launching a focused data management platform for retail customers in March, enterprise data management vendor Informatica has now released two more industry-specific versions of its Intelligent Data Management Cloud (IDMC) — one for financial services, and the other for health and life sciences.

Finance

Finance Management Metadata Data Quality

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Machine Learning Cost-Benefit

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect role Data architects are senior visionaries who translate business requirements into technology requirements and define data standards and principles, often in support of data or digital transformations. Data architects are frequently part of a data science team and tasked with leading data system projects.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

By using automated and repeatable capabilities, you can quickly and safely migrate data to the cloud and govern it along the way. But transforming and migrating enterprise data to the cloud is only half the story – once there, it needs to be governed for completeness and compliance. GDPR, CCPA, HIPAA, SOX, PIC DSS).

Data Governance

Data Governance Metadata Testing Data Lake

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

Analytics reference architecture for gaming organizations In this section, we discuss how gaming organizations can use a data hub architecture to address the analytical needs of an enterprise, which requires the same data at multiple levels of granularity and different formats, and is standardized for faster consumption.

Analytics

Analytics Data Warehouse Data Lake Metadata

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

Use Case #1: Customer 360 / Enterprise 360 Customer data is typically spread across multiple applications, departments, and regions. Each team and system need to keep diverse sets of data about their customers in order to play their specific role – inadvertently leading to siloed experiences. million users.

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

APRIL 19, 2023

We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The first component (metadata setup) consumes existing Hive job configurations and generates metadata such as number of parameters, number of actions (steps), and file formats. sql_path SQL file name.

Metadata

Metadata Testing Data Lake Consulting

How Cloudera Supports Zero Trust for Data

Cloudera

JUNE 7, 2023

Agencies should inventory, categorize, and label data; protect data at rest and in transit; and deploy mechanisms to detect and stop data exfiltration. Agencies should carefully craft and review data governance policies to ensure all data lifecycle security aspects are appropriately enforced across the enterprise.”

Metadata

Metadata Data Lake Optimization Modeling

Data Governance Makes Data Security Less Scary

erwin

OCTOBER 31, 2019

Data security incidents may be the result of not having a true data governance foundation that makes it possible to understand the context of data – what assets exist and where, the relationship between them and enterprise systems and processes, and how and by what authorized parties data is used. No Hocus Pocus.

Data Governance

Data Governance Metadata Risk Data Lake

Putting the Business Back Into Business Innovation

Timo Elliott

DECEMBER 14, 2022

Gartner calls it the Composable Enterprise , for example – it’s about having a solid information foundation that enables fast and flexible creation of what they call composable applications that allow you to create new applications and workflows by just bringing together modular components. Business Process.

Data Lake

Data Lake Recreation/Entertainment Metadata Data Warehouse

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata

Metadata Data Quality Data-driven Data Governance

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. Curated foundation models, such as those created by IBM or Microsoft, help enterprises scale and accelerate the use and impact of the most advanced AI capabilities using trusted data.

Risk

Risk Modeling Management Metadata

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Clarifying what data mesh is and how it works is important, but even more so is starting with the why. Why Organizations Need A Data Mesh The data mesh paradigm aims to address some of the biggest pain points for most organizations. This becomes particularly problematic when dealing with veracity of data.

Metadata

Metadata Data-driven Data Quality Data Architecture

Our Next Phase of Growth: Enterprise Data Catalogs

Alation

FEBRUARY 13, 2020

Following a very successful year of growth in Alation’s business, this announcement marks a milestone for Alation and the enterprise data catalog market. What started six years ago as one startup trying to improve the way people work with data has become a full-blown market category – Machine Learning Data Catalogs.

Enterprise

Enterprise Data Lake Machine Learning Data-driven

Operational Database Security – Part 2

Cloudera

SEPTEMBER 23, 2020

Comprehensive auditing is provided to enable enterprises to effectively and efficiently meet their compliance requirements by auditing access and other types of operations across OpDB (through HBase). Both fine-grained access control of database objects and access to metadata is provided. Data Catalog . Activity monitoring.

Data Lake

Data Lake Metadata IoT Enterprise

Four Topics That Should Be Top of Mind for SAP Partners

Timo Elliott

JUNE 19, 2023

So analysts like Gartner talk about the composable enterprise, where we can create new end-to-end business workflows in a more modular, agile, and flexible way simply by putting together LEGO bricks, if you like, to create end-to-end workflows and processes. The next area is data. There’s a huge disruption around data.

Data Lake

Data Lake Digital Transformation Recreation/Entertainment Technology

What Is Data Curation?

Alation

FEBRUARY 13, 2020

Data curation is important in today’s world of data sharing and self-service analytics, but I think it is a frequently misused term. When speaking and consulting, I often hear people refer to data in their data lakes and data warehouses as curated data, believing that it is curated because it is stored as shareable data.

Metadata

Metadata Data Warehouse Data Lake Data Governance

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. The company wanted the ability to continue processing operational data in the secondary Region in the rare event of primary Region failure.

Data Lake

Data Lake Data Processing Metadata Snapshot

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues. Several factors determine the quality of your enterprise data like accuracy, completeness, consistency, to name a few.

Data Quality

Data Quality Data Architecture Strategy Data Lake

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Why do we need a data catalog? What does a data catalog do? These are all good questions and a logical place to start your data cataloging journey. Data catalogs have become the standard for metadata management in the age of big data and self-service analytics. Figure 1 – Data Catalog Metadata Subjects.

Metadata

Metadata Data Lake Recreation/Entertainment Big Data

Data Management Challenges for the Modern Enterprise

Data Virtualization

MARCH 3, 2021

To remain competitive, organizations must have a data management strategy in place to effectively ingest, store, organize, and analyze data while ensuring that it is. The post Data Management Challenges for the Modern Enterprise appeared first on Data Virtualization blog.

Enterprise

Enterprise Management Strategy Data Lake

Data Mesh 101: How Data Mesh Can Be Used in an Organization

Ontotext

FEBRUARY 12, 2024

As discussed in part one, the data mesh paradigm is still a relatively new concept with implementations in the early stages. Most enterprises have years of legacy systems, processes, and practices that make it more demanding to quickly jump onto the data mesh bandwagon. For instance, JPMorgan Chase & Co.

Data Quality

Data Quality Data-driven Data Lake Data Governance

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

With more companies increasingly migrating their data to the cloud to ensure availability and scalability, the risks associated with data management and protection also are growing. Data Security Starts with Data Governance. Do You Know Where Your Sensitive Data Is?

Data Governance

Data Governance Cost-Benefit Risk Metadata

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

Migrate an existing data lake to a transactional data lake using Apache Iceberg

Webinars

Trending Sources

Use Apache Iceberg in a data lake to support incremental data processing

Webinars

Use AWS Glue ETL to perform merge, partition evolution, and schema evolution on Apache Iceberg

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Salesforce debuts Zero Copy Partner Network to ease data integration

Build a real-time GDPR-aligned Apache Iceberg data lake

How Knowledge Graphs Power Data Mesh and Data Fabric

Data Lakes on Cloud & it’s Usage in Healthcare

Integrating Data Governance and Enterprise Architecture

Data governance in the age of generative AI

Data Lakes: What Are They and Who Needs Them?

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

Case study: Policy Enforcement Automation With Semantics

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

Where Do Data Catalogs Fit in Metadata Management?

What is a Data Mesh?

Overcome these six data consumption challenges for a more data-driven enterprise

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Informatica’s new data management clouds target health, finance services

The Future of the Data Lakehouse – Open

The Future of the Data Lakehouse – Open

What is a data architect? Skills, salaries, and how to become a data framework master

Doing Cloud Migration and Data Governance Right the First Time

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

How Cloudera Supports Zero Trust for Data

Data Governance Makes Data Security Less Scary

Putting the Business Back Into Business Innovation

Five benefits of a data catalog

How to use foundation models and trusted governance to manage AI workflow risk

What is Data Mesh?

Our Next Phase of Growth: Enterprise Data Catalogs

Operational Database Security – Part 2

Four Topics That Should Be Top of Mind for SAP Partners

What Is Data Curation?

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Data architecture strategy for data quality

What Is a Data Catalog?

Data Management Challenges for the Modern Enterprise

Data Mesh 101: How Data Mesh Can Be Used in an Organization

How Data Governance Protects Sensitive Data

Stay Connected