Data Lake, Metadata and Technology

Understanding the Differences Between Data Lakes and Data Warehouses

Smart Data Collective

AUGUST 28, 2021

Data lakes and data warehouses are probably the two most widely used structures for storing data. Data Warehouses and Data Lakes in a Nutshell. A data warehouse is used as a central storage space for large amounts of structured data coming from various sources. Data Type and Processing.

Data Lake

Data Lake Data Warehouse Unstructured Data Structured Data

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

Data lakes are a popular choice for today’s organizations to store their data around their business activities. As a best practice of a data lake design, data should be immutable once stored. A data lake built on AWS uses Amazon Simple Storage Service (Amazon S3) as its primary storage environment.

Data Lake

Data Lake Metadata Testing Data Warehouse

Salesforce debuts Zero Copy Partner Network to ease data integration

CIO Business Intelligence

APRIL 25, 2024

At Salesforce World Tour NYC today, Salesforce unveiled a new global ecosystem of technology and solution providers geared to help its customers leverage third-party data via secure, bidirectional zero-copy integrations with Salesforce Data Cloud. “It works in Salesforce just like any other native Salesforce data,” Carlson said.

Data Integration

Data Integration Data Lake Metadata Data Warehouse

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

The data ecosystem today is crowded with dazzling buzzwords, all fighting for investment dollars. A survey in 2021 found that a data company was being funded every 45 minutes. Data ecosystems have become jungles and in spite of all the technology, data teams are struggling to create a modern data experience.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Modernize your ETL platform with AWS Glue Studio: A case study from BMS

AWS Big Data

DECEMBER 13, 2023

For the past 5 years, BMS has used a custom framework called Enterprise Data Lake Services (EDLS) to create ETL jobs for business users. EDLS job steps and metadata: Every EDLS job comprises one or more job steps chained together and run in a predefined order orchestrated by the custom ETL framework.

Metadata

Metadata Data Lake Visualization Data Transformation

Introducing AWS Glue crawler and create table support for Apache Iceberg format

AWS Big Data

AUGUST 16, 2023

Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback. Iceberg captures metadata information on the state of datasets as they evolve and change over time.

Data Lake

Data Lake Metadata Snapshot Management

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

AWS Big Data

AUGUST 3, 2023

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions. A resource link is a Data Catalog object that is a link to a database or table.

Data Lake

Data Lake Metadata Management Data Processing

Gartner Data & Analytics Sydney 2022

Timo Elliott

NOVEMBER 21, 2022

For the last 30 years, whenever you want to do analytics, the first step is to rip it out of the operational applications and try and move it to a different environment—so data warehousing, data lakes, data lakehouses and now data clouds. Data is useless.

Data Analytics

Data Analytics Analytics Recreation/Entertainment Data Lake

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

They are expected to understand the entire data landscape and generate business-moving insights while facing the voracious needs of different teams and the constraints of technology architecture and compliance. Evolution of data approaches: The data strategies we’ve had so far have led to a lot of challenges and pain points.

Metadata

Metadata Data Lake Data-driven Enterprise

Query your Apache Hive metastore with AWS Lake Formation permissions

AWS Big Data

JULY 20, 2023

The Hive metastore is a repository of metadata about the SQL tables, such as database names, table names, schema, serialization and deserialization information, data location, and partition details of each table. Apache Hive, Apache Spark, Presto, and Trino can all use a Hive Metastore to retrieve metadata to run queries.

Data Lake

Data Lake Metadata Data Processing Big Data

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect role: Data architects are senior visionaries who translate business requirements into technology requirements and define data standards and principles, often in support of data or digital transformations.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Putting the Business Back Into Business Innovation

Timo Elliott

DECEMBER 14, 2022

The future is enabled by technology, but it’s not about the technical infrastructures: it’s about optimizing end-to-end processes, business capabilities, and business ecosystems. And that’s where SAP Business Technology Platform (SAP BTP) comes in. The analysts call this a data mesh or data fabric strategy.

Data Lake

Data Lake Recreation/Entertainment Metadata Data Warehouse

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics. ChatGPT is an excellent resource for gaining high-level insights and building awareness of any technology.

Unstructured Data

Unstructured Data Data Lake Data Warehouse Machine Learning

Four Topics That Should Be Top of Mind for SAP Partners

Timo Elliott

JUNE 19, 2023

Technologies like SAP BTP allow us to do that more easily than ever before in the cloud environments, even for customers that have on-premise applications. Well, this is where technologies like Signavio are a great fit. And these generative AI technologies allow new possibilities for creativity.

Data Lake

Data Lake Digital Transformation Recreation/Entertainment Technology

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference. Starting with Amazon EMR release 6.7.0,

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Doing Cloud Migration and Data Governance Right the First Time

erwin

OCTOBER 8, 2020

The desire to modernize technology, over time, leads to acquiring many different systems with various data entry points and transformation rules for data as it moves into and across the organization. Regulatory compliance is also a major driver of data governance (e.g., GDPR, CCPA, HIPAA, SOX, PCI DSS).

Data Governance

Data Governance Metadata Testing Data Lake

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Why do we need a data catalog? What does a data catalog do? These are all good questions and a logical place to start your data cataloging journey. Data catalogs have become the standard for metadata management in the age of big data and self-service analytics.

Metadata

Metadata Data Lake Recreation/Entertainment Big Data

Query your Iceberg tables in data lake using Amazon Redshift (Preview)

AWS Big Data

AUGUST 31, 2023

Amazon Redshift enables you to directly access data stored in Amazon Simple Storage Service (Amazon S3) using SQL queries and join data across your data warehouse and data lake. With Amazon Redshift, you can query the data in your S3 data lake using a central AWS Glue metastore from your Redshift data warehouse.

Data Lake

Data Lake Data Warehouse Metadata Data Architecture

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Next generation of big data platforms and long running batch jobs operated by a central team of data engineers have often led to data lake swamps. Monitor and identify data quality issues closer to the source to mitigate the potential impact on downstream processes or workloads.

Data Quality

Data Quality Data Architecture Strategy Data Lake

What Is Data Curation?

Alation

FEBRUARY 13, 2020

Data curation is important in today’s world of data sharing and self-service analytics, but I think it is a frequently misused term. When speaking and consulting, I often hear people refer to data in their data lakes and data warehouses as curated data, believing that it is curated because it is stored as shareable data.

Metadata

Metadata Data Warehouse Data Lake Data Governance

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

As more businesses use AI systems and the technology continues to mature and change, improper use could expose a company to significant financial, operational, regulatory and reputational risks. It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits.

Risk

Risk Modeling Management Metadata

Don’t Fear Artificial Intelligence; Embrace it Through Data Governance

CIO Business Intelligence

APRIL 29, 2022

As someone who is passionate about the transformative power of technology, it is fascinating to see intelligent computing – in all its various guises – bridge the schism between fantasy and reality. This first article emphasizes data as the ‘foundation-stone’ of AI-based initiatives. Establishing a Data Foundation.

Data Governance

Data Governance IT Risk Data Lake

Are Data Lakehouses Secure and the Best of Both Worlds?

TDAN

MAY 31, 2022

As we enter a new cloud-first era, advancements in technology have helped companies capture and capitalize on data as much as possible. Deciding between which cloud architecture to use has always been a debate between two options: data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse Technology Data Architecture

Data Governance Makes Data Security Less Scary

erwin

OCTOBER 31, 2019

What data do we have and where is it? Data is a critical asset used to operate, manage and grow a business. While some data is at rest in databases, data lakes and data warehouses, a large percentage is federated and integrated across the enterprise, introducing governance, manageability and risk issues that must be managed.

Data Governance

Data Governance Metadata Risk Data Lake

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Think of data mesh as an operational mode for organizations with a domain-driven, decentralized data architecture. It’s a combination of implementation, organizational patterns, and a technology-agnostic set of principles. What Is a Data Product and Who Owns Them?

Metadata

Metadata Data-driven Data Quality Data Architecture

Data Mesh 101: How Data Mesh Can Be Used in an Organization

Ontotext

FEBRUARY 12, 2024

How to Adopt: Embarking on a data mesh journey is a significant undertaking that requires careful planning and consideration of culture, processes, technology, and governance. Ensure a successful implementation by simplifying the access, use, and publishing of data products. For instance, JPMorgan Chase & Co.

Data Quality

Data Quality Data-driven Data Lake Data Governance

Modernizing Data Architectures

Data Virtualization

AUGUST 26, 2020

Recently, we have seen the rise of new technologies like big data, the Internet of things (IoT), and data lakes. But we have not seen many developments in the way that data gets delivered. Modernizing the data infrastructure is the.

Data Architecture

Data Architecture Internet of Things Data Lake IoT

Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2024

Prior to this integration, you had to complete the following steps before Amazon DataZone could treat the published Data Catalog table as a managed asset: Identify the Amazon S3 location associated with the Data Catalog table. Publish the table metadata to the Amazon DataZone business data catalog.

Finance

Finance Sales Publishing Metadata

Exploring real-time streaming for generative AI Applications

AWS Big Data

MARCH 25, 2024

Streaming jobs constantly ingest new data to synchronize across systems and can perform enrichment, transformations, joins, and aggregations across windows of time more efficiently. Studio notebooks seamlessly combine these technologies to make advanced analytics on data streams accessible to developers of all skill sets.

Data Lake

Data Lake Unstructured Data Management Modeling

A Few 2016 Technology Predictions

In(tegrate) the Clouds

DECEMBER 21, 2015

I enjoy the end of the year technology predictions, even though it’s hard to argue with this tweet from Merv Adrian: By 2016, 99% of readers will be utterly sick of predictions. 2016 will be the year of the data lake. I can’t help it. Merv Adrian (@merv) December 19, 2015.

Technology

Technology Internet of Things Digital Transformation Software

Advancing AI: The emergence of a modern information lifecycle

CIO Business Intelligence

DECEMBER 4, 2023

A modern information lifecycle management approach: Today’s ILM approach recognizes the enterprise value of all digitized and enriched assets, avoiding the habituated, narrow reliance on traditional structured data. Beyond “records,” organizations can digitally capture anything and apply metadata for context and searchability.

Unstructured Data

Unstructured Data Data Lake Metadata Business Objectives

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Technical metadata to describe schemas, indexes and other database objects.

Metadata

Metadata Data Quality Data-driven Data Governance

Gartner® Magic Quadrant™ for Cloud Database Report Recognizes Cloudera as a Visionary

Cloudera

JANUARY 19, 2022

Cloudera Data Platform (CDP) scored among the top 10 vendors on all four Analytical Use Cases (Data Warehouse, Logical Data Warehouse, Data Lake and Operational Intelligence) in the Critical Capabilities for Cloud Database Management Systems for Analytics Use Cases.

Reporting

Reporting Data Warehouse Data Lake Machine Learning

The Very Group adopts a data catalog to better organize and leverage its online retail capabilities

CIO Business Intelligence

SEPTEMBER 6, 2022

Behind the flagship brand, though, he says data remained scattered in siloes across many legacy business units and applications, with limited automation, many glossaries, and complex data lineage, and stewardship making it hard to govern and audit. “A lot of roles in data just talk about the data,” he says. “A

IT Forecasting Data Lake Enterprise

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises.

Data Lake

Data Lake Data Warehouse Data mining Statistics

Data Profiling: What It Is and How to Perfect It

Alation

APRIL 18, 2023

What is data profiling? Definition and purpose of data profiling: Data profiling is the process of analyzing and assessing the quality, structure, and content of data. Data profiling technology examines data values, formats, relationships, and patterns to identify data quality issues, dependencies, and relationships.

IT Metadata Data Quality Data Governance

Power enterprise-grade Data Vaults with Amazon Redshift – Part 2

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Snapshot Cost-Benefit

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

In fact, DataOps (and its underlying foundation, Data Fabric), has ranked as a top area of inquiry and interest with analysts over the past twelve months. DataOps requires an array of technology to automate the design, development, deployment, and management of data delivery, with governance sprinkled on for good measure.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

How Data Governance Protects Sensitive Data

erwin

APRIL 2, 2021

And knowing the business purpose translates into actively governing personal data against potential privacy and security violations. Do You Know Where Your Sensitive Data Is? Data is a valuable asset used to operate, manage and grow a business. Minimizing Risk Exposure with Data Intelligence.

Data Governance

Data Governance Cost-Benefit Risk Metadata

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

A typical organization’s data landscape consists of a large number of data stores across workflows, business processes and business units, including but not limited to data warehouses, data marts, data lakes, ODS, cloud data stores, and CRM databases. The volume of data assets.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Cloudera Contributor: Mark Ramsey, PhD ~ Globally Recognized Chief Data Officer. July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. Why should this be on their technology roadmap?

Data Lake

Data Lake Data Architecture Data-driven Data Warehouse

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

SumUp is a leading global financial technology company driven by the purpose of leveling the playing field for small businesses. Unless, of course, the rest of their data also resides in the Google Cloud. AWS Glue gave us a cost-efficient option to migrate the data and we further optimized storage cost by pruning cold data.

Analytics

Analytics Data Lake Testing Optimization
