Data Lake, Data Warehouse, Metadata and Publishing

Data Lake

Data Warehouse

Metadata

Publishing

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. The results demonstrate superior price performance of Cloudera Data Warehouse on the full set of 99 queries from the TPC-DS benchmark. Introduction.

Data Warehouse

Data Warehouse Cost-Benefit Consulting Interactive

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Trending Sources

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

Webinars

How to Optimize the Developer Experience for Monumental Impact

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

Leading the Development of Profitable and Sustainable Products

MORE WEBINARS

Gartner® Magic Quadrant™ for Cloud Database Report Recognizes Cloudera as a Visionary

Cloudera

JANUARY 19, 2022

Cloudera Data Platform (CDP) scored among the top 10 vendors on all four Analytical Use Cases — Data Warehouse, Logical Data Warehouse, Data Lake and Operational Intelligence in the Critical Capabilities for Cloud Database Management Systems for Analytics Use Cases. All rights reserved.

Reporting

Reporting Data Warehouse Data Lake Machine Learning

Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2024

Amazon DataZone registers the location of these tables in Lake Formation using hybrid access mode, which allows managing permissions on AWS Glue tables through Lake Formation, while continuing to maintain any existing IAM permissions. Amazon DataZone enables you to publish any type of asset in the business data catalog.

Finance

Finance Sales Publishing Metadata

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

We have collected some of the key talks and solutions on data governance, data mesh, and modern data architecture published and presented in AWS re:Invent 2022, and a few data lake solutions built by customers and AWS Partners for easy reference.

Data Lake

Data Lake Data Governance Data Architecture Data Warehouse

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

Data Firehose uses an AWS Lambda function to transform data and ingest the transformed records into an Amazon Simple Storage Service (Amazon S3) bucket. An AWS Glue crawler scans data on the S3 bucket and populates table metadata on the AWS Glue Data Catalog. Choose Publish dashboard.

Metrics

Metrics Visualization Dashboards Interactive

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

In a data mesh, domains are represented by a node, which can be an operational data store (ODS), a data warehouse, or a data lake tailored to the domain’s requirements. Mesh emerges when teams use other domains’ data products and the domains communicate with others in a governed manner.

Metadata

Metadata Data-driven Data Quality Data Architecture

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

AWS Big Data

SEPTEMBER 13, 2023

A modern data architecture is an evolutionary architecture pattern designed to integrate a data lake, data warehouse, and purpose-built stores with a unified governance model. Of those tables, some are larger (such as in terms of record volume) than others, and some are updated more frequently than others.

Data Lake

Data Lake Data Processing Metadata Snapshot

Of Muffins and Machine Learning Models

Cloudera

FEBRUARY 16, 2022

In the case of CDP Public Cloud, this includes virtual networking constructs and the data lake as provided by a combination of a Cloudera Shared Data Experience (SDX) and the underlying cloud storage. Each project consists of a declarative series of steps or operations that define the data science workflow.

Machine Learning

Machine Learning Modeling Metadata Recreation/Entertainment

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Andrew White

OCTOBER 22, 2021

Lakehouse (data warehouse and data lake working together) 8. Data Literacy, training, coordination, collaboration 8. Data Management Infrastructure/Data Fabric 5. Data Integration tactics 4. Metadata Strategy 3. Figure 3: The Data and Analytics (infrastructure) Continuum.

IT Data Lake Strategy Data Science

Governing data in relational databases using Amazon DataZone

AWS Big Data

MAY 7, 2024

It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Note that a managed data asset is an asset for which Amazon DataZone can manage permissions.

Metadata

Metadata Data Lake Data Processing Data-driven

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

An Amazon DataZone domain contains an associated business data catalog for search and discovery, a set of metadata definitions to decorate the data assets that are used for discovery purposes, and data projects with integrated analytics and ML tools for users and groups to consume and publish data assets.

Metadata

Metadata Data Lake Publishing Data Governance

5 Ways Data Engineers Can Support Data Governance

Alation

JANUARY 26, 2023

Offer the right tools Data stewardship is greatly simplified when the right tools are on hand. So ask yourself, does your steward have the software to spot issues with data quality, for example? Do they have a system to manage the metadata for given assets? One example is the EU’s General Data Protection Regulation (GDPR).

Data Governance

Data Governance Strategy Data Quality Marketing

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow up blogs for other data services. It allows us to independently upgrade the Virtual Warehouses and Database Catalogs.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

Data platform trinity: Competitive or complementary?

IBM Big Data Hub

JANUARY 18, 2023

In another decade, the internet and mobile started the generate data of unforeseen volume, variety and velocity. It required a different data platform solution. Hence, Data Lake emerged, which handles unstructured and structured data with huge volume. Data lakehouse was created to solve these problems.

Data Lake

Data Lake Data Warehouse Data-driven Metadata

Themes and Conferences per Pacoid, Episode 8

Domino Data Lab

APRIL 3, 2019

Data governance shows up as the fourth-most-popular kind of solution that enterprise teams were adopting or evaluating during 2019. That’s a lot of priorities – especially when you group together closely related items such as data lineage and metadata management which rank nearby. in lieu of simply landing in a data lake.

Data Governance

Data Governance Machine Learning Metadata Big Data

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes. The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale.

Management

Management Metadata Data Architecture Data Lake

Data Mesh 101: What it is and Why You Should Care

Ontotext

FEBRUARY 12, 2024

In a centralized architecture, data is copied from source systems into a data lake or data warehouse to create a single source of truth serving analytics use cases. This quickly becomes difficult to scale with data discovery and data version issues, schema evolution, tight coupling, and a lack of semantic metadata.

IT Metadata Data Quality Data Lake

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Alation

APRIL 6, 2023

Other forms of governance address specific sets or domains of data including information governance (for unstructured data), metadata governance (for data documentation), and domain-specific data (master, customer, product, etc.). Data catalogs and spreadsheets are related in many ways.

Data Governance

Data Governance Metadata Cost-Benefit Structured Data

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

AWS Big Data

AUGUST 16, 2023

Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. Most businesses store their critical data in a data lake, where you can bring data from various sources to a centralized storage.

Data Lake

Data Lake Metadata Testing Snapshot

Introducing watsonx: The future of AI for business

IBM Big Data Hub

MAY 9, 2023

With watsonx.data , businesses can quickly connect to data, get trusted insights and reduce data warehouse costs. A data store built on open lakehouse architecture, it runs both on premises and across multi-cloud environments. Savings may vary depending on configurations, workloads and vendors.

Data Warehouse

Data Warehouse Cost-Benefit Machine Learning Modeling

Exploring the AI and data capabilities of watsonx

IBM Big Data Hub

JULY 17, 2023

It is supported by querying, governance, and open data formats to access and share data across the hybrid cloud. Through workload optimization across multiple query engines and storage tiers, organizations can reduce data warehouse costs by up to 50 percent.

Machine Learning

Machine Learning Data Warehouse Modeling Cost-Benefit

Top 15 data management platforms available today

CIO Business Intelligence

SEPTEMBER 22, 2023

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

Top 15 data management platforms

CIO Business Intelligence

JUNE 9, 2022

All this data arrives by the terabyte, and a data management platform can help marketers make sense of it all. Marketing-focused or not, DMPs excel at negotiating with a wide array of databases, data lakes, or data warehouses, ingesting their streams of data and then cleaning, sorting, and unifying the information therein.

Management

Management Advertising Data Lake Sales

The Modern Data Lakehouse: An Architectural Innovation

Cloudera

SEPTEMBER 9, 2022

This is the promise of the modern data lakehouse architecture. analyst Sumit Pal, in “Exploring Lakehouse Architecture and Use Cases,” published January 11, 2022: “Data lakehouses integrate and unify the capabilities of data warehouses and data lakes, aiming to support AI, BI, ML, and data engineering on a single platform.”

Metadata

Metadata Machine Learning Unstructured Data Data Lake

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools.

Interactive

Interactive Metadata Data Warehouse Data-driven

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

Andrew White

JANUARY 11, 2021

This was for the Chief Data Officer, or head of data and analytics. Gartner also published the same piece of research for other roles, such as Application and Software Engineering. Data and Analytics Governance: Whats Broken, and What We Need To Do To Fix It. Link Data to Business Outcomes. Very interesting.

Data Analytics

Data Analytics Analytics Data-driven Finance

What is Data Mapping?

Jet Global

FEBRUARY 23, 2024

This includes cleaning, aggregating, enriching, and restructuring data to fit the desired format. Load : Once data transformation is complete, the transformed data is loaded into the target system, such as a data warehouse, database, or another application.

Data Warehouse

Data Warehouse Reporting Data Transformation Sales

Do the Benefits of Cloud Outweigh the Costs?

Jet Global

SEPTEMBER 19, 2023

Data Access What insights can we derive from our cloud ERP? What are the best practices for analyzing cloud ERP data? Data Management How do we create a data warehouse or data lake in the cloud using our cloud ERP? How do I access the legacy data from my previous ERP?

Cost-Benefit

Cost-Benefit Data Warehouse Reporting Enterprise

Data Leaders Brief

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Webinars

Trending Sources

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

Webinars

Gartner® Magic Quadrant™ for Cloud Database Report Recognizes Cloudera as a Visionary

Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog

AWS Lake Formation 2022 year in review

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

What is Data Mesh?

Simplify operational data processing in data lakes using AWS Glue and Apache Hudi

Of Muffins and Machine Learning Models

Week in the Life of an Analyst at Gartner US IT Symposium (virtual) 2021

Governing data in relational databases using Amazon DataZone

Unlock data across organizational boundaries using Amazon DataZone – now generally available

5 Ways Data Engineers Can Support Data Governance

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Data platform trinity: Competitive or complementary?

Themes and Conferences per Pacoid, Episode 8

Augmented data management: Data fabric versus data mesh

Data Mesh 101: What it is and Why You Should Care

Why Spreadsheets Are Your Secret Weapon for Efficient Data Governance

Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena

Introducing watsonx: The future of AI for business

Exploring the AI and data capabilities of watsonx

Top 15 data management platforms available today

Top 15 data management platforms

The Modern Data Lakehouse: An Architectural Innovation

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

The Gartner 2021 Leadership Vision for Data & Analytics Leaders Webinar Q&A

What is Data Mapping?

Do the Benefits of Cloud Outweigh the Costs?

Stay Connected