Data Lake, Data Warehouse, Metadata and Reporting

Data Lake

Data Warehouse

Metadata

Reporting

Understanding the Differences Between Data Lakes and Data Warehouses

Smart Data Collective

AUGUST 28, 2021

Data lakes and data warehouses are probably the two most widely used structures for storing data. Data Warehouses and Data Lakes in a Nutshell. A data warehouse is used as a central storage space for large amounts of structured data coming from various sources.

Data Lake

Data Lake Data Warehouse Unstructured Data Structured Data

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

AWS Big Data

MARCH 10, 2023

Since the deluge of big data over a decade ago, many organizations have learned to build applications to process and analyze petabytes of data. Data lakes have served as a central repository to store structured and unstructured data at any scale and in various formats.

Data Lake

Data Lake Sales Data Warehouse Snapshot

Join 52,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

The Key to Sustainable Energy Optimization: A Data-Driven Approach for Manufacturing

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Trending Sources

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

AWS Big Data

MARCH 27, 2023

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Tags allows you to assign metadata to your AWS resources. In AWS Cost Explorer , you want to create cost reports for Redshift Serverless by department, environment, and cost center.

Data Warehouse

Data Warehouse Management Snapshot Data Lake

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

The Key to Sustainable Energy Optimization: A Data-Driven Approach for Manufacturing

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale.

Data Lake

Data Lake Metadata Optimization Statistics

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

AWS Big Data

FEBRUARY 22, 2023

In this post, we show how Ruparupa implemented an incrementally updated data lake to get insights into their business using Amazon Simple Storage Service (Amazon S3), AWS Glue , Apache Hudi , and Amazon QuickSight. An AWS Glue ETL job, using the Apache Hudi connector, updates the S3 data lake hourly with incremental data.

Data Lake

Data Lake Dashboards Cost-Benefit Metadata

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

Cloudera

JANUARY 15, 2021

Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. The results demonstrate superior price performance of Cloudera Data Warehouse on the full set of 99 queries from the TPC-DS benchmark. Introduction.

Data Warehouse

Data Warehouse Cost-Benefit Consulting Interactive

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

In most enterprises data teams lack a data map and data asset inventory and are often unaware of data that exists across the organization, its associated profile, quality and associated metadata. Teams can’t access data to build their business use cases. For example, a product data tag is basic metadata.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

AWS Big Data

AUGUST 1, 2023

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. For InitialRunFlag , choose Setup.

Data Lake

Data Lake Data Transformation Cost-Benefit Data-driven

Data Lakes: What Are They and Who Needs Them?

Jet Global

JULY 2, 2019

The sheer scale of data being captured by the modern enterprise has necessitated a monumental shift in how that data is stored. From the humble database through to data warehouses , data stores have grown both in scale and complexity to keep pace with the businesses they serve, and the data analysis now required to remain competitive.

Data Lake

Data Lake Data Warehouse Big Data Machine Learning

Gartner® Magic Quadrant™ for Cloud Database Report Recognizes Cloudera as a Visionary

Cloudera

JANUARY 19, 2022

Gartner® recognized Cloudera in three recent reports – Magic Quadrant for Cloud Database Management Systems (DBMS), Critical Capabilities for Cloud Database Management Systems for Analytical Use Cases and Critical Capabilities for Cloud Database Management Systems for Operational Use Cases. Download the reports to see the detailed scores .

Reporting

Reporting Data Warehouse Data Lake Machine Learning

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

AWS Big Data

NOVEMBER 29, 2023

In 2013, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift , the first fully-managed, petabyte-scale, enterprise-grade cloud data warehouse. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools.

Data Warehouse

Data Warehouse Data Lake Analytics Machine Learning

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

The data products used inside the company include insights from user journeys, operational reports, and marketing campaign results, among others. The data platform serves on average 60 thousand queries per day. The data volume is in double-digit TBs with steady growth as business and data sources evolve.

Data Lake

Data Lake Data Warehouse B2B Data-driven

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architects are frequently part of a data science team and tasked with leading data system projects. They often report to data infrastructure and data science leads. Application data architect: The application data architect designs and implements data models for specific software applications.

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

Storage-centric approach In the storage-centric approach, people try to address data silos by throwing everything in a data lake or a data warehouse. But, although, this helps somewhat in terms of architecture, soon these data lakes become unwieldy. Another complication was the existing data silos.

Metadata

Metadata Data Lake Data-driven Enterprise

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

AWS Big Data

NOVEMBER 16, 2023

Amazon Redshift is a popular cloud data warehouse, offering a fully managed cloud-based service that seamlessly integrates with an organization’s Amazon Simple Storage Service (Amazon S3) data lake, real-time streams, machine learning (ML) workflows, transactional workflows, and much more—all while providing up to 7.9x

Enterprise

Enterprise Data Warehouse Data Lake Optimization

How BMO improved data security with Amazon Redshift and AWS Lake Formation

AWS Big Data

MARCH 1, 2024

One of the bank’s key challenges related to strict cybersecurity requirements is to implement field level encryption for personally identifiable information (PII), Payment Card Industry (PCI), and data that is classified as high privacy risk (HPR). Only users with required permissions are allowed to access data in clear text.

Data Lake

Data Lake Data Warehouse Management Risk

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

AWS Big Data

MARCH 7, 2023

A data hub contains data at multiple levels of granularity and is often not integrated. It differs from a data lake by offering data that is pre-validated and standardized, allowing for simpler consumption by users. Data hubs and data lakes can coexist in an organization, complementing each other.

Analytics

Analytics Data Warehouse Data Lake Metadata

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

AWS Big Data

JANUARY 24, 2023

This solution only replicates metadata in the Data Catalog, not the actual underlying data. To have a redundant data lake using Lake Formation and AWS Glue in an additional Region, we recommend replicating the Amazon S3-based storage using S3 replication , S3 sync, aws-s3-copy-sync-using-batch or S3 Batch replication process.

Data Architecture

Data Architecture Metadata Data Lake Snapshot

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

Without C360, businesses face missed opportunities, inaccurate reports, and disjointed customer experiences, leading to customer churn. Profile aggregation – When you’ve uniquely identified a customer, you can build applications in Managed Service for Apache Flink to consolidate all their metadata, from name to interaction history.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Where Do Data Catalogs Fit in Metadata Management?

Alation

FEBRUARY 13, 2020

In an earlier blog, I defined a data catalog as “a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of available data, and provides information to evaluate fitness data for intended uses.”.

Metadata

Metadata Management Data Lake Data Governance

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

A hybrid approach in healthcare data warehousing with Amazon Redshift

AWS Big Data

FEBRUARY 21, 2023

Data warehouses play a vital role in healthcare decision-making and serve as a repository of historical data. A healthcare data warehouse can be a single source of truth for clinical quality control systems. This is one of the biggest hurdles with the data vault approach. What is a dimensional data model?

Data Warehouse

Data Warehouse Data Lake Cost-Benefit Modeling

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

An Amazon DataZone domain contains an associated business data catalog for search and discovery, a set of metadata definitions to decorate the data assets that are used for discovery purposes, and data projects with integrated analytics and ML tools for users and groups to consume and publish data assets.

Metadata

Metadata Data Lake Publishing Data Governance

How to use foundation models and trusted governance to manage AI workflow risk

IBM Big Data Hub

OCTOBER 16, 2023

It includes processes that trace and document the origin of data, models and associated metadata and pipelines for audits. How to scale AL and ML with built-in governance A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools.

Risk

Risk Modeling Management Metadata

Handling Data Lineage in Snowflake for BI Success

Octopai

MARCH 22, 2021

Elimination of data silos, consistency, accessibility, reliability, scalability, power of the cloud…. Snowflake is still exploding with data, and dealing with the ever-familiar issues like reporting errors, impact analysis, finding where PII is located and more can still cause your BI team to, well…. Cleaning up dirty data.

Data Lake

Data Lake Metadata Data Warehouse Reporting

What Is Data Curation?

Alation

FEBRUARY 13, 2020

Data curation is important in today’s world of data sharing and self-service analytics, but I think it is a frequently misused term. When speaking and consulting, I often hear people refer to data in their data lakes and data warehouses as curated data, believing that it is curated because it is stored as shareable data.

Metadata

Metadata Data Warehouse Data Lake Data Governance

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

The right data architecture can help your organization improve data quality because it provides the framework that determines how data is collected, transported, stored, secured, used and shared for business intelligence and data science use cases. Perform data quality monitoring based on pre-configured rules.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

AWS Big Data

MARCH 29, 2024

QuickSight makes it straightforward for business users to visualize data in interactive dashboards and reports. You can slice data by different dimensions like job name, see anomalies, and share reports securely across your organization. QuickSight lets you perform aggregate calculations on metrics for deeper analysis.

Metrics

Metrics Visualization Dashboards Interactive

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

At these times, they run business growth reports, shareholder reports, and financial reports for their earnings calls, to name a few examples. Cloud deployments for suitable workloads gives you the agility to keep pace with rapidly changing business and data needs.

Data Warehouse

Data Warehouse Reporting Risk Cost-Benefit

The Madness of Data (and analytics) Governance

Andrew White

DECEMBER 9, 2019

The outline of the call went as follows: I was taking to a central state agency who was organizing a data governance initiative (in their words) across three other state agencies. All four agencies had reported an independent but identical experience with data governance in the past. An enterprise-wide data governance board.

Analytics

Analytics Data Lake Data Governance Metadata

Automate legacy ETL conversion to AWS Glue using Cognizant Data and Intelligence Toolkit (CDIT) – ETL Conversion Tool

AWS Big Data

OCTOBER 4, 2023

Migrating on-premises data warehouses to the cloud is no longer viewed as an option but a necessity for companies to save cost and take advantage of what the latest technology has to offer. This blog post is co-written with Govind Mohan and Kausik Dhar from Cognizant.

Data Warehouse

Data Warehouse Cost-Benefit Metadata Data Lake

Integrating Data Governance and Enterprise Architecture

erwin

SEPTEMBER 3, 2020

To better understand and align data governance and enterprise architecture, let’s look at data at rest and data in motion and why they both have to be documented. Documenting data at rest involves looking at where data is stored, such as in databases, data lakes , data warehouses and flat files.

Data Governance

Data Governance Enterprise Risk Data Lake

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

AWS Big Data

APRIL 28, 2023

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools.

Interactive

Interactive Metadata Data Warehouse Data-driven

Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog

AWS Big Data

APRIL 8, 2024

Prior to this integration, you had to complete the following steps before Amazon DataZone could treat the published Data Catalog table as a managed asset: Identity the Amazon S3 location associated with Data Catalog table. Publish the table metadata to the Amazon DataZone business data catalog.

Finance

Finance Sales Publishing Metadata

Convergent Evolution

Peter James Thomas

AUGUST 18, 2018

That was the Science, here comes the Technology… A Brief Hydrology of Data Lakes. Overlapping with the above, from around 2012, I began to get involved in also designing and implementing Big Data Architectures; initially for narrow purposes and later Data Lakes spanning entire enterprises. In Closing.

Data Lake

Data Lake Data Warehouse Data mining Statistics

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

A typical organization’s data landscape consists of a large number of data stores across workflows, business processes and business units, including but not limited to data warehouses, data marts, data lakes, ODS, cloud data stores, and CRM databases. The volume of data assets.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Technical metadata to describe schemas, indexes and other database objects.

Metadata

Metadata Data Quality Data-driven Data Governance

The Very Group adopts a data catalog to better organize and leverage its online retail capabilities

CIO Business Intelligence

SEPTEMBER 6, 2022

When Steve Pimblett joined The Very Group in October 2020 as chief data officer, reporting to the conglomerate’s CIO, his task was to help the enterprise uncover value in its rich data heritage. As a result, Pimblett now runs the organization’s data warehouse, analytics, and business intelligence.

IT Forecasting Data Lake Enterprise

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

Unless, of course, the rest of their data also resides in the Google Cloud. In this post we showcase how we used AWS Glue to move siloed digital analytics data, with inconsistent arrival times, to AWS S3 (our Data Lake) and our central data warehouse (DWH), Snowflake.

Analytics

Analytics Data Lake Testing Optimization

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

Organizations have spent a lot of time and money trying to harmonize data across diverse platforms , including cleansing, uploading metadata, converting code, defining business glossaries, tracking data transformations and so on. Creating a High-Quality Data Pipeline.

Data Governance

Data Governance Risk Metadata Management

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

In light of a year of unprecedented disruptions, where data has never been so important, and to reflect on the rapidly advancing world of data-led digital transformation, we are excited to announce this year’s 7 categories: DATA LIFECYCLE CONNECTION. DATA FOR GOOD. SECURITY AND GOVERNANCE LEADERSHIP.

Digital Transformation

Digital Transformation Machine Learning Optimization Data Lake

Of Muffins and Machine Learning Models

Cloudera

FEBRUARY 16, 2022

In the case of CDP Public Cloud, this includes virtual networking constructs and the data lake as provided by a combination of a Cloudera Shared Data Experience (SDX) and the underlying cloud storage. Each project consists of a declarative series of steps or operations that define the data science workflow.

Machine Learning

Machine Learning Modeling Metadata Recreation/Entertainment

Data Swamp, Data Lake, Data Lakehouse: What to Know

Alation

OCTOBER 21, 2021

Data Swamp vs Data Lake. When you imagine a lake, it’s likely an idyllic image of a tree-ringed body of reflective water amid singing birds and dabbling ducks. I’ll take the lake, thank you very much. Many organizations have built a data lake to solve their data storage, access, and utilization challenges.

Data Lake

Data Lake Metadata Data Warehouse Data Governance

Understanding the Differences Between Data Lakes and Data Warehouses

Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena

Webinars

Trending Sources

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Webinars

Choosing an open table format for your transactional data lake on AWS

How Ruparupa gained updated insights with an Amazon S3 data lake, AWS Glue, Apache Hudi, and Amazon QuickSight

Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance

How Knowledge Graphs Power Data Mesh and Data Fabric

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Data Lakes: What Are They and Who Needs Them?

Gartner® Magic Quadrant™ for Cloud Database Report Recognizes Cloudera as a Visionary

Amazon Redshift announcements at AWS re:Invent 2023 to enable analytics on all your data

How smava makes loans transparent and affordable using Amazon Redshift Serverless

What is a Data Mesh?

What is a data architect? Skills, salaries, and how to become a data framework master

Case study: Policy Enforcement Automation With Semantics

Power enterprise-grade Data Vaults with Amazon Redshift – Part 1

How BMO improved data security with Amazon Redshift and AWS Lake Formation

How gaming companies can use Amazon Redshift Serverless to build scalable analytical applications faster and easier

Build a multi-Region and highly resilient modern data architecture using AWS Glue and AWS Lake Formation

Create an end-to-end data strategy for Customer 360 on AWS

Where Do Data Catalogs Fit in Metadata Management?

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

A hybrid approach in healthcare data warehousing with Amazon Redshift

Unlock data across organizational boundaries using Amazon DataZone – now generally available

How to use foundation models and trusted governance to manage AI workflow risk

Handling Data Lineage in Snowflake for BI Success

What Is Data Curation?

Data architecture strategy for data quality

Enhance monitoring and debugging for AWS Glue jobs using new job observability metrics, Part 3: Visualization and trend analysis using Amazon QuickSight

Extreme data center pressure? Burst to the cloud with CDP!

The Madness of Data (and analytics) Governance

Automate legacy ETL conversion to AWS Glue using Cognizant Data and Intelligence Toolkit (CDIT) – ETL Conversion Tool

Integrating Data Governance and Enterprise Architecture

Use the Amazon Redshift Data API to interact with Amazon Redshift Serverless

Amazon DataZone announces integration with AWS Lake Formation hybrid access mode for the AWS Glue Data Catalog

Convergent Evolution

Overcome these six data consumption challenges for a more data-driven enterprise

Five benefits of a data catalog

The Very Group adopts a data catalog to better organize and leverage its online retail capabilities

How SumUp made digital analytics more accessible using AWS Glue

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

Announcing the 2021 Data Impact Awards

Of Muffins and Machine Learning Models

Data Swamp, Data Lake, Data Lakehouse: What to Know

Stay Connected