
AWS Glue Data Quality is Generally Available

AWS Big Data

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning.


Data architecture strategy for data quality

IBM Big Data Hub

Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions, misinformed business processes, missed revenue opportunities, failed business initiatives, and overly complex data systems can all stem from data quality issues.




Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.
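Rules in AWS Glue Data Quality are written in DQDL (Data Quality Definition Language) and can be evaluated inside a Glue ETL job. The following is a minimal sketch of that pattern, assuming the EvaluateDataQuality transform that Glue Studio generates for recent Glue versions; the catalog table, column names, and thresholds are illustrative, not from the article.

import sys
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsgluedq.transforms import EvaluateDataQuality

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())

# Dataset to validate (hypothetical Data Catalog table).
orders = glueContext.create_dynamic_frame.from_catalog(
    database="sales", table_name="orders"
)

# DQDL ruleset; the rule types are standard DQDL, but the columns and
# thresholds here are made up for illustration.
ruleset = """
Rules = [
    RowCount > 0,
    IsComplete "order_id",
    Uniqueness "order_id" > 0.99,
    ColumnValues "order_status" in ["PENDING", "SHIPPED", "DELIVERED"]
]
"""

# Evaluate the ruleset against the frame and publish results for monitoring.
dq_results = EvaluateDataQuality().process_rows(
    frame=orders,
    ruleset=ruleset,
    publishing_options={
        "dataQualityEvaluationContext": "orders_dq_check",
        "enableDataQualityResultsPublishing": True,
    },
)

The same kind of DQDL ruleset can also be attached to a Data Catalog table, which is what the "data quality at rest" mode mentioned above refers to.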


What is a Data Mesh?

DataKitchen

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.


How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are among the fundamentals to achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity”.


Automate large-scale data validation using Amazon EMR and Apache Griffin

AWS Big Data

In today’s data-driven landscape, where organizations deal with petabytes of data, the need for automated data validation frameworks has become increasingly critical. Apache Griffin is an open source data quality solution for big data that supports both batch and streaming modes.
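Griffin itself is configured through JSON measure definitions, but the core batch metric it reports is easy to illustrate. Below is a rough PySpark sketch of that source-versus-target accuracy check, not Griffin's own API; the table names and join key are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accuracy-check").getOrCreate()

# System of record and the copy being validated -- hypothetical tables.
source = spark.table("raw.orders")
target = spark.table("lake.orders")

# Accuracy in Griffin's sense: the share of source records that have a
# matching record in the target on the chosen key(s).
matched = source.join(target, on="order_id", how="left_semi").count()
total = source.count()
accuracy = matched / total if total else 1.0

print(f"matched={matched} total={total} accuracy={accuracy:.4f}")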


Demystifying Modern Data Platforms

Cloudera

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake. What is a data fabric?