Blog, Data Lake and Data Quality

Automated data governance with AWS Glue Data Quality, sensitive data detection, and AWS Lake Formation

AWS Big Data

OCTOBER 10, 2023

Data governance is the process of ensuring the integrity, availability, usability, and security of an organization’s data. Due to the volume, velocity, and variety of data being ingested in data lakes, it can get challenging to develop and maintain policies and procedures to ensure data governance at scale for your data lake.

Data Quality

Data Quality Data Governance Data Lake Testing

AWS Glue Data Quality is Generally Available

AWS Big Data

JUNE 6, 2023

We are excited to announce the General Availability of AWS Glue Data Quality. Our journey started by working backward from our customers who create, manage, and operate data lakes and data warehouses for analytics and machine learning. It takes days for data engineers to identify and implement data quality rules.

Data Quality

Data Quality Statistics Data Lake Visualization

Webinars

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

MORE WEBINARS

Trending Sources

Data architecture strategy for data quality

IBM Big Data Hub

JANUARY 5, 2023

Poor data quality is one of the top barriers faced by organizations aspiring to be more data-driven. Ill-timed business decisions and misinformed business processes, missed revenue opportunities, failed business initiatives and complex data systems can all stem from data quality issues.

Data Quality

Data Quality Data Architecture Strategy Data Lake

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Poor-quality data can lead to incorrect insights, bad decisions, and lost opportunities. AWS Glue Data Quality measures and monitors the quality of your dataset. It supports both data quality at rest and data quality in AWS Glue extract, transform, and load (ETL) pipelines.

Data Quality

Data Quality Data Lake Visualization Data-driven

An AI Chat Bot Wrote This Blog Post …

DataKitchen

DECEMBER 9, 2022

Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making. Query> An AI, Chat GPT wrote this blog post, why should I read it?

Machine Learning

Machine Learning Data-driven Optimization Modeling

Data Lakes on Cloud & it’s Usage in Healthcare

BizAcuity

MARCH 29, 2019

Data lakes are centralized repositories that can store all structured and unstructured data at any desired scale. The power of the data lake lies in the fact that it often is a cost-effective way to store data.

Data Lake

Data Lake Unstructured Data Cost-Benefit Data Quality

Navigating the Chaos of Unruly Data: Solutions for Data Teams

DataKitchen

NOVEMBER 10, 2023

The core issue plaguing many organizations is the presence of out-of-control databases or data lakes characterized by unrestrained data changes: numerous users and tools incessantly alter data, leading to a tumultuous environment. Monitoring freshness, schema changes, volume, and column health is standard.

Data Quality

Data Quality Testing Data Lake Data Integration

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

Bad data tax is rampant in most organizations. Currently, every organization is blindly chasing the GenAI race, often forgetting that data quality and semantics are among the fundamentals to achieving AI success. Sadly, data quality is losing to data quantity, resulting in “Infobesity”.

Metadata

Metadata Data Lake Data Warehouse Data Quality

Automate large-scale data validation using Amazon EMR and Apache Griffin

AWS Big Data

APRIL 4, 2024

Griffin is an open source data quality solution for big data, which supports both batch and streaming mode. In today’s data-driven landscape, where organizations deal with petabytes of data, the need for automated data validation frameworks has become increasingly critical.

Data Quality

Data Quality Data Lake Data Warehouse Data-driven

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt. Second-generation – gigantic, complex data lake maintained by a specialized team drowning in technical debt.

Data Architecture

Data Architecture Data Lake Cost-Benefit Data Warehouse

Fire Your Super-Smart Data Consultants with DataOps

DataKitchen

JANUARY 25, 2022

Ensuring that data is available, secure, correct, and fit for purpose is neither simple nor cheap. Companies end up paying outside consultants enormous fees while still having to suffer the effects of poor data quality and lengthy cycle time. For example, DataOps can be used to automate data integration.

Consulting

Consulting Testing Data Lake Data Quality

Handle UPSERT data operations using open-source Delta Lake and AWS Glue

AWS Big Data

JANUARY 30, 2023

Many customers need an ACID transaction (atomic, consistent, isolated, durable) data lake that can log change data capture (CDC) from operational data sources. There is also demand for merging real-time data into batch data. The Delta Lake framework provides these two capabilities.

Insurance

Insurance Data Lake Data-driven Management

How SumUp made digital analytics more accessible using AWS Glue

AWS Big Data

JUNE 6, 2023

This is a guest blog post by Mira Daniels and Sean Whitfield from SumUp. Unless, of course, the rest of their data also resides in the Google Cloud. AWS Glue gave us a cost-efficient option to migrate the data and we further optimized storage cost by pruning cold data. It consists of full-day and intraday tables.

Analytics

Analytics Data Lake Testing Optimization

Data Mesh 101: How Data Mesh Can Be Used in an Organization

Ontotext

FEBRUARY 12, 2024

Domain teams should continually monitor for data errors with data validation checks and incorporate data lineage to track usage. Establish and enforce data governance by ensuring all data used is accurate, complete, and compliant with regulations. For instance, JPMorgan Chase & Co.

Data Quality

Data Quality Data-driven Data Lake Data Governance

Introducing the technology behind watsonx.ai, IBM’s AI and data platform for enterprise

IBM Big Data Hub

MAY 9, 2023

Data: the foundation of your foundation model. Data quality matters. An AI model trained on biased or toxic data will naturally tend to produce biased or toxic outputs. When objectionable data is identified, we remove it, retrain the model, and repeat. Data curation is a task that’s never truly finished.

Enterprise

Enterprise Technology Modeling Cost-Benefit

You Can’t Hit What You Can’t See

Cloudera

DECEMBER 1, 2022

Data observability provides insight into the condition and evolution of the data resources from source through the delivery of the data products. Barr Moses of Monte Carlo presents it as a combination of data flow, data quality, data governance, and data lineage. Source: IDC.

Metrics

Metrics Data Quality Data Lake Statistics

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake.

Data Lake

Data Lake Data Architecture Data-driven Data Warehouse

Overcome these six data consumption challenges for a more data-driven enterprise

IBM Big Data Hub

JUNE 8, 2022

However, a foundational step in evolving into a data-driven organization requires trusted, readily available, and easily accessible data for users within the organization; thus, an effective data governance program is key. Here are a few common data management challenges: Regulatory compliance on data use.

Data-driven

Data-driven Enterprise Data Governance Data Lake

Data Profiling: What It Is and How to Perfect It

Alation

APRIL 18, 2023

For any data user in an enterprise today, data profiling is a key tool for resolving data quality issues and building new data solutions. In this blog, we’ll cover the definition of data profiling, top use cases, and share important techniques and best practices for data profiling today.

IT Data Quality Metadata Data Governance

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance. It uses metadata and data management tools to organize all data assets within your organization. Ensuring data quality is made easier as a result.

Metadata

Metadata Data Quality Data-driven Data Governance

Case study: Policy Enforcement Automation With Semantics

Ontotext

MAY 2, 2024

Storage-centric approach: In the storage-centric approach, people try to address data silos by throwing everything in a data lake or a data warehouse. Although this helps somewhat in terms of architecture, these data lakes soon become unwieldy.

Metadata

Metadata Data Lake Data-driven Enterprise

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Figure 1 illustrates the typical metadata subjects contained in a data catalog. Figure 1 – Data Catalog Metadata Subjects. Datasets are the files and tables that data workers need to find and access. They may reside in a data lake, warehouse, master data repository, or any other shared data resource.

Metadata

Metadata Data Lake Recreation/Entertainment Big Data

What is Data Mesh?

Ontotext

NOVEMBER 16, 2023

Data mesh solves this by promoting data autonomy, allowing users to make decisions about domains without a centralized gatekeeper. It also improves development velocity with better data governance and access with improved data quality aligned with business needs.

Metadata

Metadata Data-driven Data Quality Data Architecture

DataOps Observability: Taming the Chaos (Part 3)

DataKitchen

NOVEMBER 18, 2022

In addition to the tracking of relationships and quality metrics, DataOps Observability journeys allow users to establish baselines: concrete expectations for run schedules, run durations, data quality, and upstream and downstream dependencies. And she’ll know when newer data will arrive.

Testing

Testing Statistics Measurement Metrics

Data Management Predictions for 2024: Five Trends

Data Virtualization

MARCH 7, 2024

Reading Time: 3 minutes. As we move deeper into 2024, it is imperative for data management leaders to look in their rear-view mirrors to assess and, if needed, refine their data management strategies. One thing is clear; if data-centric organizations want to succeed in.

Management

Management Data Integration Strategy Data Lake

Data Management Predictions for 2024: Five Trends

Data Virtualization

JANUARY 25, 2024

Reading Time: 3 minutes. As we head into 2024, it is imperative for data management leaders to look in their rear-view mirrors to assess and, if needed, refine their data management strategies.

Management

Management Data Integration Strategy Data Lake

Turnkey Cloud DataOps: Solution from Alation and Accenture

Alation

MARCH 22, 2022

As the latest iteration in this pursuit of high-quality data sharing, DataOps combines a range of disciplines. It synthesizes all we’ve learned about agile, data quality, and ETL/ELT. This produces end-to-end lineage so business and technology users alike can understand the state of a data lake and/or lake house.

Metadata

Metadata Cost-Benefit Data Quality Data Lake

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

It’s impossible for data teams to assure the data quality of such spreadsheets and govern them all effectively. If unaddressed, this chaos can lead to data quality, compliance, and security issues. Read the overview blog: Alation Connected Sheets Brings Trust to Spreadsheets. Curious to learn more?

Metadata

Metadata Enterprise Cost-Benefit Finance

Data Swamp, Data Lake, Data Lakehouse: What to Know

Alation

OCTOBER 21, 2021

Data Swamp vs Data Lake. When you imagine a lake, it’s likely an idyllic image of a tree-ringed body of reflective water amid singing birds and dabbling ducks. I’ll take the lake, thank you very much. Many organizations have built a data lake to solve their data storage, access, and utilization challenges.

Data Lake

Data Lake Metadata Data Warehouse Data Governance

Why Invest Now? Three Investors Share the Story Behind Alation’s Series E

Alation

NOVEMBER 2, 2022

“At Databricks, we’re focused on enabling customers to adopt the data lakehouse, and that’s an open data architecture that combines the best of the data warehouse and the data lake into one platform,” Ferguson says. “And data governance is critical to driving adoption.”

Data Governance

Data Governance Marketing Finance Data Lake

Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics

AWS Big Data

JULY 20, 2023

With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.

Data Lake

Data Lake Analytics Snapshot Optimization

5 Ways Data Engineers Can Support Data Governance

Alation

JANUARY 26, 2023

Offer the right tools: Data stewardship is greatly simplified when the right tools are on hand. So ask yourself, does your steward have the software to spot issues with data quality, for example? 2) Always Remember Compliance: There are now many different data privacy and security laws worldwide.

Data Governance

Data Governance Strategy Data Quality Marketing

Data Strategies for Getting Greater Business Value from Distributed Data

Data Virtualization

MAY 19, 2023

Reading Time: 11 minutes

Data Strategy

Data Strategy Strategy Data Integration Management

The Role of the Data Catalog in Data Security

Alation

JUNE 14, 2021

According to a 2020 451 Research report, “data catalogs are rapidly building out automated functionality,” including “automated suggestions, automated discovery and tagging, and automated data-quality scoring.” These are essential to enabling a more rapid process of sensitive data discovery.

Data Governance

Data Governance Recreation/Entertainment Data Lake Digital Transformation

Data Preparation and Data Mapping: The Glue Between Data Management and Data Governance to Accelerate Insights and Reduce Risks

erwin

JANUARY 11, 2019

It’s only when companies take their first stab at manually cataloging and documenting operational systems, processes and the associated data, both at rest and in motion, that they realize how time-consuming the entire data prepping and mapping effort is, and why that work is sure to be compounded by human error and data quality issues.

Data Governance

Data Governance Risk Metadata Management

Top Graph Use Cases and Enterprise Applications (with Real World Examples)

Ontotext

MARCH 8, 2023

As such, most large financial organizations have moved their data to a data lake or a data warehouse to understand and manage financial risk in one place. Yet risk analysis continues to suffer from the lack of a scalable way of understanding how data is interrelated.

Enterprise

Enterprise Knowledge Discovery Risk Data-driven

How BMO improved data security with Amazon Redshift and AWS Lake Formation

AWS Big Data

MARCH 1, 2024

One of the bank’s key challenges related to strict cybersecurity requirements is to implement field level encryption for personally identifiable information (PII), Payment Card Industry (PCI), and data that is classified as high privacy risk (HPR). Only users with required permissions are allowed to access data in clear text.

Data Lake

Data Lake Data Warehouse Management Risk

What is an open data lakehouse and why you should care?

IBM Big Data Hub

JANUARY 17, 2023

A data lakehouse is an emerging data management architecture that improves efficiency and converges data warehouse and data lake capabilities driven by a need to improve efficiency and obtain critical insights faster. Let’s start with why data lakehouses are becoming increasingly important.

Data Lake

Data Lake Metadata Data Warehouse Data Governance

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

Among the most common challenges to achieving AI adoption at scale were data quality and availability (36%), scalability and deployment (36%), integration with existing systems and processes (35%), and change management and organizational culture (34%).

Data Architecture

Data Architecture Strategy Data Lake Data-driven

How Data Management and Big Data Analytics Speed Up Business Growth

BizAcuity

APRIL 14, 2022

Big Data technology in today’s world. Did you know that the big data and business analytics market is valued at $198.08 Or that the US economy loses up to $3 trillion per year due to poor data quality? quintillion bytes of data which means an average person generates over 1.5 megabytes of data every second?

Big Data

Big Data Data Analytics Management Unstructured Data

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

For state and local agencies, data silos create compounding problems: Inaccessible or hard-to-access data creates barriers to data-driven decision making. Legacy data sharing involves proliferating copies of data, creating data management and security challenges.

Data Architecture

Data Architecture Data Lake Metadata Data Warehouse

Data Mesh vs. Data Fabric: A Love Story

Alation

JANUARY 13, 2022

Thoughtworks says data mesh is key to moving beyond a monolithic data lake. Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary.

Data Lake

Data Lake Metadata Data-driven Data Governance

Data Mesh 101: What it is and Why You Should Care

Ontotext

FEBRUARY 12, 2024

It proposes a technological, architectural, and organizational approach to solving data management problems by breaking up the monolithic data platform and de-centralizing data management across different domain teams and services. Some examples of data products are data sets, tables, machine learning models, and APIs.

IT Metadata Data Quality Data Lake

Implement a Multi-Cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

DECEMBER 15, 2022

With in-place table migration, you can rapidly convert to Iceberg tables since there is no need to regenerate data files. Newly generated metadata will then point to source data files as illustrated in the diagram below. Data quality using table rollback. Only metadata will be regenerated.

Metadata

Metadata Data Warehouse Snapshot Data Quality
