Data Leaders Brief

Measure performance of AWS Glue Data Quality for ETL pipelines

AWS Big Data

MARCH 12, 2024

In recent years, data lakes have become a mainstream architecture, and data quality validation is a critical factor to improve the reusability and consistency of the data. In this post, we provide benchmark results of running increasingly complex data quality rulesets over a predefined test dataset.

Data Quality

Data Quality Measurement Testing Visualization

Streaming Ingestion for Apache Iceberg With Cloudera Stream Processing

Cloudera

MARCH 2, 2023

Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. This enables you to maximize utilization of streaming data at scale. The Catalog Type should be set to Hive.

Snapshot

Snapshot Data Processing Metadata Management

Do I Need a Data Catalog?

erwin

JUNE 26, 2020

If you’re serious about a data-driven strategy, you’re going to need a data catalog. Organizations need a data catalog because it enables them to create a seamless way for employees to access and consume data and business assets in an organized manner. Three Types of Metadata in a Data Catalog.

Metadata

Metadata Cost-Benefit Measurement Data-driven

Webinars

The Key to Sustainable Energy Optimization: A Data-Driven Approach for Manufacturing

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You Need to Know

How To Get Promoted In Product Management


Amazon DataZone now integrates with AWS Glue Data Quality and external data quality solutions

AWS Big Data

APRIL 3, 2024

Today, we are pleased to announce that Amazon DataZone is now able to present data quality information for data assets. Other organizations monitor the quality of their data through third-party solutions. Amazon DataZone now integrates directly with AWS Glue to display data quality scores for AWS Glue Data Catalog assets.

Data Quality

Data Quality Visualization Metadata Metrics

How to Evaluate a Data Catalog

More data, more problems. Do you struggle to find, understand, and trust data in your daily work? A data catalog will make your work life easier -- and more productive. This guide offers handy tips for evaluating data catalogs. But where do you start?

Enhance query performance using AWS Glue Data Catalog column-level statistics

AWS Big Data

NOVEMBER 22, 2023

Today, we’re making available a new capability of AWS Glue Data Catalog that allows generating column-level statistics for AWS Glue tables. Data lakes are designed for storing vast amounts of raw, unstructured, or semi-structured data at a low cost, and organizations share those datasets across multiple departments and teams.

Statistics

Statistics Data Lake Optimization Data-driven

Petabyte-scale log analytics with Amazon S3, Amazon OpenSearch Service, and Amazon OpenSearch Ingestion

AWS Big Data

MARCH 7, 2024

Organizations often need to manage a high volume of data that is growing at an extraordinary rate. At the same time, they need to optimize operational costs to unlock the value of this data for timely insights and do so with a consistent performance. We think of this concept as inside-out data movement. Example Corp.

Data Lake

Data Lake Analytics Dashboards Metrics

Data governance in the age of generative AI

AWS Big Data

FEBRUARY 29, 2024

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Data governance is a critical building block across all these approaches, and we see two emerging areas of focus.

Data Governance

Data Governance Unstructured Data Metadata Data Lake

Set up advanced rules to validate quality of multiple datasets with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Data is the lifeblood of modern businesses. In today’s data-driven world, companies rely on data to make informed decisions, gain a competitive edge, and provide exceptional customer experiences. However, not all data is created equal. AWS Glue Data Quality measures and monitors the quality of your dataset.

Data Quality

Data Quality Data Lake Visualization Data-driven

Announcing the AWS Well-Architected Data Analytics Lens

AWS Big Data

MARCH 26, 2024

We are delighted to announce the release of the Data Analytics Lens. The lens consists of a lens whitepaper and an AWS-created lens available in the Lens Catalog of the AWS Well-Architected Tool. The AWS Well-Architected Framework provides a consistent approach to evaluate architectures and implement scalable designs.

Data Analytics

Data Analytics Analytics Big Data Data Lake

How to Select Data Catalog Software for Business Intelligence

Octopai

OCTOBER 5, 2021

Your CFO finally gave the okay to purchase data catalog software. How will you choose the best data catalog software for your company? Does it support automatic harvesting from your other data/BI software? Check the data catalog’s ability for ease of integration with each tool in your BI environment.

Business Intelligence

Business Intelligence Software Cost-Benefit Metadata

Data Catalog Management 101: The Tools and Roles You Need for Success

Octopai

JUNE 27, 2022

Sit back and imagine how you would feel when… “This data catalog is helping boost our bottom line.” Your data team spends more time doing high-level analysis than it does searching for relevant datasets. Data catalog SUCCESS! Pick a data catalog that matches your enterprise’s needs.

Management

Management Metadata Visualization Data Governance

Visualize data quality scores and metrics generated by AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

AWS Glue Data Quality allows you to measure and monitor the quality of data in your data repositories. It’s important for business users to be able to see quality scores and metrics to make confident business decisions and debug data quality issues. An AWS Glue crawler crawls the results.

Data Quality

Data Quality Metrics Visualization Dashboards

Achieve higher query throughput: Auto scaling in Amazon OpenSearch Serverless now supports shard replica scaling

AWS Big Data

OCTOBER 12, 2023

At launch, OpenSearch Serverless supported increasing capacity automatically in response to growing data sizes. In the following figure, an index has four shards to handle the product catalog.

Data Processing

Data Processing Testing Management Analytics

7 steps for turning shadow IT into a competitive edge

CIO Business Intelligence

NOVEMBER 21, 2023

After all, 41% of employees acquire, modify, or create technology outside of IT’s visibility, and 52% of respondents to EY’s Global Third-Party Risk Management Survey had an outage — and 38% reported a data breach — caused by third parties over the past two years. That’s not to downplay the inherent risks of shadow IT.

IT

IT Risk Cost-Benefit Risk Management

Set up alerts and orchestrate data quality rules with AWS Glue Data Quality

AWS Big Data

JUNE 6, 2023

Alerts and notifications play a crucial role in maintaining data quality because they facilitate prompt and efficient responses to any data quality issues that may arise within a dataset. It simplifies your experience of monitoring and evaluating the quality of your data.

Data Quality

Data Quality Metrics Data-driven Visualization

Snowflake Migration Best Practices

Octopai

FEBRUARY 20, 2022

That’s the data dream, right? Deciding to migrate your data from legacy, on-prem databases to a cloud-based data warehouse like Snowflake is often a step in the right direction. Let’s take a look at some concrete steps to set you up for a Snowflake data migration that will deliver better insights at a lower cost to you.

Data Warehouse

Data Warehouse Optimization Measurement Visualization

Best Practices for Metadata Management

Alation

JULY 19, 2021

Metadata is information about data. A clothing catalog or dictionary are both examples of metadata repositories. Indeed, a popular online catalog, like Amazon, offers rich metadata around products to guide shoppers: ratings, reviews, and product details are all examples of metadata. What Is Metadata? Why Is Metadata Important?

Metadata

Metadata Management Data Governance Machine Learning

Speed up queries with the cost-based optimizer in Amazon Athena

AWS Big Data

NOVEMBER 17, 2023

Athena provides a simplified, flexible way to analyze petabytes of data where it lives. You can analyze data or build applications from an Amazon Simple Storage Service (Amazon S3) data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python.

Optimization

Optimization Statistics Metadata Data Lake

The Role of the Data Catalog in Data Security

Alation

JUNE 14, 2021

The Role of Catalog in Data Security. We discuss how they are running the business of IT and cover subjects like digital transformation, business/IT alignment, IT leadership, and leading innovation. Recently, I dug in with CIOs on the topic of data security.

Data Governance

Data Governance Recreation/Entertainment Data Lake Digital Transformation

How A Data Catalog Enhances Data Risk Management

Alation

JANUARY 9, 2023

No matter where you reside, it seems as if significant data breaches happen almost weekly. A recent flurry of data breaches resulting in the disclosure of private data covering tens of millions of Australians has renewed the visibility of regulations that compel organisations to better protect consumer data.

Risk Management

Risk Management Risk Management Metadata

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Data catalogs have quickly become a core component of modern data management. Organizations with successful data catalog implementations see remarkable changes in the speed and quality of data analysis, and in the engagement and enthusiasm of people who need to perform data analysis.

Metadata

Metadata Data Lake Recreation/Entertainment Big Data

Alation Wins Four TrustRadius “Top Rated” Awards

Alation

MAY 12, 2023

So when leading software review site TrustRadius announced that we had won their “Top Rated” awards in Data Catalog, Data Collaboration, Data Governance, and Metadata Management we were thrilled, but not surprised, since usability has been core to Alation’s product DNA since day 1. What does “Top Rated” mean?

Metadata

Metadata Data Governance Data-driven Sales

Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries

AWS Big Data

APRIL 25, 2024

In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.

Optimization

Optimization Data Lake Cost-Benefit Reporting

Accelerate hybrid cloud transformation through IBM Cloud for Financial Service Validation Program

IBM Big Data Hub

APRIL 4, 2024

Lots of innovation is happening, with new technologies emerging in areas such as data and AI, payments, cybersecurity and risk management, to name a few.

Cost-Benefit

Cost-Benefit Risk Management Risk Digital Transformation

Copyright, AI, and Provenance

O'Reilly on Data

DECEMBER 12, 2023

This ruling in itself raises many questions: how much creativity is needed, and is that the same kind of creativity that an artist exercises with a paintbrush? How do we make sense of this? Training an LLM means teaching it how to understand and reproduce human language. This distinction is attractive for several reasons.

Modeling

Modeling Software Sales Statistics

From Chaos to Control with Data Intelligence

erwin

DECEMBER 3, 2020

As the amount of data grows exponentially, organizations turn to data intelligence to reach deeper conclusions about driving revenue, achieving regulatory compliance and accomplishing other strategic objectives. It’s no secret that data has grown in volume, variety and velocity, with 2.5 New products and services.

Metadata

Metadata Data Governance Data-driven Digital Transformation

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata

Metadata Data Quality Data-driven Data Governance

Leveraging CISA Known Exploited Vulnerabilities: Why attack surface vulnerability validation is your strongest defense

IBM Big Data Hub

DECEMBER 8, 2023

However, how do these organizations know that focusing on software with the highest scoring CVEs is the right approach? CISA strongly advises that organizations should regularly review and monitor the Known Exploited Vulnerabilities catalog and prioritize remediation.

Risk

Risk Testing Reporting Software

A Data Analyst’s Guide to the Data Catalog

Alation

MAY 17, 2022

Folks can talk all day about automated insights, but answering ad hoc business questions still relies on human beings who understand the business context and how their bosses communicate. An enterprise data catalog is one such key asset. The Data Analyst Workflow. 7 Steps that Benefit from a Data Catalog.

Metadata

Metadata Data Quality Machine Learning Reporting

Migrate an existing data lake to a transactional data lake using Apache Iceberg

AWS Big Data

OCTOBER 3, 2023

A data lake is a centralized repository that you can use to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data and then run different types of analytics for better business insights. Analytics use cases on data lakes are always evolving.

Data Lake

Data Lake Metadata Snapshot Recreation/Entertainment

Introducing Amazon MWAA support for Apache Airflow version 2.7.2 and deferrable operators

AWS Big Data

NOVEMBER 6, 2023

In this post, we provide an overview of deferrable operators and triggers, including a walkthrough of an example showcasing how to use them. We also delve into some of the new features and capabilities of Apache Airflow, and how you can set up or upgrade your Amazon MWAA environment to version 2.7.2.

Metrics

Metrics Metadata Snapshot Management

How to choose the best DSPM solution for your organization: comparison of features, benefits, and pricing models of different DSPM vendors

Laminar Security

DECEMBER 1, 2023

Cloud data security is one of the most critical, yet most challenging aspects of doing business in the age of cloud. As more organizations migrate their data to the cloud, they face an increasing range of risks and threats, including data breaches, data leakage, data loss, data misuse, data compliance violations, shadow data and more.

Modeling

Modeling Cost-Benefit Risk Data-driven

How to Use a Data Lineage Tool to Ensure Data Quality

Octopai

MARCH 23, 2022

Dirty Meat… and Dirty Data. But even though “dirty meat” is a small concern, “dirty data” is the scourge of any industry that relies heavily on information systems. Cleaning Up Dirty Data.

Data Quality

Data Quality Reporting Modeling Interactive

AI adoption accelerates as enterprise PoCs show productivity gains

CIO Business Intelligence

APRIL 4, 2024

This involves rigorous evaluation of potential benefits, risks, and costs associated with each AI initiative to ensure investments are prudent and aligned with our risk-return profile.” We need to continue to be mindful of business outcomes and apply use cases that make sense.” We want to maintain discipline and go deep.”

Enterprise

Enterprise Cost-Benefit Forecasting Strategy

Automating Model Risk Compliance: Model Development

DataRobot Blog

MAY 10, 2022

The regulatory guidance presented in these documents laid the foundation for evaluating and managing model risk for financial institutions across the United States.

Risk

Risk Modeling Machine Learning Data Quality

How Ontraport reduced data processing cost by 80% with AWS Glue

AWS Big Data

AUGUST 11, 2023

Customers are implementing data and analytics workloads in the AWS Cloud to optimize cost. When implementing data processing workloads in AWS, you have the option to use technologies like Amazon EMR or serverless technologies like AWS Glue. This post is written in collaboration with Elijah Ball from Ontraport.

Data Processing

Data Processing Cost-Benefit Optimization Interactive

Forrester Does the Math on the ROI of the Alation Data Catalog

Alation

FEBRUARY 13, 2020

Whether we’re speaking to data analysts or CDOs, data people almost instantly understand the value of the Alation Data Catalog. Faces light up when we describe how Alation helps enterprises find, understand, trust, use and reuse data. There is plenty of market validation for the value of data catalogs.

ROI

ROI Cost-Benefit Unstructured Data Data Lake

Four Steps to Building a Data-Driven Culture

erwin

JULY 22, 2020

Fostering organizational support for a data-driven culture might require a change in the organization’s culture. Recently, I co-hosted a webinar with our client E.ON, a global energy company that reinvented how it conducts business from branding to customer engagement – with data as the conduit. As an example, E.ON

Data-driven

Data-driven Data Governance Digital Transformation Enterprise

IBM named a leader in the 2022 Gartner® Magic Quadrant™ for Data Integration Tools

IBM Big Data Hub

AUGUST 24, 2022

The only question is, how do you ensure effective ways of breaking down data silos and bringing data together for self-service access? It starts by modernizing your data integration capabilities – ensuring disparate data sources and cloud environments can come together to deliver data in real time and fuel AI initiatives.

Data Integration

Data Integration Metadata Data-driven Data Architecture

What Is ‘Equity As Code,’ And How Can It Eliminate AI Bias?

DataKitchen

JUNE 7, 2021

This is a common question that we hear from our conversations with data scientists, engineers and analysts. How can one get started given these limitations? What can you do? DataOps is not an all-or-nothing proposition.

Testing

Testing IT Measurement Data-driven

How Your Finance Team Can Lead Your Enterprise Data Transformation

Alation

OCTOBER 26, 2021

Today’s best-performing organizations embrace data for strategic decision-making. Because of the criticality of the data they deal with, we think that finance teams should lead the enterprise adoption of data and analytics solutions. This is because accurate data is “table stakes” for finance teams.

Finance

Finance Data Transformation Enterprise Metrics

Projects in SQL Stream Builder

Cloudera

MAY 1, 2023

Businesses everywhere have engaged in modernization projects with the goal of making their data and application infrastructure more nimble and dynamic. release of Cloudera’s SQL Stream Builder (available on CDP Public Cloud 7.2.16 What is a Project in SSB? All of these resources together make up the project.

Testing

Testing Management Data Processing IT
