Blog - Data Leaders Brief

How does a data catalog work?

Alation

FEBRUARY 27, 2020

The architectures of the past for BI and analytics – the Corporate Information Factory or the Bus Architecture – are now only one part of a complete analytical environment. Figure 1 gives you a good idea of […].

Technology Analytics Enterprise

Addressing Data Mesh Technical Challenges with DataOps

DataKitchen

AUGUST 9, 2021

Below is our third post (3 of 5) on combining data mesh with DataOps to foster greater innovation while addressing the challenges of a decentralized architecture. We’ve talked about data mesh in organizational terms (see our first post, “What is a Data Mesh?”) and how team structure supports agility.

Testing Data Lake Metadata Publishing

Webinars

The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Communication

Peak Performance: Continuous Testing & Evaluation of LLM-Based Applications

Manufacturing Sustainability Surge: Your Guide to Data-Driven Energy Optimization & Decarbonization

From Developer Experience to Product Experience: How a Shared Focus Fuels Product Success

Understanding User Needs and Satisfying Them

Trending Sources

Streaming Ingestion for Apache Iceberg With Cloudera Stream Processing

Cloudera

MARCH 2, 2023

Recently, we announced enhanced multi-function analytics support in Cloudera Data Platform (CDP) with Apache Iceberg. Iceberg is a high-performance open table format for huge analytic data sets. This enables you to maximize utilization of streaming data at scale. The Catalog Type should be set to Hive. `ssb_default`.`iceberg_hive_example`

Snapshot Data Processing Metadata Management

Business Strategies for Deploying Disruptive Tech: Generative AI and ChatGPT

Rocket-Powered Data Science

FEBRUARY 15, 2023

Third, any commitment to a disruptive technology (including data-intensive and AI implementations) must start with a business strategy. That is: (1) What is it you want to do and where does it fit within the context of your organization? (2) These changes may include requirements drift, data drift, model drift, or concept drift.

Strategy Experimentation Uncertainty Machine Learning

Introducing Apache Hudi support with AWS Glue crawlers

AWS Big Data

NOVEMBER 22, 2023

Apache Hudi is an open table format that brings database and data warehouse capabilities to data lakes. Apache Hudi helps data engineers manage complex challenges, such as managing continuously evolving datasets with transactions while maintaining query performance.

Data Lake Snapshot Metadata Optimization

Extracting key insights from Amazon S3 access logs with AWS Glue for Ray

AWS Big Data

SEPTEMBER 7, 2023

Customers of all sizes and industries use Amazon Simple Storage Service (Amazon S3) to store data globally for a variety of use cases. Customers want to know how their data is being accessed, when it is being accessed, and who is accessing it. With exponential growth in data volume, centralized monitoring becomes challenging.

Metadata Dashboards Metrics Visualization

Unlock data across organizational boundaries using Amazon DataZone – now generally available

AWS Big Data

OCTOBER 4, 2023

Amazon DataZone enables customers to discover, access, share, and govern data at scale across organizational boundaries, reducing the undifferentiated heavy lifting of making data and analytics tools accessible to everyone in the organization. This is challenging because access to data is managed differently by each of the tools.

Metadata Data Lake Publishing Data Governance

The People’s Data Catalog: Alation Featured as Top Choice in Eckerson’s Latest Report

Alation

JULY 15, 2021

Many data catalog initiatives fail. How can prospective buyers ensure they partner with the right catalog to drive success? According to the latest report from Eckerson Group, Deep Dive on Data Catalogs, shoppers must match the goals of their organizations to the capabilities of their chosen catalog.

Reporting Data Governance Recreation/Entertainment Metadata

How Knowledge Graphs Power Data Mesh and Data Fabric

Ontotext

APRIL 10, 2024

The data ecosystem today is crowded with dazzling buzzwords, all fighting for investment dollars. A survey in 2021 found that a data company was being funded every 45 minutes. Data ecosystems have become jungles and in spite of all the technology, data teams are struggling to create a modern data experience.

Metadata Data Lake Data Warehouse Data Quality

“You Complete Me,” said Data Lineage to DataOps Observability.

DataKitchen

JANUARY 23, 2023

What is data lineage? Data lineage traces data’s origin, history, and movement through various processing, storage, and analysis stages. It is used to understand the provenance of data and how it is transformed and to identify potential errors or issues. How does it compare?

Testing Data Governance Data Quality Data-driven

Webinar Summary: Data Mesh and Data Products

DataKitchen

MAY 4, 2023

Chris Bergh, CEO of DataKitchen, delivered a webinar on two themes – Data Products and Data Mesh. Bergh started by discussing the complexity within data and analytics teams, stating that complexity makes everything more complicated and, in the long run, kills productivity.

Measurement Data-driven Testing Cost-Benefit

How DataOps is Transforming Commercial Pharma Analytics

DataKitchen

AUGUST 27, 2021

DataOps has become an essential methodology in pharmaceutical enterprise data organizations, especially for commercial operations. Companies that implement it well derive significant competitive advantage from their superior ability to manage and create value from data.

Analytics Sales Testing Cost-Benefit

How to implement Alation Data Catalog into your company from day one

Alation

NOVEMBER 15, 2022

Digital transformation or a change in a company’s mindset to embrace data is an important step towards becoming a truly data intelligent and more efficient business. And a data catalog is an important part of this. But what does your company need to do to achieve this?

Digital Transformation Software IT

How Cargotec uses metadata replication to enable cross-account data sharing

AWS Big Data

JUNE 7, 2023

This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. Cargotec captures terabytes of IoT telemetry data from their machinery operated by numerous customers across the globe. Cargotec’s use cases also required them to create views that span tables and views across catalogs.

Metadata Data Lake Machine Learning Big Data

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Data fabric and data mesh are emerging data management concepts that are meant to address the organizational change and complexities of understanding, governing and working with enterprise data in a hybrid multicloud ecosystem. The good news is that both data architecture concepts are complementary.

Management Metadata Data Architecture Data Lake

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.

Data Processing Data Lake Cost-Benefit Testing

What is Active Metadata & Why it Matters: Key Insights from Gartner’s Market Guide

Alation

MARCH 2, 2023

Instead, we got data. Lots and lots of data. It does feel, however, as if we need jet-like speed to analyze and understand our data, who is using it, how it is used, and if it is being used to drive value. This is because simply collecting data leaves it open to misinterpretation, misuse, and decay.

Metadata Marketing IT Data Quality

Data as an Asset on your Balance Sheet has to Happen

Andrew White

MARCH 21, 2022

I was perusing my back-catalog of IMF magazines and I found a gem of an article. It is called, Shaping a Data Economy (F&D magazine, December 2020). The tag line is: The world needs a new system of governance for the buying and selling of data. This is in preparation for our upcoming Data and Analytics conference series.

Data Strategy Strategy Measurement Marketing

Networks unchained: the shift toward intent-based autonomous operations

IBM Big Data Hub

JANUARY 26, 2024

To succeed in the modern era, CSPs need versatile teams, including data scientists for data interpretation and operations, software developers for automation through vendor application programming interfaces (API) and service assurance engineers for designing closed loops to ensure service reliability.

IoT Data-driven Interactive Management

What Is a Data Catalog?

Alation

FEBRUARY 13, 2020

Data catalogs have quickly become a core component of modern data management. Organizations with successful data catalog implementations see remarkable changes in the speed and quality of data analysis, and in the engagement and enthusiasm of people who need to perform data analysis.

Metadata Data Lake Recreation/Entertainment Big Data

Leveraging CISA Known Exploited Vulnerabilities: Why attack surface vulnerability validation is your strongest defense

IBM Big Data Hub

DECEMBER 8, 2023

However, how do these organizations know that focusing on software with the highest scoring CVEs is the right approach? Does reducing the number of critical CVEs significantly reduce the risk of a breach?

Risk Testing Software Reporting

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

Read the complete blog below for a more detailed description of the vendors and their capabilities. This is not surprising given that DataOps enables enterprise data teams to generate significant business value from their data. Testing and Data Observability. Download the 2021 DataOps Vendor Landscape here.

Testing Machine Learning Consulting Data Quality

Enriching Streams with Hive tables via Flink SQL

Cloudera

NOVEMBER 18, 2022

Stream processing is about creating business value by applying logic to your data while it is in motion. Many times that involves combining data sources to enrich a data stream. Flink SQL does this and directs the results of whatever functions you apply to the data into a sink. Select “Hive” as catalog type.

Data Processing Advertising IT

Data privacy examples

IBM Big Data Hub

APRIL 24, 2024

An online retailer always gets users’ explicit consent before sharing customer data with its partners. A navigation app anonymizes activity data before analyzing it for travel trends. One cannot overstate the importance of data privacy for businesses today. The user can accept or reject each use of their data individually.

Risk Measurement Data Governance Insurance

Harmonize data using AWS Glue and AWS Lake Formation FindMatches ML to build a customer 360 view

AWS Big Data

JUNE 26, 2023

In today’s digital world, data is generated by a large number of disparate sources and growing at an exponential rate. Companies are faced with the daunting task of ingesting all this data, cleansing it, and using it to provide outstanding customer experience. It’s commonly referred to as a data harmonization or deduplication problem.

Insurance Visualization Data Lake Metrics

How IBM HR leverages IBM Watson® Knowledge Catalog to improve data quality and deliver superior talent insights

IBM Big Data Hub

JUNE 12, 2023

Companies rely heavily on data and analytics to find and retain talent, drive engagement, improve productivity and more across enterprise talent management. However, analytics are only as good as the quality of the data, which must be error-free, trustworthy and transparent. What is data quality? million each year.

Data Quality Data Governance People Analytics Data-driven

Everything is Connected, Everything Changes

Alation

OCTOBER 7, 2021

Jason McVay is a data scientist at Indigo Ag, an agriculture-tech company headquartered in Massachusetts. In this essay, Jason reflects on the value of thinking spatially about data, showing how his experience as a graduate student influences his role as a data scientist today. The rise of spatial data.

Data Lake Visualization Data Science Digital Transformation

Five benefits of a data catalog

IBM Big Data Hub

DECEMBER 16, 2022

An enterprise data catalog does all that a library inventory system does – namely streamlining data discovery and access across data sources – and a lot more. For example, data catalogs have evolved to deliver governance capabilities like managing data quality and data privacy and compliance.

Metadata Data Quality Data-driven Data Governance

What Is Digital Transformation?

Alation

NOVEMBER 4, 2022

In business, data-based goals tend to be very tangible. Perhaps you want to boost your ROI or CAGR, or reduce the time your analysts spend accessing and leveraging data. In this blog we’ll describe digital transformation: how it can be achieved as well as how it can benefit your business.

Digital Transformation Cost-Benefit Insurance Machine Learning

Using a Data Catalog to Crisis-Proof Your Business

erwin

APRIL 21, 2020

One of the biggest lessons we’re learning from the global COVID-19 pandemic is the importance of data, specifically using a data catalog to comply, collaborate and innovate to crisis-proof our businesses. So one of the biggest lessons we’re learning from COVID-19 is the need for data collection, management and governance.

Measurement Sales Data Governance Metadata

How to Ensure Continuous Improvement With Data Governance

Alation

FEBRUARY 3, 2022

The goal of DataOps is to create predictable delivery and change management of data and all data-related artifacts. DataOps practices help organizations overcome challenges caused by fragmented teams and processes and delays in delivering data in consumable forms. So how does data governance relate to DataOps?

Data Governance Measurement Metadata Testing

Alation Launches Open Data Quality Framework

Alation

MAY 24, 2022

In a sea of questionable data, how do you know what to trust? Data quality tells you the answer. It signals what data is trustworthy, reliable, and safe to use. It empowers engineers to oversee data pipelines that deliver trusted data to the wider organization. Today, as part of its 2022.2

Data Quality Metadata Reporting Metrics

Becoming a Data-Driven Organisation in 4 Steps

Alation

NOVEMBER 17, 2022

But what does it mean for an organisation to be truly data-driven? What foundation needs to be in place at the start, and what journey does an organisation need to embrace to benefit from the forensic insights their data can reveal? So how do you get all of this right? Everyone talks about it, everyone wants it.

Data-driven Cost-Benefit Business Objectives Technology

Accelerating Your Statewide Data Strategy with Alation

Alation

MARCH 29, 2023

To shed further light on the value of these contracts and how they benefit our customers, I sat down for an interview with our marketing team. Prior to that, I was a CTO in the state of California and have worked for state and local governments for 15 years. What does this mean for public leaders in these regions?

Data Strategy Strategy Data Governance Sales

What Is Alation Connected Sheets? Q&A with the Creators

Alation

NOVEMBER 28, 2022

introduces Alation Connected Sheets, a new product under Alation Cloud Service that empowers spreadsheet users with access to trusted data. This means that business users can reduce their reliance on IT and data teams for access to data, all without having to learn a new tool. What does that mean? The release of 2022.4

Metadata Enterprise Cost-Benefit Finance

What Is Data Governance? (And Why Your Organization Needs It)

erwin

AUGUST 28, 2020

Organizations with a solid understanding of data governance (DG) are better equipped to keep pace with the speed of modern business. In this post, the erwin Experts address: What Is Data Governance? Why Is Data Governance Important? What Is Good Data Governance? What Are the Key Benefits of Data Governance?

Data Governance IT Cost-Benefit Metadata

Moving Enterprise Data From Anywhere to Any System Made Easy

Cloudera

JUNE 2, 2022

Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. This need has generated a market opportunity for a universal data distribution service. Why does every organization need it when using a modern data stack?

Enterprise Data Lake Data Collection Data-driven

Usability and Connecting Threads: How Data Fabric Makes Sense Out of Disparate Data

Ontotext

AUGUST 4, 2023

Generating actionable insights across growing data volumes and disconnected data silos is becoming increasingly challenging for organizations. Working across data islands leads to siloed thinking and the inability to implement critical business initiatives such as Customer, Product, or Asset 360.

Metadata Data-driven Data Architecture Data Quality

Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes

AWS Big Data

MAY 24, 2023

When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational use cases for your S3 data lake to optimize the production environment. When the catalog property s3.delete-enabled Amazon S3 deletes expired objects on your behalf. With the s3.delete.tags

Data Lake Snapshot Metadata Optimization

Implement fine-grained access control in Amazon SageMaker Studio and Amazon EMR using Apache Ranger and Microsoft Active Directory

AWS Big Data

NOVEMBER 8, 2023

Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) that enables data scientists and developers to perform every step of the ML workflow, from preparing data to building, training, tuning, and deploying models.

Testing Modeling Management Machine Learning

Driving Data Catalog Adoption

Alation

FEBRUARY 13, 2020

In a recent blog, titled Collaboration and Crowdsourcing with Data Cataloging, I discussed the importance of participation by all data stakeholders as a key to getting maximum value from your data catalog. This build-it-and-they-will-come approach fails to engage people to actively use the catalog.

Metadata Data Governance Cost-Benefit Visualization

You Cannot Get to the Moon on a Bike!

Ontotext

JANUARY 10, 2024

Next, I will explain how knowledge graphs help them to get a unified view of data derived from multiple sources and get richer insights in less time. Often, an enterprise starts with one thing it does well and then adds more business lines to expand the market. Of course, they are complex. But so are rockets.

Metadata Slice and Dice Data Integration Enterprise

Data Strategy and Decentralization: A Data Architect’s View

Alation

MARCH 1, 2023

How are blockchain organizations tackling data management? To learn the answer, we sat down with Karla Kirton, Data Architect at Blockdaemon, a blockchain company, to discuss data strategy, decentralization, and how implementing Alation has supported them. What are the goals of your data team?

Data Strategy Strategy Metadata Interactive

Day in the Life of an Analyst at Gartner’s IT Symposium/XPO 2022 – Day 4 and Summary

Andrew White

OCTOBER 20, 2022

So here is my day 4 in the day-in-the-life series of blogs. Gartner’s Value Pyramid and “linking data to outcome” is a very popular workshop tool to help business and non-business folks explore how a business outcome can be decomposed into real data. My score for Thursday was 706, and the line looks quite stable.

IT Strategy Insurance Data-driven
