How Metadata Improves Security, Quality, and Transparency

Metadata is data that provides context about your data, beyond what you see in the rows and columns. By managing your metadata, you're effectively creating an encyclopedia of your data assets.

By Tim Lysecki, Product Marketing Manager at ThinkData Works on April 25, 2022 in Data Science

How does Spotify battle against a giant like Apple? One word: data. With machine learning and AI, Spotify creates value for their users by providing a more personalized and bespoke experience. Let’s take a quick look at the layers of aggregate information that are used to enhance their platform:

Spotify uses natural language processing (NLP) to scan discussion forums about the music you’re listening to, then matches your preferences to other music being discussed similarly;
the composition of the music is analyzed for tone, sound, loudness, tonality (i.e. major or minor), and several other factors used to recommend similar songs and artists;
and of course, Spotify measures your behaviour while you listen, tracking repeat plays and skipped songs to establish your preferences and improve its recommendations.

The core data here is in the music – the basic components of songs like the title, artist, and duration. Choosing a song to listen to sets the baseline (and maybe you like it for its bass line). Everything else can be seen as metadata: additional elements about how one listens, how the song is composed, and what other music it sounds like.

Metadata, here, is the driving force of Spotify’s algorithm, and it’s collected and applied constantly to provide you with intelligent recommendations to keep you listening.

What Is Metadata?

In simple terms, within the technology industry, “meta” refers to an underlying definition or description. More directly, metadata provides context about the data beyond what you see in the rows and columns.

That definition is quite broad, mostly because metadata can be used for almost any purpose – it can tell you what each column header means in detail, who uploaded the data and when, the column and row counts for the whole dataset, the original data source, or even warehousing and residency requirements.
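
As a rough illustration of the kinds of metadata listed above, the following Python sketch derives row and column counts from a small in-memory table and stores them alongside hand-entered context. The field names (`uploaded_by`, `source`, `residency`), the sample rows, and all values are hypothetical and not taken from any particular catalog.

```python
from datetime import date

# Hypothetical sample data: the "rows and columns" themselves.
rows = [
    {"track_id": 1, "title": "Song A", "duration_s": 215},
    {"track_id": 2, "title": "Song B", "duration_s": 187},
]

def describe(rows, uploaded_by, uploaded_on, source, residency, column_notes):
    """Build a metadata record for a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {
        # Counts computed from the data itself.
        "row_count": len(rows),
        "column_count": len(columns),
        # What each column means (empty string if nobody documented it).
        "columns": {name: column_notes.get(name, "") for name in columns},
        # Context that cannot be computed and has to be supplied.
        "uploaded_by": uploaded_by,
        "uploaded_on": uploaded_on.isoformat(),
        "source": source,
        "residency": residency,
    }

metadata = describe(
    rows,
    uploaded_by="analyst@example.com",
    uploaded_on=date(2022, 4, 25),
    source="internal streaming logs",
    residency="Canada",
    column_notes={"duration_s": "Track length in seconds"},
)
print(metadata["row_count"], metadata["column_count"])  # 2 3
```

Some of this metadata is computed from the data, while the rest (who uploaded it, where it must be stored) has to be recorded by a person or a pipeline.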

How Can Metadata Be Organized?

There are three main types of metadata that work together: structural, administrative, and descriptive. Each serves a different purpose in explaining the corresponding data.

Structural metadata – provides insight into how data elements are organized. This facilitates quick and easy navigation, like a table of contents or page numbers. Structural metadata allows similar data to be grouped together, documenting relationships among unique datasets.

Administrative metadata – offers technical information about the data. It covers aspects such as the origin of the data, the type of data, and access or usage licences.

Descriptive metadata – adds information about the owner, when the data was created/published, and what the data includes. The essential purpose is to ease identification and offer a snapshot of the data it describes.

A combination of these types of metadata allows organizations to navigate through vast amounts of data efficiently, making it easy to find what you need when you need it.
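
One way to picture the three types working together is a single record per dataset that a simple search can run against. This is a minimal sketch only: the `DatasetMetadata` class, its fields, the `find` helper, and the two catalog entries are invented for illustration, with fields grouped according to the definitions above.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    # Descriptive: what the data is, who owns it, when it was published.
    name: str
    owner: str
    published: str
    summary: str
    # Administrative: origin, type, and licence of the data.
    origin: str
    data_type: str
    licence: str
    # Structural: how the data is organized and what it relates to.
    columns: list = field(default_factory=list)
    related: list = field(default_factory=list)

def find(catalog, keyword):
    """Return names of datasets whose name or summary mentions the keyword."""
    keyword = keyword.lower()
    return [m.name for m in catalog
            if keyword in m.name.lower() or keyword in m.summary.lower()]

catalog = [
    DatasetMetadata("plays", "data-team", "2022-04-01", "Daily listening events",
                    "app logs", "table", "internal",
                    ["user_id", "track_id"], ["tracks"]),
    DatasetMetadata("tracks", "content-team", "2022-03-15", "Track titles and artists",
                    "label feed", "table", "licensed",
                    ["track_id", "title"], ["plays"]),
]
print(find(catalog, "listening"))  # ['plays']
```

Here the descriptive fields make a dataset findable, the structural `related` field points to connected datasets, and the administrative fields tell you where the data came from and how it is licensed.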

Why Is Metadata Important?

53% of analytics consumers have difficulty locating and accessing data content. With increasing amounts of data, it is important for organizations to understand the data they have, where it is, and how to use it.

Metadata's utility does not begin and end with describing data. Metadata can enable easier data discovery, and can help increase understanding of a dataset. Take a library book, for example. If the text is the primary data, the book jacket may have a brief summary of the book, and comments from others about the book. Importantly, the library may also append data that gives the book a category, genre, and unique identifier for easier organization and retrieval.

Metadata can also assist in compliance with regulatory requirements by ensuring that your organization tracks usage, sharing, and licence permissions at the dataset level. By appending metadata that makes it clear how the data can be used, for what purpose, and who it can or can’t be shared with, you’re able to build security and compliance into the data itself.
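
To make the idea of building compliance into the data more concrete, here is a small hedged sketch in Python. The metadata keys (`allowed_purposes`, `shareable_with`), the team names, and the `can_use` function are assumptions made for this example; a real catalog would define its own fields and enforcement points.

```python
# Hypothetical dataset-level usage metadata.
usage_metadata = {
    "licence": "internal-only",
    "allowed_purposes": {"analytics", "reporting"},
    "shareable_with": {"finance", "marketing"},
}

def can_use(metadata, team, purpose):
    """Return True only if both the team and the purpose are permitted."""
    return (team in metadata["shareable_with"]
            and purpose in metadata["allowed_purposes"])

print(can_use(usage_metadata, "finance", "reporting"))   # True
print(can_use(usage_metadata, "partners", "reporting"))  # False
```

Because the rules travel with the dataset, any tool that reads the metadata can apply the same check before granting access.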

Metadata Management In A Data Catalog Platform

By managing your metadata, you're effectively creating an encyclopedia of your data assets. Metadata management is a subset of data management, which itself falls into the category of data governance.

The primary reasons to focus on metadata management, then, are the same reasons for implementing data governance strategies: improving data security, data quality, and overall transparency.

Improving data security:

Ties usage restrictions and licensing directly to data
Reveals data ownership and maintainer(s) for clear role identification
Consolidates and codifies information associated with a dataset so it can’t be lost

Improving data quality:

Supports designing and implementing an organization-wide ontology
Makes entity resolution and record linkage easier
Provides insight into changes to the data over time

Improving transparency:

Increases discoverability within an organization and across teams
Creates auditable records of usage, access, and updates
Shares information without revealing sensitive data

Instead of treating metadata as additional attributes or pieces of information that exist outside the data, sophisticated metadata management is about linking this rich information to the dataset itself in a way that’s easy to access, enforce, and manage.

What’s The Benefit Of Metadata In A Data Catalog?

Using ThinkData Works’ specific tools and features, you can unlock valuable benefits stemming from metadata:

Custom metadata – the ability to add any metadata to a dataset, including linked/related datasets, upload use agreements, costs & licensing, and data dictionaries

Configurable property definitions – the data catalog lets you input schema descriptions within the dataset, tying metadata to the properties

Dataset versioning/revisions – a new version of the dataset structure is recorded whenever the schema changes, and a revision is tracked each time the data is updated. This way, users can follow stable versions of the data while updating their models and dashboards

Data health monitoring – a dashboard for reports and alert configuration based on the data as it changes over time, including ‘macro’ information (like row and column counts) or ‘micro’ information (like value types or value bounds)

Access auditing – specific usage statistics and information that describe user behaviours, API calls, and other actions performed with or to the data
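
The data health monitoring item above distinguishes 'macro' checks (row and column counts) from 'micro' checks (value types and value bounds). The sketch below shows one way such checks could look in plain Python. It is not ThinkData Works' implementation; the `health_report` function, its parameters, and the sample rows are hypothetical.

```python
def health_report(rows, expected_columns, min_rows, bounds):
    """Return a list of human-readable alerts; an empty list means healthy."""
    alerts = []
    # 'Macro' check: is there enough data at all?
    if len(rows) < min_rows:
        alerts.append(f"row count {len(rows)} below minimum {min_rows}")
    for i, row in enumerate(rows):
        # 'Macro' check: does each row have exactly the expected columns?
        if set(row) != set(expected_columns):
            alerts.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        # 'Micro' checks: value types and bounds for selected columns.
        for column, (low, high) in bounds.items():
            value = row[column]
            if not isinstance(value, (int, float)):
                alerts.append(f"row {i}: {column} is not numeric")
            elif not low <= value <= high:
                alerts.append(f"row {i}: {column}={value} outside [{low}, {high}]")
    return alerts

rows = [
    {"track_id": 1, "duration_s": 215},
    {"track_id": 2, "duration_s": -5},
]
print(health_report(rows, ["track_id", "duration_s"], min_rows=1,
                    bounds={"duration_s": (0, 3600)}))
# ['row 1: duration_s=-5 outside [0, 3600]']
```

Running a check like this each time the data is updated is what allows alerts to fire when a feed changes shape or starts delivering out-of-range values.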

Flexible Management, Strict Governance

Metadata management is a critical piece of sound data governance – one of the most crucial parts of an effective data strategy. We know that every organization has unique needs, so a good metadata solution should be strong and enforceable, but flexible enough to manage data in a way that’s tailored to each company.

By offering comprehensive metadata management, ThinkData Works enables our clients to build data-driven solutions on strong, secure foundations.

Do you think your business has a need for a data catalog to find, understand, and use trusted data to drive business outcomes? Reach out to unlock the value in data.

Tim Lysecki is the Product Marketing Manager at ThinkData Works where he shapes the company’s market strategy, directs media about the company & products, and expands the client roster. In his spare time, he’s an avid songwriter, performer, and photographer.