Master Data Match Rules


As part of a master data management (MDM) implementation, a series of rules must be implemented to determine whether two records refer to the same real-world entity. In the world of MDM, the single consolidated record for that entity is often referred to as the golden record, and master data match rules identify when two records should become one.

Introduction 

Match rules are guiding principles that help us harmonize data across various domains. Think of them as the navigational compass in the vast ocean of information. Common types of match rules are exact matches, fuzzy matches, and token-based matches.

An exact match compares two fields, such as name, birth date, or username, to determine whether they are identical. Fuzzy matches also compare two fields but account for common variations, like the name “Mark” vs. “Marc.” Fuzzy matching can also reconcile different formats, such as dates written as MM/DD/YYYY vs. DD/MM/YYYY. Token-based matching breaks content into tokens (words, phrases) and compares those tokens for similarity, much as a fuzzy match does.
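As a minimal sketch of the three rule types, the Python below uses only the standard library. The function names are hypothetical, the exact match assumes that trimming whitespace and ignoring case is acceptable normalization, and the token comparison assumes Jaccard similarity over whitespace-separated words; a real MDM tool may define each of these differently.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Exact comparison after trimming whitespace and ignoring case (an assumption)."""
    return a.strip().casefold() == b.strip().casefold()


def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def token_score(a: str, b: str) -> float:
    """Jaccard similarity of the two sets of whitespace-separated tokens."""
    tokens_a = set(a.casefold().split())
    tokens_b = set(b.casefold().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty values are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(exact_match(" Mark", "mark"))                     # True
print(fuzzy_score("Mark", "Marc"))                      # 0.75
print(token_score("Seven Eleven Inc", "Seven Eleven"))  # 2 shared tokens out of 3
```

Each function returns a signal for a single pair of field values; how those signals are combined across fields is a separate decision.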

When implementing a series of match rules to compare records, one can then assign a confidence score that two records are the same. “Mark” vs. “Marc” born on “01/02/1900” vs. “02/01/1900” could have a very high confidence score. Depending on the strength of that score, a solution could automatically resolve those entities into a single record or flag them for human review to decide whether they should be resolved into a single record.
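One way to picture a confidence score is as a weighted blend of per-field scores with two cut-offs, one for automatic merging and one for human review. The sketch below is illustrative only: the weights, thresholds, the 0.8 score for a day/month swap, and the record layout are all assumptions, not values from any particular MDM product.

```python
from datetime import datetime
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; tune them against reviewed match samples.
NAME_WEIGHT = 0.5
BIRTHDATE_WEIGHT = 0.5
AUTO_MERGE_AT = 0.90
REVIEW_AT = 0.70


def possible_dates(text: str) -> set:
    """Read text as MM/DD/YYYY and as DD/MM/YYYY, keeping every valid reading."""
    readings = set()
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this format
    return readings


def birthdate_score(a: str, b: str) -> float:
    """1.0 if identical, 0.8 if the two values share a possible reading, else 0.0."""
    if a == b:
        return 1.0
    return 0.8 if possible_dates(a) & possible_dates(b) else 0.0


def confidence(rec_a: dict, rec_b: dict) -> float:
    """Weighted blend of name similarity and birthdate agreement."""
    name = SequenceMatcher(
        None, rec_a["name"].casefold(), rec_b["name"].casefold()
    ).ratio()
    born = birthdate_score(rec_a["birthdate"], rec_b["birthdate"])
    return NAME_WEIGHT * name + BIRTHDATE_WEIGHT * born


def decide(score: float) -> str:
    """Map a confidence score to an action using the two thresholds."""
    if score >= AUTO_MERGE_AT:
        return "auto-merge"
    if score >= REVIEW_AT:
        return "human review"
    return "no match"


a = {"name": "Mark", "birthdate": "01/02/1900"}
b = {"name": "Marc", "birthdate": "02/01/1900"}
print(decide(confidence(a, b)))  # human review
```

With these particular numbers the “Mark” and “Marc” pair lands in the review band rather than being merged automatically; raising the name weight or lowering the merge threshold would change that outcome.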

Given that, there are three common domains we consider when implementing MDM: entities, products, and locations. 

Entities 

An “entity,” in the case of MDM, is typically a person or customer. It can also refer to a company, such as a vendor. Entity information typically includes a significant amount of regulated private data, like birthdate, name, and address. When considering match rules for entities, you’ll most commonly use exact and fuzzy matches on fields, as in the name and birthdate examples above. Person data can often be difficult to resolve with confidence; that’s when we explore phonetic techniques like the Soundex algorithm, which indexes names by how they sound so that similar-sounding names can be compared.
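To make the Soundex idea concrete, here is a small sketch of the common American Soundex variant: keep the first letter, encode the remaining consonants as digits, collapse adjacent repeats, and pad to four characters. The function name is a placeholder, and the sketch assumes plain ASCII names; other Soundex variants differ in small details.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for an ASCII name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of same-coded consonants
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # vowels reset this to "", so repeats across vowels count
    return (code + "000")[:4]


print(soundex("Mark"), soundex("Marc"))      # M620 M620
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Two names with the same code are only candidates for a match; the code says nothing about whether the records share a birthdate or address.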

Vendor data is often an interesting use case for many organizations. Imagine having a vendor for “7/11” – or is that “Seven Eleven,” “7-11,” “Sev,” or “Seven Eleven Inc.”? 
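One possible approach to vendor names like these is to reduce each name to a normalized key before comparing. The sketch below lowercases the name, strips punctuation, drops legal suffixes, and maps number words to digits. The lookup tables and the function name are hypothetical; in practice they would be curated as reference data.

```python
import re

# Hypothetical lookup tables; a real deployment would curate these as reference data.
NUMBER_WORDS = {"seven": "7", "eleven": "11"}
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def vendor_key(name: str) -> str:
    """Reduce a vendor name to a normalized comparison key."""
    # Lowercase, then turn every run of punctuation or whitespace into one space.
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop legal suffixes and replace number words with digits.
    tokens = [NUMBER_WORDS.get(t, t) for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


for raw in ("7/11", "Seven Eleven", "7-11", "Seven Eleven Inc.", "Sev"):
    print(repr(raw), "->", repr(vendor_key(raw)))
```

The first four variants collapse to the key "7 11", while “Sev” stays as "sev". An abbreviation like that cannot be resolved by normalization alone and would need a curated alias or human review.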

Products 

“Product” data in MDM typically revolves around SKUs (Stock Keeping Units), that is, discrete items in a database of things. Product can also refer to inventory, or in manufacturing, items that might be on the bill of materials for the manufacture of a particular product. 

Products are often grouped into categories, and when executing match rules, one can often make use of managed reference data to help determine if two products are the same product. If two “phones” are grouped as “iPhone 12” under a category of “Apple Phones,” then there’s a possibility that they could refer to the same product. Combine this with exact matching on name or a fuzzy match on name, and you could have two of the same product. 

This is a great example, however, of a confidence rating that would require human intervention. As we know, there are many specific variants of the “iPhone 12,” so the two records could legitimately be distinct products!
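A product rule along these lines might first require agreement on the managed category and group, then look at the name. In the sketch below, the record fields, the 0.85 threshold, and the outcome labels are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def product_match(a: dict, b: dict, name_threshold: float = 0.85) -> str:
    """Classify a pair of product records using category, group, and name."""
    # Reference data must agree before the names are even compared.
    if (a["category"], a["group"]) != (b["category"], b["group"]):
        return "no match"
    name_a, name_b = a["name"].casefold(), b["name"].casefold()
    if name_a == name_b:
        return "likely match"
    # Near-identical names may be distinct variants, so route them to a person.
    similarity = SequenceMatcher(None, name_a, name_b).ratio()
    return "human review" if similarity >= name_threshold else "no match"


small = {"category": "Apple Phones", "group": "iPhone 12", "name": "iPhone 12 64GB Black"}
large = {"category": "Apple Phones", "group": "iPhone 12", "name": "iPhone 12 128GB Black"}
print(product_match(small, large))  # human review
```

The two records differ only in storage size, which is exactly the case where a high name similarity should prompt a review rather than an automatic merge.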

Locations 

When we consider location data, we’re often referring to the physical locations of brick-and-mortar retail stores around the globe. Location data can also be reference data, such as states or country codes. One does not need multiple entries for the United States of America, such as “USA” and “US.”
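For reference data such as country codes, a simple alias table is often enough to collapse variants onto one canonical value. The table and function below are a hypothetical sketch; the canonical value "US" is chosen here as an assumption, and a real table would cover far more aliases.

```python
# Hypothetical alias table mapping known variants to one canonical code.
COUNTRY_ALIASES = {
    "us": "US",
    "usa": "US",
    "united states": "US",
    "united states of america": "US",
}


def canonical_country(value: str):
    """Return the canonical code for a known alias, or None if unrecognized."""
    # Remove periods, ignore case, and collapse repeated whitespace.
    key = " ".join(value.replace(".", "").casefold().split())
    return COUNTRY_ALIASES.get(key)


print(canonical_country("USA"), canonical_country("U.S."))  # US US
```

Returning None for unknown values keeps unrecognized entries visible so they can be added to the alias table or reviewed.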

When considering physical store locations, we again might consider the exact match on name or address, but we could also consider matching on geo-codes or latitude and longitude. It’d be unlikely for two distinct stores to be located at the exact same position on the planet. 
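Matching on latitude and longitude usually means checking whether two points fall within a small distance of each other, since geocoders rarely return identical coordinates. The sketch below uses the haversine great-circle formula; the 25-metre tolerance and the function names are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def same_site(a: tuple, b: tuple, tolerance_m: float = 25.0) -> bool:
    """Treat two (lat, lon) pairs as one site if they lie within the tolerance."""
    return haversine_m(*a, *b) <= tolerance_m


store_a = (40.71280, -74.00600)
store_b = (40.71281, -74.00601)  # about a metre away
print(same_site(store_a, store_b))  # True
```

A tolerance that is too generous would merge neighbouring stores in the same shopping centre, so coordinates are best combined with name or address rules rather than used alone.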

Resolving location entities is critical, as it would be unfortunate to close two underperforming locations that, due to un-harmonized data, turn out to be a single high-performing retail location recorded twice.

Conclusion 

When implementing master data management, it’s essential to consider the data domain for which you want to create a golden record, or single version of the truth. Once that domain is identified, you’ll need to consider how you know that records refer to the same real-world entity, and what types of processes you need to put in place to resolve them, be they manual or automated.

Now go forth and resolve your entities! 


Mark Horseman


Mark is an IT professional with nearly 20 years of experience and acts as the data evangelist for DATAVERSITY. Mark moved into Data Quality, Master Data Management, and Data Governance early in his career and has been working extensively in Data Management since 2005.
