How to manage data as a product

BrandPost By Paul Gillin
Jun 06, 2023
4 mins
Data Management

The way many organizations manage data is often out of step with the way employees want to use data. Here’s how to reorient your strategy.

Distributed data ownership is a new idea that has captured the attention of IT executives and chief data officers. The concept: data should be curated by the people who know it best rather than locked up in an IT ivory tower. Furthermore, owners should treat data as a product, ensuring that it is clean, current, well-indexed, and available to anyone in the organization who can derive value from it. Those end users may work in areas that have little to do with the domain that manages the data.

This idea, which is sometimes called a “data mesh,” is compelling to the many organizations seeking both to transform digitally and to become more data-driven. The devil, however, is in the details. Organizations that have attempted to distribute data ownership have found that the principal barriers are more cultural than technical and that the journey is a long and difficult one. The goal also isn’t appropriate for every scenario.

In short, the way most organizations go about managing data is out of step with the way people want to use data, says Wim Stoop, senior director of product marketing at Cloudera. “If you want to get your teeth fixed or your appendix out you go to an expert rather than a generalist,” he says. “The same should apply to the data that people in organizations need.”

However, most enterprises treat data as a centralized and protected asset. It’s locked up in production applications, data warehouses, and data lakes that are administered by a small cadre of technical specialists. Access is tightly controlled, and few people are aware of data the organization possesses outside of their immediate purview.

The drive toward organizational agility has helped fuel interest in the data mesh. “Individual teams that are responsible for data can iterate faster in a well-defined construct,” Stoop says. “The shift to treating data as a product breaks down siloes and gives data longevity because it’s clearly defined, supported and maintained by the employees that know it intimately.”

Unfortunately, you can’t buy a data mesh. While numerous tools are available to federate distributed data, the complexity comes from aligning teams across an organization around a common set of definitions and governance principles.

“Individual teams need to be able to self-serve their own data infrastructure – but you need overarching governance that ties everything together,” Stoop says. “Creating consistency of security and governance is hard to do in a single data center; it’s much more difficult across multiple point solutions on-premises and in the cloud.”

What’s the solution?

For those companies that are up to the challenge, there is what Stoop calls a “well-trodden path that gets you there.” It starts with discovering all the data you already own. A data fabric ferrets out data from across on-premises and cloud data stores, or across the hybrid cloud. It also helps you understand, document, and monitor its use. This is a huge win from the start because experts estimate that more than 80% of the data organizations collect is never used for analysis. 1

This discovery process creates a data catalog, which in turn provides a foundation for creating a set of terms, tags, and conventions that describe data assets in business terms rather than as fields in a database. Those conventions “are applied from ingestion to transformation, cleaning, modeling, preparation, curation, and distribution with self-service APIs,” Stoop says. You can then start to distribute both the data and its management.
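
To make the catalog idea concrete, here is a minimal sketch in Python of what describing data assets in business terms might look like. It is an illustration only, not Cloudera's product or API: the CatalogEntry and DataCatalog names, the fields, and the sample assets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset described in business terms (hypothetical schema)."""
    business_name: str   # the term a business user would search for
    owner_domain: str    # the team that curates this data product
    source: str          # where the data physically lives
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """In-memory stand-in for a catalog; a real one would be a shared service."""

    def __init__(self) -> None:
        self._entries: list[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        """Add a discovered data asset to the catalog."""
        self._entries.append(entry)

    def find_by_tag(self, tag: str) -> list[CatalogEntry]:
        """Return every asset carrying the given tag, in registration order."""
        return [e for e in self._entries if tag in e.tags]


# Sample assets are invented for illustration.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    business_name="Customer churn rate",
    owner_domain="marketing",
    source="warehouse.crm.churn_monthly",
    tags={"customer", "kpi"},
))
catalog.register(CatalogEntry(
    business_name="Open invoices",
    owner_domain="finance",
    source="lake/finance/invoices_open",
    tags={"billing"},
))

# A user in any department can search by business tag, not by table name.
matches = [e.business_name for e in catalog.find_by_tag("customer")]
print(matches)
```

The point of the sketch is the separation of concerns: each owning domain registers and maintains its own entries, while anyone in the organization can look data up by the shared vocabulary of tags.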

It isn’t that conventional data warehouses and lakes are bad – but they should be used only when data owners believe they can derive value from them.

“Every domain should have the autonomy to choose how they massage their data,” Stoop says. “If the pandemic taught us anything it’s that organizations need to be able to quickly shift direction. You can only do that if you aren’t burdened with monolithic data structures.”

For more information on discovering where your data resides, visit www.cloudera.com

1 https://www.prnewswire.com/news-releases/study-reveals-massive-incentive-to-activate-unused-data-301540393.html