

Data Lake and Information Governance – The Key Takeaways

A Data Lake can be a highly valuable asset to any enterprise, and myriad technology solutions are available to feed, maintain and retrieve information from the Lake.

But all this technology is significantly less valuable, if not worthless, when the environment is not well governed and managed. This is the primary takeaway to keep in mind when any organization is considering a Data Lake solution, or already has one in place that needs improvement.

Another takeaway is the idea of positioning the Data Lake as an Aggregator of information that operates much like a warehouse store: it is positioned to serve Consumers, but it is ultimately responsible for determining how best to collect, store and make available the information it houses. This takeaway significantly influences how the Governance of the environment is set up and run.

Accepting the above two statements – the criticality of Governance and the Operating Model of an Aggregator – some other observations can be made:

The Supplier

  • Needn’t have knowledge of the Consumer(s), as the Supplier works directly and exclusively with the Aggregator
  • Needs to be willing to conform to the formats, mechanisms and timings of information delivery as defined (through negotiations as necessary) by the Aggregator
  • Needs to be able to describe the information they supply in a “common language” that focuses upon “what” the information is, regardless of how or where it is represented

The Consumer

  • Needn’t have knowledge of the Supplier(s), as the Consumer works directly and exclusively with the Aggregator
  • Needs to be willing to conform to the formats, mechanisms and timings of information delivery as defined (through negotiations as necessary) by the Aggregator
  • Needs to be able to describe the information they require in a “common language” that focuses upon “what” the information is, regardless of how or through what mechanism it is delivered

The Aggregator

  • Is the “lynchpin” between Suppliers and Consumers, and is therefore responsible for ensuring Consumer satisfaction through appropriate “sourcing” (supplier systems) to address the needs of all Consumers
  • Is the keeper of the “common language” referred to in the Supplier and Consumer observations, as the central repository for the information transferred between them. This may take the form of a Master Information Catalog, a Semantic or Canonical Model, a Business Glossary of Terms or any combination thereof
  • Guides both Suppliers and Consumers through the defined interaction processes and the use of the standards and templates defined for aiding these interactions
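
To make the Supplier, Consumer and Aggregator roles a little more concrete, here is a minimal Python sketch of the operating model described above. It is an illustration only, not a reference implementation: the `Aggregator` class, its method names and the `patient_encounter` term are all hypothetical, and a real Data Lake would sit on top of actual storage and catalog tooling. The point it tries to show is that Suppliers and Consumers only ever talk to the Aggregator, and only in terms that exist in its catalog.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Aggregator:
    """Hypothetical aggregator: keeper of the common-language catalog and the data."""

    # Maps a common-language term to its business definition (assumed structure).
    catalog: dict[str, str] = field(default_factory=dict)
    # Maps a catalog term to the records delivered for it.
    store: dict[str, list[dict]] = field(default_factory=dict)

    def define_term(self, term: str, definition: str) -> None:
        """Register a term in the catalog so it can be supplied and consumed."""
        self.catalog[term] = definition
        self.store.setdefault(term, [])

    def ingest(self, term: str, records: list[dict]) -> None:
        """Supplier-facing: accept records only for terms the catalog knows."""
        if term not in self.catalog:
            raise KeyError(f"'{term}' is not defined in the catalog")
        self.store[term].extend(records)

    def request(self, term: str) -> list[dict]:
        """Consumer-facing: return records by term, never by source system."""
        if term not in self.catalog:
            raise KeyError(f"'{term}' is not defined in the catalog")
        return list(self.store[term])


# Example usage with made-up data: two suppliers deliver, one consumer retrieves.
lake = Aggregator()
lake.define_term("patient_encounter", "A single visit by a patient to a care provider")
lake.ingest("patient_encounter", [{"encounter_id": 1}])  # delivered by one supplier
lake.ingest("patient_encounter", [{"encounter_id": 2}])  # delivered by another supplier
encounters = lake.request("patient_encounter")  # consumer asks for "what", not "where"
```

Notice that neither `ingest` nor `request` exposes who is on the other side of the Aggregator, which is the decoupling the Supplier and Consumer observations describe.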

Governance

  • Defines the Rules, Rights and Processes for the use and management of the Data Lake, and ensures all parties adhere to them
  • Identifies and defines all standards and templates needed to ensure the consistency, efficiency and effectiveness of the interactions
  • Is the ultimate and final authority for negotiating the relationships, duties, rights, obligations and privileges of all parties (Suppliers, Consumers and Aggregator)
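
One way to picture the “Rules, Rights and Processes” that Governance defines is as an explicit table of which party may do what. The Python sketch below is only an assumption about how such a table might look: the `Role` and `Action` names and the specific rights granted are hypothetical examples, and in practice they would be whatever the Governance body negotiates with the parties.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    SUPPLIER = "supplier"
    CONSUMER = "consumer"
    AGGREGATOR = "aggregator"


class Action(Enum):
    DELIVER = "deliver"    # push information into the Data Lake
    RETRIEVE = "retrieve"  # pull information out of the Data Lake
    CURATE = "curate"      # maintain the catalog and the stored information


# Hypothetical rights table; the actual rights are for Governance to decide.
RIGHTS: dict[Role, frozenset[Action]] = {
    Role.SUPPLIER: frozenset({Action.DELIVER}),
    Role.CONSUMER: frozenset({Action.RETRIEVE}),
    Role.AGGREGATOR: frozenset({Action.DELIVER, Action.RETRIEVE, Action.CURATE}),
}


def is_permitted(role: Role, action: Action) -> bool:
    """Return True if the governance table grants this action to this role."""
    return action in RIGHTS.get(role, frozenset())


# Example checks against the assumed table.
supplier_can_deliver = is_permitted(Role.SUPPLIER, Action.DELIVER)
consumer_can_deliver = is_permitted(Role.CONSUMER, Action.DELIVER)
```

Keeping the table in one place reflects the idea that Governance, rather than any individual party, is the final authority on rights and obligations.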

As mentioned in a previous entry, these observations may sound dictatorial. For the information assets housed in the Data Lake to be managed successfully, however, a highly collaborative environment in which all parties are willing to compromise and reach consensus must be an integral part of the enterprise’s culture.

So, this completes my journey into Data Lakes and the Information Governance needed. I hope you found this interesting and helpful. Feel free to reach out with any comments or observations you may have. Thanks so much for reading my blog.
