Towards Data Governance’s 4th Era – Part II

I finished my last article, Part I of “Towards Data Governance’s 4th Era”, with a question: What is the goal of data governance? I hypothesized that I might find a different path to answering this question by framing it in terms of my experience as a classical musician, and specifically how one of J.S. Bach’s last works, The Art of the Fugue, might be a revealing metaphor for applying a structure to data.

Since TDAN.com published my article in early February, I have had the pleasure of reading Laura Madsen’s thought-provoking new book, Disrupting Data Governance: A Call to Action. As I’ve told Laura, her thoughts about the need for data governance to change are right in line with mine. What I like most about her book is the stress on what she calls “the radical democratization of data”. “It’s time to get the data out there,”[1] she writes, and that means we must prioritize the aspects of governance that promote it.

This brought to my mind one of the data governance principles we developed in my prior organization, “Data must be understandable”. It hardly matters if a person has access to data if it is incomprehensible. That data governance provides structure is not in doubt, but the question remains: structure for what? The idea that a musical structure such as a fugue can make a complex musical texture understandable (not to mention enjoyable) to a listener is what I thought would be worthwhile to explore.

When I first started to write, I began with what rapidly grew to be a lengthy discourse on the definitions of several key musical terms, which I felt needed to precede the actual discussion of the fugue. I decided upon reflection that this was a sure way to lose my readers, so I am going to have us jump right into listening to and understanding a fugue…after a few words about J.S. Bach and The Art of the Fugue.

Johann Sebastian Bach (1685-1750) was arguably one of the greatest composers of all time, and certainly amongst the most prolific, authoring over 1100 pieces of music.[2] He is, without doubt, the master of fugue. The list of works he wrote using the fugal structure includes stand-alone fugues, fugues that form part of a larger work, choral fugues, instrumental fugues, and fugues for every keyboard instrument: organ, piano, and harpsichord.[3] Composers wrote fugues hundreds of years before Bach and have continued to do so up to the current day, but no one has created such a vast and fantastically varied portfolio.

Of all these works, The Art of the Fugue stands alone. It consists of 16 fugues (each referred to by Bach as a “contrapunctus”) and 4 canons (more about canons below). All of the compositions are based on the same simple melody. Bach wrote this work during the last decade of his life, and the fact that the last fugue ends abruptly, whether because it was left unfinished or pages of manuscript were lost, has always added an air of mystery to the work. Douglas Hofstadter’s Pulitzer Prize-winning book, Gödel, Escher, Bach: An Eternal Golden Braid, is a seminal text in the development of artificial intelligence.[4] Hofstadter writes of The Art of the Fugue reverently,[5] and it figures prominently in his book.

I am going to have you listen to the first “contrapunctus”. I’ve chosen a version performed by my favorite string quartet, the Emerson Quartet. A string quartet consists of two violinists, one violist (a viola is like a violin but somewhat larger and deeper in tone) and a cellist. Bach didn’t specify the instrumentation for The Art of the Fugue, and you can find recordings by pianists, organists, and harpsichordists, as well as full orchestras. I think the string quartet, with the differences in the sound of the instruments, provides the listener with great clarity in listening to each part. Think of this as having distinctive labels on each data set in a data lake.

But first, a tip to help you interpret the data, I mean, music! You will first hear one instrument, alone, play a simple melody. You will hear that melody multiple times, so listen for it carefully.

Here’s a link to the YouTube video of this recording, which handily displays the sheet music as the quartet plays:

Please go ahead and listen to the first “contrapunctus”. It lasts only about three minutes. Don’t worry. I’ll wait.

Finished? I am guessing one of the first things you thought, once the first instrument (a violin, by the way) plays that melody I mentioned above, is that there is a lot going on! It seems that by the time all four instruments are playing, they are all playing something different. That is true, and our ears are not used to hearing this type of music.

Most of the music we typically hear today, whether it’s on our phones, radios, televisions, or movies, consists of a tune, or melody, with the other musical parts or voices supporting that melody. That support might be in the form of a few simple chords strummed on a guitar, or it might be a piano playing many notes. Whatever the support, or accompaniment, when you hear music like this, there’s no doubt what the most important part or voice is—it’s the one with the melody. This style of music is homophonic.

It struck me as I wrote about homophonic music that the way we viewed data up until recently was similar. At any point in time, we’d focus on one piece of data, whether it’s a number in a report, a model output, or a “Yes” or “No” on which a process depends. With the advent of Big Data, this has changed, in part because of the sheer volume of data being combined, as well as the desire to find correlations and patterns requiring a wide-angle focus.

The music you just listened to is polyphonic. Polyphony, as defined by Merriam-Webster, is a style of musical composition employing two or more simultaneous, but relatively independent melodic lines.[6] Polyphony is nothing new; in fact, it was the style in which composers of the Middle Ages wrote. It continued to be the style of choice right through the Renaissance and Baroque periods. It was only in the Classical period, beginning around 1730, that homophonic music rose in popularity among composers.

Today’s data, in its near-infinite variety of sources and types, resembles polyphonic music more than homophonic music. But combining independent melodic lines is just as difficult as understanding disparate data sets. Without some guidelines, the result is cacophony: sounds combine with no rhyme or reason, and the ear hears nothing but harshly clashing noise. In the data world, we have the same situation when data lakes become data swamps, filled with incomprehensible information not fit for any use. This occurs due to the lack of data governance. Paul Brunet describes this well in an article published in InfoWorld: “A data lake without data governance will ultimately end up being a collection of disconnected data pools or information silos—just all in one place.”[7]

As you might imagine, writing polyphonic music without producing cacophony requires that the composer follow a framework consisting of certain rules and techniques. This is counterpoint, and it addresses both the vertical relationships (notes of each melody sounded together) and the horizontal independence of each melody. We often think of complex data sets, and combinations of data sets, as multi-dimensional, and we establish frameworks to help us understand and manage this complexity.

Let’s return to our fugue. One of the three defining principles of a fugue is that it must be polyphonic, or, more precisely, contrapuntal. The contrapuntal technique uses two or more independent melodies woven together to avoid cacophony. There are lots of musical types that would meet this principle. Let’s find out what makes a fugue unique.

Listen to the first fugue again and focus on how many times you hear the melody that the first violin plays by itself in the first 10 seconds. Listen to how different instruments play it at various times. This is the theme or subject of the fugue. I’ll use the term “subject” going forward. Please listen to the music again:

The fact that there is a definite subject stated at the beginning of a fugue is itself one of the three defining principles, described by Roger Bullivant in his book, “Fugue”.[8] Often a single instrument or vocal part states the subject. In other cases, one or more voices or parts may accompany the subject, but in any case, the composer makes it very clear to the listener that this is the melody to listen for, no matter how complicated the multiple voices get. The subject is the buoy the composer leaves for the listener to navigate the melodic eddies of the fugue.

What is the corollary to a subject from a data governance standpoint? What enables a user to find useful information and create order out of chaos? Metadata and lineage are two “navigational” tools that provide guideposts to the analyst. We can think of the subject as musical metadata—it’s recognizable no matter how much is going on around it.

You will have also noticed that the subject repeats sequentially, and that each time it’s taken up by a different instrument, or voice, higher or lower than the previous statement. This is the musical technique of imitation. The fact that other parts take up the subject using imitation is the 2nd defining principle of a fugue.
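
The idea that a subject stays recognizable when another voice takes it up higher or lower can be sketched in code. The following is a minimal illustration, not a transcription of Bach: it assumes pitches are written as MIDI note numbers (60 = middle C), and the four-note subject, the sample voice, and the function names `intervals` and `find_entries` are all hypothetical. It compares the steps between notes rather than the notes themselves, so a statement of the subject is found regardless of its starting pitch.

```python
def intervals(melody):
    """Semitone steps between consecutive notes; unchanged by transposition."""
    return [b - a for a, b in zip(melody, melody[1:])]


def find_entries(voice, subject):
    """Return the positions in `voice` where an exact transposition of `subject` begins."""
    pattern = intervals(subject)
    steps = intervals(voice)
    width = len(pattern)
    return [
        i
        for i in range(len(steps) - width + 1)
        if steps[i:i + width] == pattern
    ]


# Hypothetical subject (C, G, E, C) and a second voice that plays two
# unrelated notes before stating the subject a fifth higher (G, D, B, G).
subject = [60, 67, 64, 60]
voice = [59, 62, 67, 74, 71, 67, 65]

print(find_entries(voice, subject))  # [2]
```

This catches only exact imitation. A statement in which a note has been altered would need a looser comparison.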

What does each instrument do after playing the subject? Notice that it doesn’t stop playing or fade into the background with a long-held note. For example, the violinist who plays the solo statement of the subject continues with a melody that is quite different from the original subject, now played by the 2nd violin. And yet they sound good together—the magic of counterpoint! This is the 3rd principle of the fugue, which Bullivant describes as a “contrapuntal texture to the extent the theme may appear in an upper, a middle, and a lower part.”[9]

By the way, if you continue to listen to the 2nd violin, you’ll hear that once it finishes the subject, it goes on to play something different from the 1st violin’s continuance—in fact, the 1st and 2nd violins seem to be playing their own duet distinct from the subject.[10] I think of Bach here being a sort of musical data scientist—he finds new relationships between fragments of melody and generates musical value, just as a data scientist seeks fresh relationships between data to create business value.

To summarize, here are the three principles that create the “fugue governance framework”.

  • There is a definite subject stated at the beginning.
  • Other parts take up the subject using imitation.
  • There is a contrapuntal texture with the subject appearing in an upper, a middle, and/or a lower part.

This is a strikingly simple framework given the complexity of the music we have just heard, and it reminds me of a phrase coined by Daniel Funk, Senior Manager of Data Services at Nutrien: “Minimum Viable Governance”, or “MVG”. Our data governance frameworks, which often include a multiplicity of principles, standards, rules, policies, and so on, could benefit from thinking about the MVG and what the framework of the fugue we’ve just discussed represents. This is especially so when we consider the huge variety of beautiful music that framework has engendered.

Now, in a typical data governance framework, principles provide the high-level guidance, while standards, policies, and specific rules supply the detail. Fugues have similar rules for the 2nd statement of the subject, commonly called the “answer”. These rules exist because the answer begins on a different musical note than the first statement of the subject. In Bach’s time, this typically would be the 5th note of the musical key of the fugue. If the fugue is in the key of C (the key of the white keys on the piano keyboard), then the answer normally starts on G, 5 white keys up (or 4 white keys down) from C.[11]

The rules for the answer provide guidance to the composer on how to deal with this scenario. These rules are a bit technical (just as the policies and standards relating to metadata, lineage, and data quality are), but the key point is that, by following them, a composer can adjust the answer, changing a note here or there, as long as the listener can still identify it as a repetition of the subject. You can hear Bach do this in the first fugue. Listen carefully and you will note that the distance between the first two notes in the answer is smaller than in the subject.
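
The adjustment described above can be sketched in a few lines. This is a rough illustration under stated assumptions rather than a full treatment of the rules: pitches are MIDI note numbers (60 = middle C), the subject is assumed to open D, A, F, D, C-sharp, D, E, F (a transcription supplied here, not given in the article), and the function names are hypothetical. A “real” answer moves every note up a fifth. The “tonal” adjustment shown covers only one case: an opening leap from the first note of the key up to the fifth is answered by a leap from the fifth up to the first note of the key.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
PERFECT_FIFTH = 7  # semitones


def name(pitch):
    """Letter name of a MIDI note number, ignoring the octave."""
    return NOTE_NAMES[pitch % 12]


def real_answer(subject):
    """Exact imitation: every note moved up a perfect fifth."""
    return [pitch + PERFECT_FIFTH for pitch in subject]


def tonal_answer(subject):
    """Real answer with one adjustment (a simplifying assumption):
    if the subject opens by leaping up a fifth from its first note,
    the answer's second note becomes that first note an octave higher."""
    answer = real_answer(subject)
    first = subject[0]
    if len(subject) > 1 and subject[1] - first == PERFECT_FIFTH:
        answer[1] = first + 12
    return answer


# Assumed opening of the subject: D A F D C# D E F
subject = [62, 69, 65, 62, 61, 62, 64, 65]
answer = tonal_answer(subject)

print([name(p) for p in real_answer(subject)])  # ['A', 'E', 'C', 'A', 'G#', 'A', 'B', 'C']
print([name(p) for p in answer])                # ['A', 'D', 'C', 'A', 'G#', 'A', 'B', 'C']
print(subject[1] - subject[0], answer[1] - answer[0])  # 7 5
```

Under these assumptions, the opening leap shrinks from seven semitones in the subject to five in the answer, while every other note is an exact transposition, so the melody remains recognizable.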

The answer “standards”, if you will, don’t exist to confine the composer’s imagination, but rather to provide guidance and assure the subject remains recognizable to the listener even if it melodically needs a bit of adjustment. Similarly, detailed standards for lineage, etc. can help data consumers use data more efficiently.

If you now go on to listen to additional fugues (and the canons too), you will see how Bach takes the same basic material, the original theme, sometimes turned upside down, sometimes with different rhythms, and creates something unique in each of the 20 sections. For all the fugues, he works within the same framework, and the result is almost endless innovation.

This is what attracted me to the idea of the fugue in the context of data governance in the first place—how a musical structure can stimulate creativity on the part of composers (essentially data creators or producers) while enhancing comprehensibility for listeners (users, analysts). If the goal of the “fugue governance framework” is to provide a structure for creativity and clarity, might the goal of data governance be, similarly, to provide a structure for innovation?

Actually, “invention” might be a better word in this context. The Merriam-Webster definition includes “productive imagination: inventiveness”, as well as “discovery, finding”.[12] Bach himself wrote 2- and 3-part “inventions”, contrapuntal pieces including fugues. We could imagine that the purpose of the rules, the guiding principles of the fugue, is to provide a structure for invention: for productive imagination and, most relevant to our data context, discovery and finding.

Could we think of data governance as having a similar goal, to provide a structure for productive imagination, for discovery, for finding…in the context of data? This may sound like a strange goal indeed to my fellow data governance practitioners. For years, we’ve concentrated on meeting regulatory requirements and making sure data is secure. I’d submit that control has been our number one focus. But as I wrote in my previous article, and as Laura Madsen states so well in Disrupting Data Governance,[13] the focus of managers and analysts alike when it comes to data has changed from how to control its use to how best to exploit it. Additionally, more senior business executives are recognizing the importance of data as a business asset, as a driver for innovative analytics and critical decisions.

The goal of data governance—to provide the structure for invention. It sounds poetic, but is it practical? In my next article, I will show how adopting this goal can drive actions and measurable business value.


[1] Madsen, Disrupting Data Governance (Technics Publications, 2019), pg. 27.

[2] https://www.classicfm.com/composers/bach/guides/bach-facts/

[3] https://en.wikipedia.org/wiki/List_of_fugal_works_by_Johann_Sebastian_Bach

[4] https://mindmatters.ai/2019/07/we-went-back-to-visit-godel-escher-and-bach/

[5] “In the Art of the Fugue, Bach uses a very simple theme in the most complex possible ways. The whole work is in a single key. Most of the fugues have four voices, and they gradually increase in complexity and depth of expression. Toward the end, they soar to such heights of intricacy that one suspects he can no longer maintain them. Yet he does . . . until the last Contrapunctus.” Hofstadter, Gödel, Escher, Bach, pg. 94.

[6] https://www.merriam-webster.com/dictionary/polyphony

[7] Brunet, https://www.infoworld.com/article/3290433/data-lakes-just-a-swamp-without-data-governance-and-catalog.html

[8] Roger Bullivant, Fugue (Hutchinson & Co LTD, 1971), pp. 17-19.

[9] Bullivant, Fugue (Hutchinson & Co LTD, 1971), pg. 19.

[10] This is where a fugue differs from a canon, where each part is exactly the same, from beginning to end.

[11] This is different from the typical canon, where each part starts on the same note, an exact imitation.

[12] https://www.merriam-webster.com/dictionary/invention

[13] Madsen, Disrupting Data Governance (Technics Publications, 2019)

Randall Gordon

Randall (Randy) Gordon has worked in the financial industry for over twenty years and has spent the last decade in data governance leadership roles. He is passionate about data governance because he believes reliable, trusted data is the foundation of strong decision making, advanced analytics, and innovation. Randy currently is Head of Data Governance at Cross River Bank. Previous employers include Citi, Moody’s, Bank of America, and Merrill Lynch. Randy holds an MS in Management – Financial Services from Rensselaer Polytechnic Institute, and a Bachelor of Music degree from Hartt School of Music, University of Hartford, where he majored in cello performance. In addition to being a columnist for TDAN.com, Randy frequently speaks at industry conferences. The views expressed in Through the Looking Glass are Randy’s own and not those of Cross River Bank.
