Data Visualization for The Unknown and the Unknowable

Confession time. I hope you’ll forgive me as I make my way to a larger point. Here goes: I love Crossfit. Yes, the fitness cult that you’ve heard too much about from your Facebook friends — the ones who spout off about WODs, AMRAPs, burpees, and Keto diets.

Crossfit calls itself the “Sport of Fitness” with a “constantly varied, functional movement, executed at high intensity” which prepares you for the “unknown and unknowable.”

I like it as a daily challenge that takes me out of my comfort zone. It also holds parallels to the challenges faced in data visualization and data storytelling when you face the unknown and unknowable in your data.

Telling your data story can be quite a bit simpler when working with a known data set. The range of values for each metric is well understood. The diversity of values in your dimensions is known. Whether you’re a data journalist or creating a PowerPoint presentation, it is a small luxury to know exactly the data you are working with and therefore how best to craft the data visualizations.

Things get trickier when the data is “constantly varied,” whether that is across time, across customers, or across organizations. The behaviors captured in the data can surprise you. Suddenly there are unexpected null values. Or a user has entered an interminably long text value where a word or two is expected. (We once had a client who created a sample data set filled with a surprisingly diverse collection of curse words as a sort of test of our ability to handle the unexpected.)

How does your dashboard or data story hold up in its ability to clearly communicate when the data gets super big? What does it look like when there are few or no values?
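One way to train for these cases is to decide up front what each surprise should look like on screen. The sketch below is only an illustration, not how any particular product does it: the function names, the 24-character label limit, and the placeholder strings are all assumptions made for the example. It covers the situations mentioned above: null values, overly long text, very large numbers, and an empty data set.

```python
from typing import Optional, Sequence, Tuple

MAX_LABEL_CHARS = 24  # hypothetical limit; tune it to your own layout


def format_label(value: Optional[str], max_chars: int = MAX_LABEL_CHARS) -> str:
    """Return a display-safe label for a dimension value."""
    if value is None or not value.strip():
        return "(no value)"  # placeholder for nulls and blank strings
    text = " ".join(value.split())  # collapse stray whitespace and newlines
    if len(text) <= max_chars:
        return text
    # Truncate long text and mark the cut with an ellipsis.
    return text[: max_chars - 3].rstrip() + "..."


def format_metric(value: Optional[float]) -> str:
    """Abbreviate large numbers so they fit in a fixed-width slot."""
    if value is None:
        return "n/a"
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    return f"{value:.0f}"


def summarize(rows: Sequence[Tuple[Optional[str], Optional[float]]]) -> list:
    """Turn (label, metric) rows into display lines, with an empty state."""
    if not rows:
        return ["No data to display yet."]
    return [f"{format_label(name)}: {format_metric(metric)}" for name, metric in rows]
```

Calling `summarize([])` returns the empty-state message instead of a blank chart area, and `summarize([(None, 1234567)])` yields a placeholder label next to an abbreviated number. The specific wording matters less than the fact that each case was chosen deliberately rather than discovered in production.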

We’ve learned the hard lesson many times that a beautiful mock-up of an interface (in which we imagine perfect and simple data) breaks down when faced with reality. It is like Mike Tyson said: “Everyone has a plan until they get punched in the mouth.” The data will punch you in the mouth.

This is the challenge of dynamic data storytelling and visualization. You need to train for the unknown. For example, our ‘ranked list’ visualization includes a navigational element on the left to let you scroll through values when there are ten or more.

Juicebox ranked list

A different visualization method might end up looking like this:

The unknown and unknowable in data is often where we find the unexpected insights or discover how our data systems are failing us. And this is the strange joy of Crossfit: the everyday revealing of your unexpected capabilities and your fitness weaknesses.

If you’d prefer to avoid any more talk of Crossfit, Natalia Kiseleva brilliantly captures the differences between interactive and static data visualizations using fish.

Natalia Kiseleva @eolay13 https://twitter.com/eolay13/status/1196086600012304384
