Science is what we understand well enough to explain to a computer.
Art is everything else we do--Donald Knuth (American scientist)
My childhood friend Amy Herman teaches visual intelligence to vital and important people. Think Department of Defense Special Operations, FBI, Interpol, and Scotland Yard to name a few.
Participants analyze works of art to reconsider their skills of critical inquiry and articulation, thereby improving their individual and collective abilities to discern the distinctions between perception and inference. As a result, they are able to enhance the effectiveness of their work.
I immediately saw a direct application to data literacy and teaching healthcare stakeholders how to create and consume data.
Let’s look at a beautiful painting Amy shared on social media. It looks characteristically Van Gogh with the colors and the light—done and done?
Cafe Le Soir (1888) Van Gogh
A little research on the other hand reveals that the painting was never signed by the artist. They verified the identity of the artist based on conversations and references he made to the work. But the detail I want to highlight is a theory that this painting is indeed Van Gogh’s interpretation of DaVinci’s the Last Supper.
Unless you are willing to dig—you might miss context or unique historical references that may or may not influence your perception.
For example, there are 12 diners—even one sketchy Judas donned in black ducking into a doorway. Take a look at the waiter. Did you notice the image of a cross perhaps behind him? The long hair and robe? Pretty intriguing wouldn’t you say?
All of this would be missed without a viewer’s lived experience noting the similarity and doing a bit of exploration. This is an example of how no matter how you categorize humans (gender, race, or ethnicity) none of us are influenced the same way by our environments or experiences.
I tend to introduce art to audiences before actual data. We can simply observe or interrogate what we are seeing. Are there any pre-attentive attributes being empowered to distract our attention? As data visualization professionals, the tools of pre-attentive attributes guide us to use color, form, and spatial positioning to help direct the eye of the viewer. I ask you to keep this in mind when looking at graphics. Where am I being instructed to look vs. what else might inform my insights.
Racism in data is an example of using color to distract or distort our perception. Many of the algorithms we use to make decisions about recidivism, home ownership, or college acceptance (to name only a few) are problematic.
The origin of the prejudice is not necessarily embedded in the algorithm itself: Rather, it is in the models used to process massive amounts of available data and the adaptive nature of the algorithm. As an adaptive algorithm is used, it can learn societal biases it observes—Can computers be racist? Big data, inequality, and discrimination
The immediately accessible problem with algorithms—not to trivialize the systemic racism contributing to our unavoidable biases and heuristics—is the black box or lack of transparency. Knowing that there is a problem but not being able to identify the exact cause really can’t lead to workable solutions. Telling people, maybe even specifically white people—to stop being racist—is also not the solution. We aren’t aware of many of our cognitive biases and heuristics.
One of the first steps is to understand the lexicon. What do algorithms do? If you are familiar with criminal justice black box models—we don’t know. The interactions and predictions are embedded inside the algorithm. Meaning that the box is opaque because the functions are too complicated for human comprehension or they are proprietary (more often the case in my world).
In machine learning, these black box models are created directly from data by an algorithm, meaning that humans, even those who design them, cannot understand how variables are being combined to make predictions. Even if one has a list of the input variables, black box predictive models can be such complicated functions of the variables that no human can understand how the variables are jointly related to each other to reach a final prediction.—Cynthia Rubin and Joanna Radin
Interpretable models could potentially be used instead and the debate is beyond the scope of this post but dig deeper here, Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.
If you would like to interact with an actual AI algorithm read another great article by Cynthia Rudin (also Joanna Radin). Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From An Explainable AI Competition.
Here is a link to the Global Model visualized below.
These issues are timely as data centers become decentralized from their once central role as relational databases and we see distributed computer systems handling sensitive healthcare data. Distributed algorithms are exploring patient data, reimbursement claims, pharmacy visits, and an expanding number of attributes. Healthcare delivery is multi-faceted and complex. Telemedicine is one recent example of distributed care.
In predicting medical outcomes, the machine might pick up on information within doctors’ notes that reveal the patients’ outcome before it is officially recorded and hence erroneously claim these as successful predictions.
I will leave you with a final thought. Algorithms are complex. When you don’t understand the fundamental issue or potential sources of misinformation—you become part of the problem.