Painting by the Numbers: Data Visualization
Persuasive and engaging, digital visualizations are opening up new frontiers of understanding and sharing information, as well as creating new risks.
Charles Hornbaker, a first-year student at the Harvard Business School, admits historical corn yields are not the most interesting subject in the world. But as part of a design team with fellow graduate students J. Benjamin Cook, Conor L. Myhrvold, and Ryan King at the Harvard School of Engineering and Applied Sciences, he's helped to realize the improbable: making corn interesting through a colorful and captivating web-based visualization called “Century of Corn.” In the design, green shapes spill across a map of the United States, covering the Midwest in a verdant spectrum, their hues deepening with time. “Century of Corn” was the winner of last year's design competition in Computer Science 171: “Visualization.”
The rising popularity of CS 171 and the high quality of its final projects speak to a growing interest in digital visualization at Harvard and beyond. Part art form, part analytical tool, digital visualizations occupy a unique niche in communication, adding a visual component to conversations in fields ranging from hip hop to scientific collaboration. Researchers and practitioners say that these persuasive and engaging visualizations are opening up new frontiers of understanding and sharing information, but that balancing aesthetics with content can create risks.
LEARNING THE ROPES
Although the works produced by student designers in CS 171 are highly complex, the class begins its study of design with a very traditional medium: sketching. Students attend a special lab in which, with the help of a guest designer, they learn the basics of shape and layout. Alexander Lex, a postdoctoral visualization researcher at SEAS who has co-developed CS 171 over the last two years and will be teaching it this spring, says, “It's hard to teach the aesthetics.” He says the course encourages students to refine their designs based on feedback from others. “It really starts with drawing lines and circles on a sheet. Then we go on to sketching user interfaces.”
Lex says he has seen interest rise over the past two years both in the course and in the related courses CS 109: “Data Science” and EMR 19: “The Art of Numbers,” a sign of increased student demand for instruction in data science and visualization. Attention-grabbing, web-based data visualizations have been all the rage in 2014, taking social media by storm. Facebook made a splash this year with a map showing NFL affiliation by county in the US. The New York Times’s growing data blog, The Upshot, frequently appears among the paper's most-read stories thanks to its interactive visuals on political and economic trends. And Matt Daniels's visual index of the linguistic diversity of hip hop artists’ lyrics attracted social media buzz for its creativity and layout.
According to Lex, part of the reason web-based visuals are growing in popularity is that more designers have learned the programming skills to make sophisticated interactive visuals. “There was a big revolution that started essentially with the development of HTML 5,” Lex says. “The big difference was that suddenly we had interactive visualization. Until 2007 what we saw was usually static…. Ever since then, with libraries such as D3 or Canvas or WebGL, you can do pretty cool stuff that works reliably on many platforms.”
Perhaps more importantly, data visualizations have a unique power to tell a compelling story by focusing attention on a particular trend; many argue that explaining the same trend in words is harder and less persuasive. Jared Knowles, a research analyst at the Wisconsin Department of Public Instruction, says he has seen this firsthand. One of his early projects was a map-based animation showing how the proportion of students eligible for free and reduced lunch, an indicator of low income, changed across school districts from 2000 to 2012. “I think this visualization makes people have an emotional connection to the data, whereas if they just saw it on a line chart, they wouldn't connect to it the same way,” Knowles says. In addition to his work in Wisconsin, Knowles helps teach others about data visualization and statistical techniques; he has given a talk at the Strategic Data Project, a group that partners with the Harvard Center for Education Policy Research and is dedicated to bringing statistical analysis to the education system.
Olivier H. Beauchesne, a data scientist and programmer who specializes in visualization, agrees that the best visuals are those in which people can situate their own experiences. “When you can have insights, or even better, insights about your life and your community, this is what I think makes a really good data visualization,” he says. “The first time you use Google Earth, you look at where you were born or where you are at the moment. Even though you have access to the whole world, you're going to look at only specific places where you have a history.”
A strong narrative is all the more important given the potentially overwhelming amount of data now available to visualize, Beauchesne says. “Only 20 years ago you would have to go to the library [to get your data]. Now you have repositories with all kinds of data, and you can even buy data from data brokers.”
But the new capabilities do come with costs. “Not all the data is good data and some data should remain private but is not,” says Beauchesne. “It comes with a bunch of caveats.” Often it is up to the designer to decide what information is important enough to include—and what is not.
FORM FOLLOWS FUNCTION
One such challenge is striking a balance between aesthetics and detailed information. An effective visualization is not only visually pleasing; it also needs to convey important information accurately. “Sometimes we have to remove some information to make it aesthetically pleasing. Not that removing information is a problem, but we have to say that we removed it,” Beauchesne says.
Beauchesne’s works include a striking network map showing co-authorship ties across different scientific disciplines. He acknowledges that deciding what constitutes appropriate simplification is an important part of his work. “We had to remove a small amount of papers and collaborations,” he says. “By removing some of the lines, we are able to see the bigger picture.”
For Beauchesne, function dictates form. “How you deal with that balance [between aesthetics and content] depends on the goal. If it's more of an artwork, then you have to take the aesthetic considerations as your priority,” he says. Lex agrees with Beauchesne. “It's very similar to product design; function comes first and the design follows,” Lex says.
At the extreme end of data manipulation, visualizations can be used to deceive, in part because a designer can change or fabricate data without giving viewers an easy way to verify the work. “We see dishonest visualizations all the time,” Lex says, but he adds that he is not too worried. “If somebody wants to get a message across, they will do it with visualizations or they can do it with written argument.”
Lex and Beauchesne agree that an informal vetting process generally checks against outright fraud. “I think there’s a kind of peer review that is not really official,” Beauchesne says, explaining that much like in the early days of the blogosphere, discerning viewers will skip over poorly executed content and share that which they admire. “If it has good information, it's going to rise to the top,” he says.
For Lex, the worry is not so much willful misrepresentation as unwitting bias. Lex mainly develops visualization systems for experts. “I always tell them, ‘This is a tool that should give you an idea about what is in your data, but you still have to run a confirmatory analysis afterwards,’” Lex says. Visual analyses can be so persuasive that they fool researchers into accepting a false conclusion if that conclusion is not tested statistically or in a lab setting.
“All data are cooked, but we tend to talk about it like it's raw,” says Matthew Battles, a senior researcher at metaLAB, a Harvard research unit that explores networked culture in the arts and humanities. He says visual images, unlike musical compositions or written works, are taken in by the audience all at once, which makes them especially likely to mislead.
“The power of the visual is also its weakness,” he says. “Human beings are particularly susceptible to cognitive loops like apophenia, which is seeing patterns where no meaningful pattern exists. In the absence of pattern, we make pattern.” Those cognitive leaps can be especially pronounced in visual works. For that reason, it's not enough to simply present a visualization as the height of truth and insight. It needs to be embedded in a larger narrative, Battles says, and in dialogue with other claims and counterarguments.
A SECOND LOOK
For the team at metaLAB, examining how visualizations are constructed is as important as their aesthetics and content. According to Battles, “What do the ways that we visualize data have to tell us about the ways in which we understand the world?” is one of the main questions metaLAB asks.
“There’s an interesting culture of data that has emerged in recent years in which we treat data as real things in the world, as just the facts, as evidence of things in the world that are beyond dispute,” Battles says. In reality, every visualization is the product of a series of choices about data collection and design, not an absolute truth. For Battles, that is reason to think more deeply about data visualization, a practice that long predates the digital era.
Battles sees Google Earth as an example of how visual images are constructed. Referring to typical users of Google Earth, he says, “It's a common surprising misconception that when [they] look at Google Earth, they think they're looking at a real-time representation of the globe, a snapshot of what's happening on the globe, even if it's not at the moment at which you're looking at it. In fact, of course, the map that's presented to us is this kind of crazy quilt of images that are derived from different sources at different times.” According to Battles, the mismatched seams in Google Earth's imagery, such as a snowy landscape abruptly turning into summer, are not exceptions but the norm in the field of visualization.
“We see these things, we tend to think of them as anomalies, but really they're expressive of the kind of fundamentally fabricated nature of that set of data and its visualization,” he says. “That's not to deny that Google Earth is a powerful tool. It lets us make arguments about and imagine the world in ways that we couldn't before, but it's helpful to understand that there is somebody behind the curtain making that happen.”
For Battles and his collaborators, it is not enough to take visual representations at face value. Those images must also be critically analyzed in light of the beliefs about the world that underlie their design and the data-creation processes that produced their content.
BRIDGING TWO FIELDS
Exploring the connections between scientific and artistic images is another line of inquiry inspired by data visualization. Nathalie Miebach, who attended the Harvard Extension School in 2003, is a visual artist whose work bridges the gap between traditional art and scientific data visualization.
Her work includes musical scores, installation pieces, and sculptures based on data, including “The Sandy Rides,” a work that uses weather station data to represent amusement park rides destroyed by Hurricane Sandy.
Miebach got her start in an Extension School astronomy class. A self-described tactile learner, she made a sculptural final project for the course based on the Hertzsprung-Russell diagram, a well-known scatterplot showing the relationship between star luminosity and temperature.
“At the time I was also taking a basket weaving class with Lois Russell,” Miebach says. “And when it came to writing a final paper for the astronomy class, I decided, ‘I'm going to make a sculpture, because that's really the only way I'm going to be able to understand it.’”
Miebach’s final project was a representation of the Hertzsprung-Russell diagram made using traditional basket-weaving techniques, with the diagram's values dictating how she wove each part. “It was like sculpture by numbers,” she says.
She chose to work with basket reed, a traditional basket-weaving material that is also difficult to control. “In order for the reed not to break, I have to respect the tensions being acted on the reed. So in the end, it's not me creating the form, it's the numbers creating the form,” she says. It was the numbers in the diagram that ultimately determined the warped shape of the finished piece. Although her works begin with a foundation of data, often from very sophisticated scientific sources, she uses that information to create an aesthetic experience that can stand on its own.
Though her work has been shown in gallery spaces, Miebach says, “I still want you to be able to read the information off of each piece, and I want the piece to exist in the science and the art realm.” For her, the pieces ultimately belong fully to neither the world of art nor that of science. Referring to “The Sandy Rides,” which raises questions about the consequences of global warming, Miebach says, “You have to find other ways of reaching the public. One more graph about climate change is not going to convince people.”
ENVISIONING THE FUTURE
The need to visualize information is common to many different fields, including the study of campaign and election data, public health and policy information, and the Quantified Self movement, in which individuals gather data on the details of their daily lives through a combination of wearable sensors and self-monitoring. “It's interesting that we live in a time when all of these different fields can be united through the lens of the visualization of data,” Battles says. “I think we need to ask questions about how easily translated these fields are from one to the next and whether our focus on data is sufficient to catch the distinctions across many different fields of human experience.”
Visualization has spread in part due to technological developments, which have made the traditional practice more dynamic, accessible, and powerful. More interactive and user-friendly graphs are likely to be the future of this field.
Lex predicts that the next revolution will be programs that let users create their own visualizations as easily as they can currently make a static bar graph. Although a baseline of programming knowledge is currently needed to create a sophisticated interactive visualization, software for users without these skills would give many more people the capacity to develop interactive visualizations and would likely broaden the range of topics communicated through dynamic visuals. Despite, or perhaps because of, the risk of misinterpretation and distortion, one thing seems certain to remain constant: the power of visual design to captivate, inspire, and generate conversation.
—Staff writer Ola Topczewska can be reached at firstname.lastname@example.org.