Seeing the forest: Voyant


I study contemporary Brazilian literature. My department is particularly wedded to the practice of close reading. As in “from my cold, dead hands,” Hestonian-level wedded to close reading.

maybe with slightly less rage face, but just slightly

I have spent literal hours poring over small paragraphs. Marking them up. Underlining. Color-coding them. Scouring for allusions and references to other works. It is not as though I did not expect to do these things as a student of literature, or that I do not think close reading is a valuable part of examining a literary text, but many times it feels as though I’m missing the forest for the trees.


Picard knows what’s up

This problem becomes especially apparent when you try to examine larger trends in literature over time. One of the focal points of my research is the way that contemporary Afro-Brazilian writers depict slavery and the enslaved in their fiction, as part of a larger reflection on how members of an oppressed group reimagine traumatic experiences in order to point out connections between that trauma and present-day conditions. The problems with focusing only on their work are myriad: Brazilian racial politics often muddle the distinction between who is black and who is non-black, there is a smaller corpus of Afro-Brazilian literature due to the legacy of racism in the country, and academic snobbery and clannishness dismiss many works of fiction by Afro-Brazilian writers as non-canonical or non-literary. And if I manage to identify a trend in Afro-Brazilian writing on slavery, then to give my findings a more concrete bearing I may have to isolate what exactly is so different about Afro-Brazilian writers’ depictions of slavery in contrast to those of non-Afro-Brazilians writing on the subject.

All of the books that I would like to examine represent a veritable forest of information. While I can read through them and get an idea of some of the overarching and consistent themes that they share (or don’t share), I still might miss connections. Or I could simply arrive at the texts with foregone conclusions and miss important themes and trends because I never thought of exploring them.

That’s why tools like Voyant and other text mining software are so interesting to me. I decided to do something fairly straightforward: I took three texts that I’m analyzing in my dissertation and put them into Voyant to see what would happen. The three are 2009’s Um defeito de cor by Afro-Brazilian writer Ana Maria Gonçalves; Úrsula, written in 1859 by Maria Firmina dos Reis, Brazil’s first Afro-Brazilian female novelist; and the abolitionist text A escrava Isaura, written in 1875 by Bernardo Guimarães, a white Brazilian.

I chose these novels because they have widely different aims and politics. Um defeito de cor is historical fiction that tries to fill the dearth of Brazilian slave narratives by fictionalizing the harrowing story of Luísa Mahin, leader of one of the largest slave revolts in Brazilian history and mother of the famed Brazilian poet Luís Gama. Úrsula is one of the first Brazilian novels to depict the interior life of the enslaved, and it is sympathetic to its enslaved characters. And A escrava Isaura’s premise is essentially that slavery is awful because even the white-looking, perfect, angelic, beautiful Isaura can be enslaved, and because it turns white men into lustful and greedy individuals.

After I uploaded all three books into Voyant, I started looking at what were some very uninteresting results. Then I remembered that I needed a Portuguese stoplist. I had run into the same problem before: when I was learning Tableau as a RITC, I used a combination of AntConc, Excel, and Tableau to make some visualizations of something similar (they did not turn out to be very interesting), and words that for my purposes amounted to filler kept getting in the way. After a short search, I found an adequate stoplist on GitHub (stoplist), and then things got a little more interesting.
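To make the stoplist step concrete, here is a minimal Python sketch of what filtering filler words before counting looks like. It is not how Voyant works internally, and the tiny `STOPLIST` and the sample sentence are invented for illustration; a real Portuguese stoplist is far longer.

```python
import re
from collections import Counter

# Hypothetical mini stoplist for illustration only; a real one has hundreds of entries.
STOPLIST = {"a", "o", "as", "os", "de", "da", "do", "e", "que",
            "em", "um", "uma", "para", "com", "não"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def top_words(text, stoplist, n=10):
    """Return the n most frequent words that are not in the stoplist."""
    counts = Counter(w for w in tokenize(text) if w not in stoplist)
    return counts.most_common(n)

# Invented sample sentence, not a quotation from any of the novels.
sample = "A escrava e o senhor. A escrava não fala com o senhor da casa."
print(top_words(sample, STOPLIST, n=3))
```

Without the stoplist, the top of the list is dominated by articles and prepositions, which is exactly the "very uninteresting" result described above.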

I found a couple of features most helpful. The first shows which words were most frequent across all the texts and which were distinctive to each one. The distinctive words reflected things I already knew about the novels, but they gave me some evidence for those impressions. For example, Um defeito de cor is a very long, epic novel in which the protagonist is taken from Africa, lives in several different locations in Brazil, and eventually returns to Africa. It had a higher frequency of words for locations, words that distinguish nationality, and words associated with movement. It also has more African characters, so there was a higher frequency of words from African languages. A escrava Isaura spends a long time talking about the beauty of its protagonist, so it had a higher frequency of named body parts than the others. Here’s the link for the Distinct Words graph.
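One simple way to think about "distinctive" words is to compare how often a word appears in one text with how often it appears in all the others combined. The sketch below does that with a smoothed relative-frequency ratio. This is an assumption chosen for illustration, not a description of Voyant's own calculation, and the toy token lists are invented stand-ins for the novels.

```python
from collections import Counter

def distinctive_words(docs, target, n=5):
    """Rank words in docs[target] by how much more often they occur there
    than in the other documents combined (add-one smoothed frequency ratio)."""
    target_counts = Counter(docs[target])
    rest_counts = Counter()
    for name, tokens in docs.items():
        if name != target:
            rest_counts.update(tokens)

    target_total = sum(target_counts.values())
    rest_total = sum(rest_counts.values())
    vocab_size = len(set(target_counts) | set(rest_counts))

    scores = {}
    for word, count in target_counts.items():
        target_rate = (count + 1) / (target_total + vocab_size)
        rest_rate = (rest_counts[word] + 1) / (rest_total + vocab_size)
        scores[word] = target_rate / rest_rate
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Invented toy token lists standing in for the three novels.
docs = {
    "defeito": "navio viagem áfrica bahia viagem escrava".split(),
    "ursula": "amor mãe escrava amor".split(),
    "isaura": "olhos mãos escrava olhos".split(),
}
print(distinctive_words(docs, "defeito", n=3))
```

A word like escrava, which all three toy texts share, scores low, while words that appear in only one text rise to the top.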

Then I decided to focus on which words were collocated with escravo, escrava, escravos, and escravas (Collocated Words Graph). This was important because one of my research concerns is gender, and this would give me details about how each book dealt with the enslaved in general while also giving me a peek into how gender was handled.
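Collocation here just means "words that show up near the search terms." A rough Python sketch of that idea, assuming a fixed window of two words on either side, follows. The window size, the function name `collocates`, and the sample sentence are my own choices for illustration, not Voyant's settings.

```python
from collections import Counter

KEYWORDS = {"escravo", "escrava", "escravos", "escravas"}

def collocates(tokens, keywords, window=2):
    """Count the words that appear within `window` tokens of any keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in keywords:
            start = max(0, i - window)
            neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
            # Skip the keywords themselves so they do not count as their own collocates.
            counts.update(w for w in neighbors if w not in keywords)
    return counts

# Invented sample sentence, not a quotation from any of the novels.
tokens = "a bela escrava chorava enquanto os escravos trabalhavam".split()
print(collocates(tokens, KEYWORDS).most_common(3))
```

Running the same count separately for the feminine and masculine forms would be one way to start comparing how gender is handled. In practice you would also filter the neighbors through a stoplist.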

The results of that graph made sense. Úrsula’s main characters are not enslaved, so the frequency of terms related to slavery was much lower than in the other two novels. A escrava Isaura is about a female slave, so the high frequency of escrava in the novel makes sense. The results for Um defeito de cor were not exactly what I was expecting: I thought that there would be a drop-off in the use of escravo/a/os/as after a certain point, because in the novel she escapes to freedom and lives in Africa. What I saw is that those terms are pretty consistent across the entire novel.
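Checking for a drop-off like the one I expected amounts to cutting the novel into consecutive chunks and counting the terms in each. Here is a minimal sketch of that, assuming equal-sized segments. The function name `segment_counts` and the toy token list are hypothetical, and Voyant's own segmentation may differ.

```python
def segment_counts(tokens, keywords, segments=10):
    """Split tokens into consecutive, roughly equal segments (at most `segments`
    of them) and count how many keyword hits fall in each one."""
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    return [
        sum(1 for t in tokens[i:i + size] if t in keywords)
        for i in range(0, len(tokens), size)
    ]

KEYWORDS = {"escravo", "escrava", "escravos", "escravas"}

# Invented toy sequence standing in for a full novel.
tokens = ["escrava", "casa", "escravo", "rua", "mar", "navio", "escravos", "terra"]
print(segment_counts(tokens, KEYWORDS, segments=4))
```

A list that trails off toward zero would show the drop-off; a list that stays roughly level is the "pretty consistent" pattern I actually saw.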

Overall, I felt that Voyant helped me explore things that I recognized were happening in these novels but might not have been able to support with credible enough evidence. Voyant is like an expert witness that comes in to testify on your behalf at trial: you know you’re innocent, but you cannot prove it. Then in strolls Voyant, confidently carrying carefully sealed bags of evidence, timestamps, and video proof of your innocence.

Voyant in action.

It might even help you see things that you didn’t or couldn’t see before. I’d like to dump all of the novels that I’m using in my dissertation into it and see what it churns out. However, I think one needs to be careful when drawing conclusions from it, both because the choice of stoplist shapes the results and because of the danger of walking in with ready-made conclusions about what you’re going to find. Sadly, it won’t do all the close reading for you, but it may help you see further than just the trees.





Week 8: Interrogating Data

Both Week 7’s and this week’s readings reinforce the skepticism we need before taking data and analytics at face value. Graphs and visualizations culled from quantitative data sets can give the appearance of objectivity while hiding the nature of their constructedness. Reading through the articles for today, I was reminded of an article published by The Guardian last summer that detailed how an algorithm used in the Florida court system to prescribe race-blind sentencing was in fact biased against black men. First offenders respond to a Compas questionnaire, and their answers are fed into the software to generate predictions of “risk of recidivism” and “risk of violent recidivism.” Sentences are then given out based on the algorithm’s recommendations. The problem is that it is still giving black offenders longer sentences than white offenders for the same infractions. Even more egregious, the algorithm is owned by a private company that will not release information about the algorithm’s construction to outside researchers. So while a group of independent researchers was able to see that sentencing was still being doled out unfairly by the very program that is supposed to prevent it, there is no real way for an outsider to assess why and how the algorithm is doing so.

The two articles we read for today that struck me the most were the two that reinforced the dangers of thinking about algorithms, graphs, data sets, etc. as unbiased. The Lev Manovich article interrogates the role that limited access to complete data sets (a limitation largely set by legal and capitalistic constraints) plays in the way that these data sets can be used to make larger points. The Johanna Drucker article criticizes many of these constraints, but then shifts toward the prescriptive by suggesting ways that we can “humanitize” both the construction of our data sets and the ways that we choose to visualize them. She illustrates ways of thinking about visualization that destabilize traditional categories and expose their “constructedness.” The resulting visualizations may not be as immediately “readable” as traditional graphs, but they offer a much more nuanced view of the data being presented.


My project uses many old categories that Drucker would ask us to interrogate. I am examining the shifting racial landscape in my neighborhood, View Park-Windsor Hills. Within the last year, the Los Angeles Times published an article about the shifting demographics in my neighborhood. The neighborhood itself has been through several population shifts. Once a mostly white upper-middle-class neighborhood, it became a “Black Beverly Hills” after the federal equal housing rights act in 1964, as black Los Angelenos were no longer barred from owning a home in the area. After that there was a period of white flight, leaving the black residents. It is the most affluent Black neighborhood in the United States, but due to rising housing prices in West Los Angeles, its proximity to easy freeway and rail line access, and its beautiful historic homes, white Angelenos have raced to the area to buy homes, driving up home prices and leading to an extremely competitive housing market. Though this process in urban centers is usually referred to as gentrification, people analyzing my neighborhood don’t know if that is the proper term for a change in demographics in an already middle-class space. My goal is to examine whether this is actually happening at the speed that the articles and anecdotal evidence suggest. I’m using data sets from Social Explorer and Google Fusion Tables. The project is obviously not finished, but I will post it at a later date.