Text Mining With Voyant

Far from the relative DIY-ness of text mining with Python, programs like Voyant provide distant reading assistance with the comfort of a graphical user interface (GUI).

Voyant takes almost any kind of file. I chose two contrasting sources from my research: John Muir's Features of the Proposed Yosemite National Park from 1890, and Allen Fitzsimmons's 1969 MA thesis, The Effect of the Automobile on the Cultural Landscape of Yosemite Valley. I had both as PDFs, which Voyant gladly accepted.

The full text of each article is visible on the right of the screen. The panel in the lower left corner lists word totals, word frequencies, vocabulary density, and distinctive words. Voyant creates a word cloud right off the bat, which is nice for instant satisfaction; however, it cries out for a list of stop words.
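For readers curious about what sits behind that summary panel, here is a rough Python sketch of the same kind of counting. It is not Voyant's own code: the tiny stop list, the `summarize` helper, and the sample sentence are all invented for illustration, and I'm assuming vocabulary density means unique words divided by total words.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "it"}

def tokenize(text):
    """Lowercase the text and pull out runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def summarize(text, stop_words=STOP_WORDS, top_n=5):
    """Return total words, vocabulary density, and the most frequent non-stop words."""
    tokens = tokenize(text)
    # Stop words are dropped only for the frequency list, not for the totals.
    content_words = [t for t in tokens if t not in stop_words]
    counts = Counter(content_words)
    # Assumed definition: unique word forms divided by total words.
    density = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {
        "total_words": len(tokens),
        "vocabulary_density": density,
        "top_words": counts.most_common(top_n),
    }

# Made-up sample sentence, not a quotation from either source.
sample = "The valley is wild and the valley is great"
summary = summarize(sample)
print(summary)
```

Without the stop list, 'the' and 'is' would tie 'valley' at the top of the frequency list, which is exactly the problem the unfiltered word cloud has.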

Once the stop words are in place, Voyant's functionality really opens up. Text mining works best if you've got preexisting hypotheses about the work(s) you're investigating. I figured that the eight decades between my sources would show up as changes in Yosemite's built environment from 1890 to 1969. I used Voyant's 'Word Trends' function to test this hypothesis.

The results come as no surprise: Muir doesn't mention cars or roads. He also leaves out 'village,' an interesting clue regarding Native settlement in the area at that time; 'village' would eventually become synonymous with Camp Curry and other park concessioner lodging operations. Perhaps most interestingly, Muir doesn't use the word 'old.' Granted, the park didn't exist until 1890, but the landscape itself was obviously formed by ancient glacial events. This speaks to the park's newness in the late nineteenth century, as well as the overall conception of Yosemite as 'new' given the conservationist meanings ascribed to it.
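The comparison behind a trends chart can be approximated in a few lines of Python. This is only a sketch of the general idea (each term's share of a document's total words), not a claim about how Voyant computes its charts. The function names and the two sample strings are hypothetical stand-ins, not text from Muir or Fitzsimmons.

```python
import re
from collections import Counter

def relative_frequencies(text, terms):
    """Return each term's share of the document's total word count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty document
    return {term: counts[term] / total for term in terms}

def compare_documents(documents, terms):
    """Map each document name to its relative frequencies for the chosen terms."""
    return {name: relative_frequencies(text, terms) for name, text in documents.items()}

# Invented placeholder sentences standing in for the two full texts.
documents = {
    "doc_1890": "the valley walls rise above the meadows",
    "doc_1969": "the road brought every car to the old village",
}
trends = compare_documents(documents, ["road", "village", "old", "valley"])

for name, freqs in trends.items():
    print(name, freqs)
```

A term that never appears simply comes back as zero, which is the numerical version of noticing that Muir never says 'road' or 'old.'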

After applying a list of stop words, the word cloud for each text became a lot more useful. Muir's cloud looked like this:

[Image: word cloud of Muir's text]

Not surprisingly, Muir's cloud features a lot of landscape language: 'valley,' 'canon,' 'meadows,' 'peak,' 'range,' 'mountain,' 'walls,' 'fall,' etc. There are also a lot of adjectives to go along with these objects: 'great,' 'small,' 'wild,' 'glorious,' and 'far.' A couple of outliers, like 'cathedral' and 'sublime,' hint at the religious undertones that have made Muir's writing famous.

Fitzsimmons's cloud, on the other hand, looked like this:

[Image: word cloud of Fitzsimmons's text]

Fitzsimmons's cloud represents the subsequent tourist-oriented development that transformed Muir's primordial paradise. Road-related words abound: 'auto,' 'travel,' 'auxiliary,' 'primary,' 'network,' and 'parking.' The cloud also reflects the park's federal management through words like 'superintendent,' 'report,' and--somewhat obviously--'park' and 'service.' The built environment's changes are reflected in words like 'construction,' 'hotel,' 'accommodations,' 'camp,' 'facilities,' and 'village.'

From the cynic's point of view, these trends only confirm my suspicions: Muir wrote about Yosemite as an Eden, and Fitzsimmons addressed the role of the automobile in developing the park landscape for what Edward Abbey termed 'industrial tourism.' But don't these kinds of distant readings provide another important service? It's immensely helpful to have a plug-and-play program (with a GUI) to help pose new and interesting questions. While I can't imagine citing my Voyant voyages as definitive pieces of evidence, such forays will definitely assist me in choosing the right sources--and asking the right questions about them.