Launching the Armed Services Editions

I am happy to announce the launch of my CHI Project, The Armed Services Editions: A Computational Analysis. On my project page, users can navigate through three “Data Narratives”: simple analyses that I conducted to answer critical questions about these data. The Gender Data Narrative considers the distribution of gendered pronoun usage throughout the corpus, and features a basic foray into LDA topic modeling. The Genre Data Narrative considers the types of books that were sent to servicemen, and how the generic representation of books may have shifted over time. Finally, the Geography Data Narrative explores the geographic imagination of the corpus, both domestic and international, with named entity recognition (NER).
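The gender analysis was done in R. As a language-neutral illustration of what "the distribution of gendered pronoun usage" involves, here is a minimal Python sketch that counts gendered pronouns in a single text. The pronoun lists and the `pronoun_counts`/`feminine_share` helpers are hypothetical and are not the project's actual code.

```python
import re
from collections import Counter

# Hypothetical pronoun sets; the post does not say which forms were counted.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}

def pronoun_counts(text):
    """Count masculine and feminine pronouns in a text, ignoring case."""
    # Lowercase, then keep only runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    masc = sum(counts[w] for w in MASCULINE)
    fem = sum(counts[w] for w in FEMININE)
    return {"masculine": masc, "feminine": fem}

def feminine_share(text):
    """Fraction of gendered pronouns that are feminine (0.0 if there are none)."""
    c = pronoun_counts(text)
    total = c["masculine"] + c["feminine"]
    return c["feminine"] / total if total else 0.0

sample = "He had been reading W.H. Hudson. She said his book was hers."
print(pronoun_counts(sample))  # {'masculine': 2, 'feminine': 2}
```

Running this over every text in the corpus and grouping by series or year would produce the kind of distribution the Gender Data Narrative describes.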

The first phase of this project is, quite simply, a book history project. To date, the ASE Corpus has not been studied as a whole. Several scholars have published institutional histories of the Council on Books in Wartime, discussed the role of specific books, or even discussed the ASEs in relation to a larger sociological project. I am interested in assembling a more thorough, stylistic macro-history of the ASEs, one that attends to both their sociological import and their formal properties through computational analysis. The data I’ve assembled is descriptive, working toward that end, and is a necessary foundation for the more advanced analysis I will be conducting this summer.

In addition to an analysis of the ASE Corpus, this website is also a record that chronicles the development of my methodological chops. While I had a basic foundation in R (thanks to a fabulous course at HILT), my skills needed (and still need) development. I used two textbooks to improve my skills, testing my dataset throughout. Users familiar with Text Analysis with R for Students of Literature by Matt Jockers and Humanities Data in R by Lauren Tilton and Taylor Arnold will likely be able to trace my data analysis back to the chapter problem sets.

Full disclosure: I feel insecure about this. I would like, eventually, to publish on the ASEs. A record of my fledgling explorations in R and data analysis is… well, nerve-wracking. Yet, as Ethan Watrall has reminded me in a variety of ways, it’s also an important intervention. Over and over again this year, I have been reminded of and impressed by the generosity of my colleagues in DH; I post this basic data analysis in hopes of inviting that same generous conversation.

Only a fraction of the work completed on this project is featured on my project website. I should have foreseen this problem and created a time-lapse video of my hours and hours running OCR on hundreds of documents, or adding metadata to my database. Or, better yet, learning how to analyze data in R. For this project, however, I decided to visualize my data using Tableau. Tableau provides far less specificity, for sure, but it also allows for a greater degree of user interactivity. Since my data is, at this stage, largely descriptive, I wanted users to be able to explore it with greater flexibility.

It’s been a long year working on this project, and that long year has turned out to be just the beginning. I’m so excited to see how this project continues to develop. Over the summer, I’ll be continuing this project by running these analyses—and much more interesting, advanced analysis (fingers crossed)—on the entirety of my corpus.

The questions behind this project are increasingly pressing, and they continue to drive me, particularly as a powerful political candidate has remained consistently hostile toward the free exchange of ideas that should define any democratic discourse. Ultimately, this project asks: what (or whose) ideas are acceptable, and what (or whose) ideas aren’t? And what (and who) makes that so? These questions should be asked about 1940, and they should be asked about 2016.

Politics and Form: The Armed Services Editions

As a CHI Fellow, I’m undertaking a large-scale text analysis of the Armed Services Editions, a collection of books sent to US soldiers during WWII to “fight the war of ideas,” in order to consider issues of politics and literary form. I first stumbled on the Armed Services Editions a few years ago, while researching Ernest Hemingway’s The Sun Also Rises. You may recall Jake’s description of Robert Cohn, early in the novel:

He had been reading W.H. Hudson. That sounds like an innocent occupation, but Cohn had read and reread “The Purple Land.” “The Purple Land” is a very sinister book if read too late in life…For a man to take it at thirty-four as a guide-book to what life holds is about as safe as it would be for a man of the same age to enter Wall Street direct from a French convent, equipped with a set of the more practical Alger books.

I was working on a project on modernist reading networks, and this passage jumped out at me. I looked into The Purple Land and found that it was chosen to be part of the Armed Services Editions in World War II, 16 years after the publication of The Sun Also Rises. Cursory research into the Armed Services Editions led me to the Council on Books in Wartime (CBW), a committee of publishers that assembled during World War II and contracted with the US military to produce cheap paperback editions for US soldiers abroad. The goal (and slogan) of the CBW was to use books as “weapons in the war of ideas.” Books had an important role to play in the war effort, the CBW wrote, because “Books can help us recover our past and teach us what a tough-fibered people we can be when we have to. Books can tell us what our enemies are like. Even prizefighters study their opponents carefully. […] Books can tell us what our allies are like.” All of this was vitally important in such a “total war.”

Yet the process for selecting books for such an important task was fairly opaque. According to a booklet commemorating the ASEs, held in Princeton University’s Mudd Manuscript Library,

“Titles are selected by the following process: Publishers’ lists are combed and copies of books thought desirable are asked for. Each book is then carefully read by a professional editor who makes out a written report. The books and the reports are submitted every two weeks to an Advisory committee consisting of publishers, librarians, booksellers, critics, and authors. Books that meet with the approval of this Advisory Committee are then sent to the Army and Navy, both of which services must agree on a title before it is accepted for publication.”

Presumably, a desirable book would be selected and approved because of its fit with the general aims of the ASEs: to boost morale, to promote democracy, to learn about the enemy. Histories of the ASEs show very little censorship of books (though, presumably, certain books would not have been “thought desirable” and suggested for publication in the first place; James Joyce didn’t make the cut, nor did D.H. Lawrence). A quick scan of the ASE database reveals some books that make sense as “desirable” in the promotion of democracy for the war of ideas (at least in the hive-mind of the US military in 1943): Jack London novels, for instance. Others seem out of place, such as Virginia Woolf’s The Waves. Yet over 120 million copies of 1,322 books were distributed on the front lines and in military hospitals, all of which met the criteria outlined by the CBW: each helped to “fight the war of ideas.”

I’ll be looking at this corpus for my CHI project, analyzing what it would mean for a text to be made into a weapon for democracy.

Big picture: how might an understanding of the ASE Corpus help us think about textual politics, politics and style, politics and form? To answer this question, I want to consider how “democracy” might be operationalized and measured. In other words, what formal or stylistic measures might make a text “democratic”? I have other plans for this project down the road, including developing a predictive model. But for the purposes of my CHI Project, I’m going to be building this corpus and conducting some preliminary analysis in R. Right now, I’m eyes-deep in Phase One: Building the Corpus.

Fortunately, it is quite easy to find a full list of all of the ASEs. Also fortunately, many of the titles assembled by the CBW were written prior to 1923—that is, public domain. It is unlikely that I will be able to assemble a corpus of all 1,300 titles. I plan to do the following:

  • Follow the release of the ASEs chronologically, starting with the A series and moving through ZZ.
  • Keep texts that I can find already digitized in the public domain (HathiTrust, Project Gutenberg, Google Books).
  • Keep a running list of texts that
    • are not digitized but ARE public domain
    • are still protected under copyright
  • See what I end up with and make some hard choices about scaling, about digitization, and about copyright and fair use.
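The plan above amounts to a chronological sort followed by a three-way split. Here is a minimal Python sketch, assuming a hypothetical `Title` record that carries a series letter, a number within the series, a publication year, and a `digitized` flag; it also assumes that two-letter series (AA through ZZ) come after single-letter ones. The 1923 cutoff is the one stated above; everything else is illustrative.

```python
from dataclasses import dataclass

PD_CUTOFF = 1923  # public-domain cutoff used in this post

@dataclass
class Title:
    series: str      # hypothetical: "A" ... "Z", then "AA" ... "ZZ"
    number: int      # position within the series
    title: str
    pub_year: int
    digitized: bool  # hypothetical flag: found on HathiTrust, Gutenberg, or Google Books

def series_key(series):
    # Assumption: shorter series codes sort first, so "Z" precedes "AA".
    return (len(series), series)

def triage(titles):
    """Walk the titles in release order and sort them into three lists."""
    buckets = {"keep": [], "pd_not_digitized": [], "copyright": []}
    for t in sorted(titles, key=lambda x: (series_key(x.series), x.number)):
        if t.pub_year < PD_CUTOFF and t.digitized:
            buckets["keep"].append(t.title)
        elif t.pub_year < PD_CUTOFF:
            buckets["pd_not_digitized"].append(t.title)
        else:
            buckets["copyright"].append(t.title)
    return buckets
```

Keeping the two running lists as explicit buckets means the "hard choices" about OCR and fair use can later be made per bucket rather than title by title.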

Highly scientific and conclusive, I know. I’ll cross the OCR bridge when I get there.

There are some texts that I know already that I can discard. The ASEs assembled some “made texts,” short story collections by famous authors like Ernest Hemingway (his novels were excluded). There will certainly be more difficult choices to make about inclusion and exclusion. For instance, some texts, such as Moby Dick, were abridged to fit the specific production dimensions of the ASEs. In these cases, I’ll have to decide whether to use the full-length version or discard the text entirely.

And I’ll also have to think critically about the sort of metadata I hope to assemble in the process. Author gender might be interesting (if infuriating). I was surprised to find that the most popular ASE was Betty Smith’s A Tree Grows in Brooklyn. Perhaps I expected something with more machismo, or perhaps I’ve just got Jonathan Franzen perpetually in the back of my head bashing women writers (god help me). Regardless, I’d be interested to see how author gender impacted the selection of books.

Given the CBW’s aims of “learning about our allies” and “learning about our enemies,” I would also be interested in tracking author nationality, or the book’s primary setting. Some of this can be collected as metadata (though I don’t want to put too much weight on authorship), but some of these questions can best be answered through analysis (named entity recognition for place names, for instance, to track primary settings). Through the process of building, I hope to develop some more hypotheses beyond my initial thoughts (to be shared later) that might help guide the analysis phase of the project.
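Named entity recognition normally relies on a trained statistical model (spaCy, in Python, is one common choice). As a much simpler stand-in that only shows the idea of estimating a book's primary setting, the sketch below counts whole-word matches against a small hand-built gazetteer and reports the most frequent place. The gazetteer and the `place_mentions`/`primary_setting` helpers are hypothetical. A gazetteer lookup is a different, cruder technique than NER: it will miss any place not on the list and cannot tell "Paris" the city from a character named Paris.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer; a real pipeline would use an NER model instead.
GAZETTEER = {"Brooklyn", "Paris", "London", "New York", "Pamplona"}

def place_mentions(text, gazetteer=GAZETTEER):
    """Count whole-word, case-sensitive mentions of each gazetteer place."""
    counts = Counter()
    # Iterate in sorted order so that ties later break alphabetically.
    for place in sorted(gazetteer):
        pattern = r"\b" + re.escape(place) + r"\b"
        n = len(re.findall(pattern, text))
        if n:
            counts[place] = n
    return counts

def primary_setting(text, gazetteer=GAZETTEER):
    """Return the most frequently mentioned place, or None if none appear."""
    counts = place_mentions(text, gazetteer)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Comparing these counts across ASE series could give a first, rough picture of how domestic and foreign settings are distributed before a proper NER pass.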

I’ll clean the data and make the corpus (or as much of it as possible) available via GitHub, as I would love for others to join me in this analysis. And I’ll certainly be blogging about the process along the way.