Beyond Two Cultures
Computational Literary Studies as Digital Scholarly Editing
Katherine Bode
The field in which I work goes by several names, including in this collection, where it is called cultural analytics, computational studies, distant reading, quantitative analysis, and computational literary studies. Whatever it is called—and for ease of reference I will use the last of these terms, or its acronym, CLS—these references are welcome, because this field is rarely brought into conversation with digital scholarly editing, despite their common involvement with computational literary texts, cultures, and infrastructures. This separation is one of a number that this collection bridges as it broadens the scope and deepens the connections of digital scholarly editing with old friends (including bibliography, book history, and literary studies) and new ones (including studies of archives, media, and infrastructures).
In the spirit of furthering this valuable building work, I want to suggest an alternative to the model Fotis Jannidis offers in this collection for the relationship between digital scholarly editing and CLS. His account of contrasting error cultures requires that we understand both fields as referring to pre-existing objects of analysis, whereas many of the contributions to this collection foreground the ontological and ethico-political implications of knowing in digital scholarly editing. Reflecting briefly on the way my own work combining CLS with digital scholarly editing over the past decade and a half has shifted to embrace this performative logic, I suggest that Jannidis is doing the same thing, notwithstanding his foregrounding of error. Recognising this commonality offers an alternative basis for collaboration, one more suited to engaging with emerging textual assemblages and the challenges these present to established textual practices and politics.
To describe the distinct error cultures of scholarly editing and CLS, Jannidis invokes C. P. Snow’s “two cultures.”1 But instead of skills and forms of knowledge being split into literary (humanities) and scientific realms, Jannidis contrasts the “literary or historical” approach in scholarly editing with the “empirical work” of CLS, which he aligns with the social sciences. Scholarly editors pursue “the ideal of the perfect text . . . errors are a stigma” even as “everyone knows there are always mistakes”; in CLS, by contrast, “it is assumed that all methods are flawed,” so the best thing to do is “make empirically based statements about . . . error rates in order to judge the usefulness of tools, approaches, and algorithms” (87). Jannidis’s description of CLS is a welcome departure from the still-too-common portrayal of that field as objectively discovering (or discovering objective) literary patterns and trends. It is also offered in the spirit of encouraging collaboration, on the sound basis that researchers work together better when they understand the “customs and practices . . . the rules” of each other’s fields (87).
Still, Jannidis’s model assumes that both fields engage with objects that precede their practices. Certainly, in making this claim about CLS, Jannidis is saying nothing more—or less—than most CLS scholars. As I have recently argued,2 CLS is strongly invested in “scalability”: the belief, as Anna Tsing defines it, that its objects exist “by nature in precision-nested scales,” with expansion equating to progress and occurring without changing the fundamental units involved or their relations.3 The fundamental objects in CLS’s scalable conception of literary phenomena are words, the texts that supposedly contain them, and the systems that supposedly contain those texts, in a framework little altered since Franco Moretti first proposed “distant reading” as a “focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems.”4 But scholarly editing approaches textuality differently, and though this is true of the post-war field generally, it has been emphasized by textual scholars who work with digital technologies, especially social texts theorists.
The Platonic (“ideal . . . perfect text”) paradigm with which Jannidis characterizes scholarly editing is refuted by theorists who argue that editing is always with/in—simultaneously with and inside—textual phenomena. With this core premise, social texts theorists argue that the possibilities of their practices are constituted by the objects they are in the process of re/forming. This understanding is expressed, for instance, by Jerome McGann’s description of editing as done from an “inner standing point,”5 Johanna Drucker’s account of textual phenomena as “events not entities,”6 and Paul Eggert’s notion of editing as always “in the work—that is, within the ongoing life of the work—not on it.”7 Developing an argument that D. F. McKenzie made in his lectures on the sociology of texts in the mid-1980s,8 digital scholarly editors explore this indivisibility of textual practices and objects by theorising the entanglements of production-reception, materiality-sociality, past-present, and knowing-not-knowing in all textual encounters, including (and perhaps most especially, because explicitly and critically) in scholarly editing.
The essays in this collection embody this understanding in various ways. Some employ it to revise well-established textual phenomena, as in Dirk Van Hulle’s reforming of the Complete Works Edition paradigm, which shows how the affordances of digital scholarly editing reconstitute the boundaries of work and authorship through “a continuous dialectic between completion and incompletion” (34). It is also central to Julia Flanders’s call for a “social edition” that works by building relations of care and trust and emphasising mutual responsibility (as opposed to libertarian dreams of individual freedom and verification via the neglect of social differences). K. J. Rawson, Sarah Patterson, and Robert Warrior demonstrate the value of approaches to digital preservation that foreground the ethical and ontological implications of knowledge practices, as they explore how established metadata protocols and annotation procedures embody historical forms of discrimination that need to be interrogated to resist this legacy and care for marginalised people and communities. Marta L. Werner’s description of an ephemeral edition that disappears with global warming and species loss explores the ethical relationality and sociality of textual studies in planetary terms. The contributions to this collection, in other words, suggest that digital scholarly editing is very far from presuming, let alone seeking to resurrect, a perfect text.
Offering their own synopsis of post-war shifts, Cassidy Holahan and colleagues both explain why Jannidis understands the field in this way and emphasize the extent to which textual theory has departed from this assumption, noting: “it still seems impossible to imagine textual studies without a text at the center of the field’s practice. Yet . . . this is exactly what some textual critics . . . have been imagining for over twenty years now” (149). One way to characterise my research over the past decade and a half, as I have worked to integrate CLS with digital scholarly editing, is as a grappling with this im/possibility. Reading by Numbers was based on something like the belief (in the possibility of perfect representation) that Jannidis ascribes to contemporary textual studies. It offered a new history of the Australian novel, challenging existing accounts of that form by analysing the AustLit database, treating it as an effectively objective record of this pre-existing thing.9 The serialised fiction in AustLit made me curious about the international fiction in nineteenth-century Australian newspapers: international because—taking AustLit as an objective record—I assumed that most if not all the Australian novels were already indexed. In creating To Be Continued (TBC): The Australian Newspaper Fiction Database, Carol Hetherington and I did identify a lot of international fiction in digitised historical newspapers.10 But there was also a great deal of fiction by colonial Australian authors not in AustLit, as A World of Fiction explores.11 In fact, TBC almost doubles the number of known nineteenth-century Australian novels.12 While this difference exposes the fragility of my arguments in Reading by Numbers, when writing A World of Fiction I worked in something like the way Jannidis recommends for CLS, offering empirically based statements about likely errors in my approach, for instance, by comparing the newspapers that had been digitised to those listed in indexes for advertisers of the period.
Such assessments assume a separation between my methods and objects of inquiry. However, especially as other research projects have built upon TBC, I have come to appreciate the extent to which it reforms what it had claimed to represent. Accordingly, my current book, provisionally entitled Computing Reading Writing, shifts to something like the approach I have described as characterising many contributions to this collection, pursuing what Karen Barad calls “ethico-onto-epistemo-logy—an appreciation of the intertwining of ethics, knowing and being—since every intra-action matters.”13 Its curated dataset, A Writing-Reading Interface (AWRI), explores an extensive array of reviews of Australian literature across social media platforms, newspapers, and academic journals. In building it with collaborators Galen Cuthbertson and Geoff Hinchcliffe, we stress how our practices, and those of the diverse readers whose writing we are exploring, participate in constituting the object—(reviews of) Australian literature—that we are presenting. Not measuring error does not mean eschewing responsibility for the reasonableness and reliability of our inquiries. Rather, the book conducts empirical inquiries in the same way that social texts theory advocates (and scientists do): accepting that the physical arrangements for investigating phenomena—whether we call them textual apparatuses or experimental conditions—are part of those phenomena and of any findings made with them.
Although I know Jannidis disagrees (because I have enjoyed discussing our dis/agreements on many occasions), I think he is doing the same thing with his experiments in this collection. His measurements of literary phenomena do what Barad says apparatuses in quantum physics do: create concepts that are “specific physical arrangements.”14 The scholarly edition he investigates does likewise, embodying the German editorial constitution of the work as incorporating all its variants in a genetic text. Jannidis diffracts this text with different CLS methods, including using Levenshtein distance and Sentence-BERT to create non-semantic and semantic measures of variants for all verses. Jannidis does not specify errors for these measures. Where he does—for sentiment analysis, by indicating the categories in which the model’s attributions of emotion are more or less similar to human annotations—the example clarifies what Jannidis means by error: the difference between the model’s and human reactions. As human readers agree with each other only slightly more than they agree with the model, the difference might better be understood as calibration (adjusting a measuring instrument by comparing its results to a known standard) than as measurement error (establishing the difference between a measured and a true value). While Jannidis frames his examples as bringing CLS methods to bear on the same thing that scholarly editors study (Goethe’s Faust), I would say he entangles different physical-conceptual arrangements to produce new textual formations. If his “results are less interesting” than the possibilities for collaboration they suggest (74), that is because the textual phenomena that CLS methods participate in producing are so different from the ones we are accustomed to engaging with (including established forms of scholarly editions) that we are still working out what is interesting—what matters—and what does not.
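To give a concrete sense of what the Levenshtein and Sentence-BERT measures mentioned above involve, the following minimal Python sketch computes a character-level edit distance alongside a sentence-embedding similarity for a single pair of variant readings. It assumes the open-source sentence-transformers library, and the verses are invented placeholders that only loosely echo Faust’s opening line; the sketch illustrates the general technique, not Jannidis’s actual data or pipeline.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumes the sentence-transformers package is installed

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Invented placeholder variants, not taken from any edition of Faust.
variant_a = "Habe nun, ach! Philosophie durchaus studiert"
variant_b = "Hab nun, ach! die Philosophie durchaus studirt"

# Non-semantic measure: character-level edit distance.
print("Levenshtein distance:", levenshtein(variant_a, variant_b))

# Semantic measure: cosine similarity between sentence embeddings
# (any Sentence-BERT model would do; this multilingual one is a common choice).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
emb_a, emb_b = model.encode([variant_a, variant_b])
cosine = float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
print("Embedding cosine similarity:", round(cosine, 3))

The point of setting the two side by side is that the first registers only surface differences in spelling and wording, while the second registers how far those differences shift what the embedding model treats as meaning.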
Part of achieving interesting—meaningful—results is embracing the im/possibility of a stable (pre-existing, continuing) text as the object for the textual practices that CLS calls methods. I say im/possibility because, as textual formations are changing, the challenge is recognising the familiar formations and functions of our textual practice (for example, authors and words) as conditions rather than certainties. We can see this challenge with the emergence of digital libraries. Even though libraries have long been recognized as embodying and negotiating relations of power, print-based modes of communication accustomed us to understanding them as places where books (or other textual objects) were held. We got away with this convenient shorthand, and the separation it allowed us to imagine, between libraries, texts, and our interactions with both, because the standards and systems for organising those relations were well-established, well-known, and reliable. Digital libraries involve many of those same relations, but they also create new ones, with the forms of textual interaction and navigation they enable arising from novel arrangements of bibliographical, lexical, and architectural forms. To what extent do established notions—such as that texts are products of individual authors—still enable inquiry in such situations, and what do these understandings preclude?
Ways of discussing CLS methods—including text mining and bags of words—reassure us that, while the structures might change, text is made of the same ontological simple: words. But even that certainty is placed under pressure, as Large Language Models (LLMs) relate to tokens, or sequences of characters, rather than words. OpenAI’s tokenizer tool offers a way of exploring this difference.15 Sometimes tokens and words align. For instance, “i went to the shop” is five words and five tokens. But often they do not. Type the shortest verse in the Bible (in its common English translation) into this tokenizer (“jesus wept”) and these two words translate as five tokens (j/es/us/ we/pt). Where historically, in scriptura continua, reading was a specialised practice of converting continuous text into meaning, LLM tokenizers convert the way these systems enact text into the words that we register as meaningful. What are these new agents and how should we understand their participation in textual productions? Tokenization also raises a pointed question for the way in which scholarly editing is commonly done. Its focus on minute variations in word choice and punctuation arguably validates Jannidis’s account of the field—as seeking a perfect text—in practice, whatever theory might say, and notwithstanding the counter-examples offered in this collection. To what extent can this preoccupation (and the tension it embodies between textual theory and practice) be sustained as emerging technologies reformulate text, and what new ways of editing might these transformations motivate?
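For readers who want to probe this word/token mismatch programmatically, the minimal sketch below uses OpenAI’s open-source tiktoken library to count words and tokens for the two examples above. The exact splits—and whether “jesus wept” comes out as five tokens—depend on which encoding a given model uses, so the output illustrates the mismatch rather than reproducing the web tool’s counts exactly.

import tiktoken

# One widely used encoding; other encodings will split the same strings differently.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["i went to the shop", "jesus wept"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r}: {len(text.split())} words, {len(token_ids)} tokens -> {pieces}")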
Where Jannidis imagines how two fields might collaborate while following their own rules, and Elena Pierazzo worries that funding might shift from digital scholarly editing to projects that work with large datasets, I would rather see CLS being recognized as textual studies, and equally, see textual scholars doing CLS when editing with/in emerging textual formations. The last few decades of editorial theory are an excellent guide for engaging with emerging textual formations, even if many textual scholars only apply those lessons in print-based textual performances, and even though many CLS scholars have yet to recognize the relevance of these lessons to their work. What these guides show are ways of enacting textual phenomena that are responsible not because they seek perfection or admit error, but because they pay attention to and care about what is being done and might be done differently.
Katherine Bode is Professor of Literary and Textual Studies at the Australian National University. Her research employs computational methods to engage with literary phenomena and literary methods; her books include A World of Fiction (2017) and Reading by Numbers (2012). Work from her current book project, provisionally entitled Computing Writing: Why Literary Studies Matters, has been published in New Literary History (2022) and Critical Inquiry (2023).
Notes
1. C. P. Snow, The Two Cultures (Cambridge: Cambridge University Press, 2012 [1959]).
2. Katherine Bode, “Doing (Computational) Literary Studies,” New Literary History 54, no. 1 (2022): 531–58.
3. Anna Tsing, “On Nonscalability: The Living World is Not Amenable to Precision-Nested Scales,” Common Knowledge 18, no. 3 (2012): 523.
4. Franco Moretti, “Conjectures on World Literature,” New Left Review 2, no. 1 (2000): 57.
5. Jerome McGann, A New Republic of Letters: Memory and Scholarship in the Age of Digital Reproduction (Cambridge, Mass.: Harvard University Press, 2014), 24.
6. Johanna Drucker, “Entity to Event: From Literal, Mechanistic Materiality to Probabilistic Materiality,” Parallax 15, no. 4 (2009): 7–17.
7. Paul Eggert, The Work and the Reader in Literary Studies (Cambridge: Cambridge University Press, 2019), 76.
8. D. F. McKenzie, Bibliography and the Sociology of Texts (Cambridge: Cambridge University Press, 1999 [1986]).
9. AustLit, 2001–, https://www.austlit.edu.au/; Katherine Bode, Reading by Numbers: Recalibrating the Literary Field (London: Anthem Press, 2012).
10. Katherine Bode and Carol Hetherington, To Be Continued: The Australian Newspaper Fiction Database, 2018–, https://readallaboutit.com.au/.
11. Katherine Bode, A World of Fiction: Digital Collections and the Future of Literary History (Ann Arbor: University of Michigan Press, 2018).
12. Katherine Bode, Sarah Galletly, and Carol Hetherington, “Beyond Britain and the Book: The Nineteenth-Century Australian Novel Unbound/ed,” in The Cambridge History of the Australian Novel, ed. David Carter (Cambridge: Cambridge University Press, 2023), 44–62.
13. Karen Barad, Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning (Durham, North Carolina: Duke University Press, 2007), 185.
14. Karen Barad, Meeting the Universe Halfway, 109.
15. OpenAI, https://platform.openai.com/tokenizer.