I began blogging here back in 2006 to think about the bits of technology which got me excited. Little did I think that, nearly 20 years later, I would still enjoy it as much as I did that very first day.
For the last decade, at the end of each year, I have done a round-up of the most popular blogs, mostly by looking at the hits they received. However, for the longest time, I have felt that there’s much more to the story in the stats, but I couldn’t quite decide how to mine the blog for that story.
I did think about applying digital humanities techniques to the blog after I had spent time reading up on how literary informatics librarians work when approaching the canon. In that previous link, I refer to Dr Heather Froehlich looking for the presence and absence of characters or gender-specific words in the plays of Shakespeare. No doubt because she had a theory. The problem was that I didn’t have anything I wanted to find. I didn’t have a theory. I just wanted to find the patterns.
At the end of 2023, I printed out the whole blog aka my complete works, which was no small feat, given that it came in at 343,000 words – nowadays it must be half a million – hoping that if I read it I would see the patterns. The book gave me no end of joy to carry around but it turned out to be too difficult to analyse.
Then I thought that I would use some AI but, as I have said in previous blogs about machine learning and natural language processing, these algorithms cannot find such patterns by themselves without a human supervising what they do. That supervision means a human, or a group of humans, carefully selecting the data on which the model will be trained and/or marking up that data so that the model can easily find the patterns. This was no help at all, as I would still have to decide what data to feed in and how to mark it up for training. I still needed a pattern for which to look.
Recently, I have been reading up about the use of digital humanities techniques once more, and this inspired me to take a different approach, a digital humanities approach, to my complete works, which hopefully are not. There is, apparently, a very big worry that making the digital humanities a big part of the humanities flattens them, in a neoliberal displacement of hermeneutics. This blog, though, is born digital (phew), and I am just using computing techniques which the humanities borrowed in the first place to consider their texts digitally. So, it is a very modern kind of archive and I find it all very exciting.
Helped by Woolf and Pepys
So! I started by looking at two very famous diaries, those of Virginia Woolf and Samuel Pepys, which inspired me to treat my blog less as a complete works and more as a diary. Generally speaking, when we look at diaries we are asking similar questions to those I might want to ask of my blog.
The one caveat is that Woolf and Pepys wrote their diaries in private, though Pepys always had one eye on posterity. Woolf went back and forth on whether she should destroy hers, though she too perhaps had one eye on posterity, as she appreciated that people would want to know her process.
In contrast, I have always written my blog publicly. It is highly curated and thus, a selective public performance or performative writing practice.
Alongside this, for many years I kept a private diary on various installations of WordPress on local servers on my machine, never for public consumption. Indeed, the two times when I did accidentally publish a personal blog on my public WordPress, I was mortified for days and days and really hoped that no one had read it. Writing for a public versus a private space is very different indeed, and I have blogged about this too in the blog on privacy as well as the dance between privacy, intimacy and the WWW.
Both Pepys’s and Woolf’s diaries have been studied at length, and common themes recur in those studies. It was those themes I was interested in, and how they could be applied to studying my blog even though the diaries were quasi-private. They include: emotional trends, personal philosophy, language and selfhood, key events, gender and technology, feminism, and the topics I write the most about. Off the top of my head, I might say human-computer interaction (HCI), but I wouldn’t really know until I saw the analysis results, and I didn’t really write about HCI for the longest time. Once I got started thinking about my blog in this way, the list of potential ways of exploring it became very long.
Each one of these topics would be mined from the data using digital humanities tools, aka me writing them in Python. I love a bit of Python, having taught it to my girls as well as used it for sentiment analysis. This approach would hopefully uncover hidden themes, stylistic shifts and conceptual links beyond my chosen tags, which have always been a bit haphazard. I would measure if and how I have strategically shifted between personal, academic and journalistic voices over the last two decades.
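As a minimal sketch of what measuring a shift in voice might look like, the Python below counts first-person pronouns per year as a very rough proxy for a personal voice. Everything in it is an assumption for illustration only: the `FIRST_PERSON` word list, the `(year, text)` shape of the posts and the function names are hypothetical, and a real analysis would need a far better measure than pronoun counting.

```python
import re
from collections import defaultdict

# Hypothetical marker words: first-person pronouns as a rough proxy for a
# "personal" voice. This list is an assumption, not a validated lexicon.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def personal_voice_by_year(posts):
    """Return {year: share of tokens that are first-person pronouns}.

    `posts` is an iterable of (year, text) pairs (an assumed input shape).
    """
    totals = defaultdict(int)
    personal = defaultdict(int)
    for year, text in posts:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        personal[year] += sum(1 for token in tokens if token in FIRST_PERSON)
    # Skip years with no tokens to avoid dividing by zero.
    return {year: personal[year] / totals[year]
            for year in sorted(totals) if totals[year]}


# Made-up sample posts, purely to show the input shape.
sample_posts = [
    (2006, "I love my new phone"),
    (2024, "The study measured interaction design"),
]

print(personal_voice_by_year(sample_posts))
```

A higher share in a given year would only hint at a more personal register. The same counting pattern could be pointed at other hypothetical word lists, say academic or journalistic markers, to compare voices side by side.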
Overall, this project has the potential to finally prove my husband right when he says that I will one day disappear up inside my blog, but I would be very happy in there should that day come.
I guess that I will be writing more about this once I begin coding up the various digital humanities techniques. For now, I am including below a recording of The Deep Dive (alternatively, you can watch it on YouTube).
The two hosts discuss how the blog is prime grey literature (information which is not formally or commercially published), which techniques might be employed, how they would be used, and the ethics digital humanists must consider in and around web scraping.