Text Mining in the Old Bailey

Exploring the proceedings of the Old Bailey

The Old Bailey Online publishes the full proceedings of the Old Bailey between 1674 and 1913, giving access to information from nearly 200,000 trials, free for non-commercial use. This is a wealth of text that can be mined and analysed to discover more about the history of crime and justice in London.

As part of the module this week, we were asked to explore the proceedings of the Old Bailey using the Old Bailey Online search and API, and to analyse and visualise text extracted from the Old Bailey Online using Voyant Tools. At first I wasn’t sure what I should search for, but then I recalled a recent podcast that I had listened to about the Lady Juliana – a ship that transported female convicts from Britain to Australia in the late 1700s – and thought it would be interesting to find out more about the female convicts whose sentence included transportation to Australia. Recalling the stories of convicts being sentenced for relatively minor crimes such as stealing handkerchiefs and loaves of bread, I wanted to find out more about the women and girls who were sentenced to transportation after being found guilty of shoplifting.


Changes to blog design

As I mentioned in one of my first posts, the initial design for this blog excluded some functions that I wanted to include later. In particular, I thought that it would not be all that helpful to include lists of categories and tags for exploring the posts when there were very few posts published, nor would it have been helpful to have a monthly archive when I had only posted over a short period of time.

Now that I’ve written a small handful of articles, I’ve added in widgets allowing readers to explore the blog posts by category and tags. I’ve also taken the time to clean up the categories and tags assigned to each article. Initially it wasn’t easy to know what categories and tags would be most useful, but as the blog has started to take shape, it’s become easier to categorise articles appropriately.

I still have not included a monthly archive, but if I continue to write more articles, I may add one to allow readers to view articles from a particular time.

Making sense of all those words!

Text analysis

This week in our DITA module, we were introduced to the topic of text analysis, the quantitative analysis of a text or group of aggregated texts. Text analysis can be considered a form of distant reading. Instead of a close reading of a passage or text to find meaning, distant reading looks for patterns across multiple texts to understand their meaning in context.

Text analysis can include searching for keywords or passages, identifying the number of times a word appears in the text, uncovering the context in which words appear and the concordance between different words.

Although it can be conducted manually, computer-assisted text analysis makes it possible to analyse large collections of text very quickly, undertake much more complex searches and also easily visualise the results to aid understanding.
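Two of the tasks described above, counting how often each word appears and showing the context in which a keyword appears, can be sketched in a few lines of Python. This is only a minimal illustration: the function names and the sample sentence are invented for the example, and the tokenisation is deliberately crude compared with what a dedicated text analysis tool does.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how many times each word appears in the text."""
    return Counter(tokenize(text))


def keyword_in_context(text, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented sample text, used only to show the two functions at work.
sample = ("The prisoner stole a handkerchief. "
          "The handkerchief was found on the prisoner.")

print(word_frequencies(sample).most_common(3))
for left, word, right in keyword_in_context(sample, "handkerchief", window=2):
    print(f"{left} [{word}] {right}")
```

Running the same functions over thousands of documents rather than one sentence is what makes the computer-assisted approach so much faster than counting by hand.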

Combining text analysis with the analysis of metadata can reveal patterns across time or geographical locations (some great examples are shown in this paper).

Exploring text analysis using Wordle

To explore text analysis, I created a data set to try out in a couple of different online text analysis tools. I used Altmetric to construct a list of 1,094 articles taken from the top 20 ranking journals in Library and Information Science that had been mentioned online in news, blogs or social media in the past year.

One of the most straightforward forms of text analysis and visualisation is the word cloud. A word cloud visualises the frequency with which words are mentioned in a string of text. Words mentioned more frequently appear larger, while words mentioned less frequently appear smaller.
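The idea behind the sizing can be sketched as a mapping from word counts to font sizes. The linear scaling, the size range and the function name below are all assumptions made for illustration; this is not a description of how any particular word cloud tool actually scales its words.

```python
from collections import Counter


def font_sizes(words, min_size=10, max_size=60):
    """Map each word to a font size that grows with its frequency.

    Assumes simple linear scaling between min_size and max_size.
    """
    counts = Counter(words)
    lowest, highest = min(counts.values()), max(counts.values())
    sizes = {}
    for word, n in counts.items():
        if highest == lowest:
            # Every word is equally frequent, so all get the same size.
            sizes[word] = max_size
        else:
            scale = (n - lowest) / (highest - lowest)
            sizes[word] = min_size + scale * (max_size - min_size)
    return sizes


# Invented word list, used only to show the scaling.
words = ["library"] * 3 + ["data"] * 2 + ["metrics"]
print(font_sizes(words))
```

The most frequent word gets the largest size, the least frequent the smallest, and everything else falls in between.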

Wordle is a simple online tool for creating word clouds. You simply paste the text you wish to analyse, press a button and you can create a word cloud that you can edit and reformat.

Altmetric word cloud


Academia and altmetrics

Research impact

Understanding the impact research has is important for so many reasons. There is not much point in producing research that does not further our understanding or that makes no contribution to society or development. For these reasons – and many more – the impact research has is of great interest to the academy and beyond.

Measures of research impact are used to seek further funding and grants for research, by individual researchers to gain promotions, and can improve the standing of individual researchers, departments, universities and other research organisations. Measures of impact are sometimes also used as a proxy for measuring the quality of a research output.

Traditionally, the impact of published research – typically research findings published in a peer-reviewed journal – has been measured in a number of ways. These include measures of journal reputation or impact (see for example Thomson Reuters Journal Citation Reports), the number of citations an article receives (see for example Source Normalised Impact per Paper) or an author’s h-index – which is a measure combining an author’s number of publications and number of citations.
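To make the h-index a little more concrete: it is the largest number h such that the author has h papers that have each been cited at least h times. A minimal sketch of that calculation follows; the function name and the citation counts are invented for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Invented citation counts for five papers by one hypothetical author.
print(h_index([10, 8, 5, 4, 3]))
```

Here four papers have at least four citations each, but there are not five papers with at least five citations, so the h-index is 4.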

Recently, these traditional measures of research impact have been the focus of criticism for several reasons. Journal-level metrics, including journal impact, have been criticised as lacking in transparency, as it is unclear how journal rankings and other measures of journal quality are calculated. Traditional measures of research impact also tend to focus on a specific journal or researcher rather than a specific piece of research, which makes it difficult to ascertain the impact of an individual research paper. Also, the number of citations an article receives, while a very useful way to measure potential impact, is a very slow way to measure research impact.

Alternative ways to measure impact

As more and more research is being published online, some of these more traditional measures of impact can now be complemented by alternative ways of measuring research impact. Altmetrics – alternative metrics – measure more than citations and journal quality. Altmetrics measure things such as download counts, page views, mentions in news articles and online, as well as the number of shares of an article on social media.

Like traditional measures of journal impact, altmetrics should not be considered a proxy measure of quality, but rather of attention. Unlike traditional impact factors, altmetrics can measure the impact of individual research outputs, rather than measuring the impact of a particular journal or researcher. Altmetrics could also be considered a broader measure of impact than traditional measures. They provide an indication of the number of times a paper has been viewed or downloaded, rather than just information about the number of times a paper is cited in other research. Through these types of metrics, altmetrics also provide information on the amount of attention a paper receives beyond the research community.

In recent years, a number of providers have emerged who have used altmetrics to create reports about the attention and impact of research. These include ImpactStory, Plum Analytics, and Altmetric. During class this past week, we had the opportunity – thanks to the kind folk at Altmetric – to start exploring article-level metrics by using their web app, the Altmetric Explorer. Altmetric also offers a number of other products including the Altmetric Bookmarklet, Altmetric API and Altmetric Badges.


Tweeting about the weather

Using TAGS

During our last Digital Information Technologies and Architectures (DITA) class, we were introduced to searching and archiving tweets using the TAGS (Twitter Archiving Google Sheet) application developed by Martin Hawksey.

This app is a mashup that uses the Twitter Search API and Google Sheets API to enable users to easily search, collect and archive tweets containing a specific term or hashtag from the past seven days. Along with the tweets themselves, TAGS also collects and archives the metadata related to each tweet, including the time and date when a tweet was sent, the username of the person sending the tweet, the number of accounts they follow and their number of followers, and whether a tweet was sent in response to another tweet.

In addition to providing a spreadsheet listing this information, TAGS also provides a summary of the tweets and some visualisations of the tweets collected.


Following on from the Melbourne weather theme of my last post, this week I used TAGS to search, archive and analyse tweets containing #melbourneweather for the fortnight from Monday 20 October until Sunday 2 November. An initial glance at the output from TAGS showed a huge spike in the number of tweets with this hashtag on Monday 27 October, with 622 tweets sent that day, compared with an average of just 18 tweets per day the week before.
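The same kind of daily count can be reproduced outside the spreadsheet. The sketch below groups tweet timestamps by day and flags any day whose count is well above the typical day. The timestamp format, the threshold of three times the median and the sample timestamps are all assumptions made for illustration; they are not taken from the actual TAGS archive.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def tweets_per_day(timestamps, fmt="%d/%m/%Y %H:%M:%S"):
    """Count tweets per calendar day from a list of timestamp strings."""
    return Counter(datetime.strptime(ts, fmt).date() for ts in timestamps)


def spike_days(daily_counts, factor=3.0):
    """Return the days whose count exceeds `factor` times the median day."""
    typical = median(daily_counts.values())
    return sorted(day for day, n in daily_counts.items() if n > factor * typical)


# Invented timestamps standing in for a column exported from the archive.
sample = [
    "26/10/2014 09:15:00",
    "27/10/2014 06:01:00",
    "27/10/2014 06:05:00",
    "27/10/2014 06:30:00",
    "27/10/2014 07:10:00",
    "27/10/2014 08:45:00",
    "28/10/2014 12:00:00",
]

counts = tweets_per_day(sample)
print(counts)
print(spike_days(counts))
```

Using the median rather than the mean as the baseline keeps a single very busy day from raising the bar it is being measured against.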

So why were there so many tweets about the weather on that day? A quick search uncovered that there had been a huge storm early on Monday morning in Melbourne, which caused flash flooding and brought the most rain in a year. The storm caused major disruptions to public transport, delayed several flights and left many without electricity.

The following graph shows the daily amount of rainfall along with the number of tweets tagged with #melbourneweather. This shows – as is probably bleedingly obvious – that there is a link between weather events and the use of this hashtag.

#melbourneweather & rainfall


Four Seasons in One Day

Before moving to London a few months ago, I lived in Melbourne, a city obsessed with seeking out the best coffee, the AFL and the weather. As one of the few people in Melbourne who doesn’t drink coffee or follow AFL, the only obsession I share with my fellow Melburnians is the weather.

Melbourne, like London, is known for its changeable weather, for having four seasons in one day. When I first moved to Melbourne – from a city with much more predictable weather – I quickly learned to always carry an umbrella, sunglasses and a cardigan, in case the weather turned. I also made a habit of checking the weather before heading out the door and obsessively refreshing the rain radar on the Bureau of Meteorology website whenever the clouds looked a little menacing.

But even then, I still sometimes got caught out by the changeable weather just like this Melburnian:
