This book offers a practical introduction to digital history with a focus on working with text. It will benefit anyone who is considering carrying out research in history that has a digital or data element and will also be of interest to researchers in related fields within digital humanities, such as literary or classical studies. It offers advice on the scoping of a project, evaluation of existing digital history resources, a detailed introduction on how to work with large text resources, how to manage digital data and how to approach data visualisation. After placing digital history in its historiographical context and discussing the importance of understanding the history of the subject, this guide covers the life-cycle of a digital project from conception to digital outputs. It assumes no prior knowledge of digital techniques and shows you how much you can do without writing any code. It will give you the skills to use common formats such as plain text and XML with confidence. A key message of the book is that data preparation is a central part of most digital history projects, but that work becomes much easier and faster with a few essential tools.
As we noted in the Introduction, nothing dates as quickly as predictions about the future. Consequently, we will focus here on identifying general directions of travel rather than new tools, technologies and methods. In general, we see the future of digital history as one of gradual evolution and embedding rather than of revolution and disruption. Digital methods will be more widely adopted as we gain greater access to more digital primary sources, and well-established digital tools are likely to become easier to use for a large number of researchers. This may
It is a difficult task, doomed in advance, to say in a few words what has really changed in our area of study, and especially how and why that change took place. 1
There are a number of strands we have to try to weave together in describing the context and development of digital history. We will start by discussing the place of digital history within the broader context of digital humanities, and then within the context of the development of technology in the post-war period. We will move on to discussing the effect of the digital on
possibilities if work continued. As the project becomes more complex and the outputs more diverse, so we need to become more careful about managing all of this material effectively. The illustrative work we have done already on the Post Office directory has generated about a thousand files, representing different stages of the work, but a genuine digital history project could easily involve orders of magnitude more files and more complexity.
Our goal may well be to publish findings in some form, and to work rigorously through this process we will need to have everything
(women are given gendered titles, such as ‘Mrs’, in the directory, whereas men generally have no titles).
All this becomes feasible once we have the text in a machine-readable form, but there is still work to be done.
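As a rough illustration of the kind of question machine-readable text makes answerable, the following Python sketch counts gendered titles in a handful of directory-style lines. It is only a sketch: the sample entries are invented for illustration and are not taken from the Post Office directory, the layout of each line is an assumption, and the choice of ‘Mrs’ and ‘Miss’ as the titles to look for is likewise assumed. Nothing in this book requires writing code; the same count could be approached with the text tools discussed elsewhere.

```python
import re
from collections import Counter

# Hypothetical sample entries, invented for illustration only.
# They are NOT transcribed from the Post Office directory.
SAMPLE_LINES = [
    "12 Smith Mrs Mary, dressmaker",
    "14 Jones John, grocer",
    "16 Brown Miss Ann, milliner",
    "18 Taylor William, bootmaker",
    "20 Green Mrs Eliza, laundress",
]

# Titles treated as marking a woman's entry (an assumption for this sketch).
TITLE_PATTERN = re.compile(r"\b(Mrs|Miss)\b")


def count_titles(lines):
    """Count entries by the first gendered title found, or 'no title'."""
    counts = Counter()
    for line in lines:
        match = TITLE_PATTERN.search(line)
        counts[match.group(1) if match else "no title"] += 1
    return counts


if __name__ == "__main__":
    for title, number in count_titles(SAMPLE_LINES).most_common():
        print(f"{title}: {number}")
```

An entry with no title is not necessarily a man's, so a count like this is a starting point for closer reading rather than a finding in itself.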
SCANNING THE DIRECTORY
The first stage in any digital history project that focuses on textual material is to acquire or create a machine-readable version of the data. By machine-readable we mean that text-based software can understand it as text. For example, a photograph of a page of text might be a digital object readable by image software such
This book is concerned with digital history, and that necessarily means working with primary sources that are available in digital form. ‘Digital’, however, encompasses a large and diverse range of materials. The most obvious distinction the historian faces is that between sources which have been digitised from a physical original, for example a thirteenth-century manuscript or a nineteenth-century newspaper, and sources which are described as ‘born digital’, such as emails, Word documents or web pages. Once your source is available in machine
great deal without programming, by leveraging the work of others. Learning to program is interesting and useful, but it is not essential to doing digital history, and many digital historians choose not to. There are certainly things that can only be done with coding, but we have purposely omitted all such things. We hope you will be surprised by the power and flexibility of the approaches we show you.
We have deliberately chosen to focus almost entirely on tools which have been used for decades and which we expect to continue to be used for many more years. They are
data from this book’s repository on GitHub:
The data you will find here is primarily the rekeyed pages of the ‘B’ streets from the Post Office directory, a process described in the previous chapter. Within the folder data you can find plain text transcriptions, which we will be using throughout this chapter. The folder XML is the same material but ‘marked up’, and these marked-up files will be the focus of the next chapter.
You will find a button on the main page for the repository called ‘Clone or
from data sources. The statistical analysis of that resulting data and what evidential weight it can bear is a more complex question. On this latter process we can say very little: making statistical claims about historical data is not an area of digital history per se but a specialist area with many subtleties. Trained statisticians themselves often, in good conscience, disagree with each other about the significance and meaning of results. 9 David Spiegelhalter, who specialises in communicating statistics to the public, warns that:
Far from freeing us from the
of changes in the language used by newspapers to describe ‘contentious
gatherings’, arguably prefiguring the vogue for text-mining in digital history twenty years later. He categorised the ‘contentious repertoire
widely available to ordinary people’ in the eighteenth century as predominantly violent and riotous, featuring carnivalesque celebration and
other locally distinct forms of expression, and claim-making using intermediary authorities to intercede with parliament. He argued that by the
early nineteenth century, modes of protest had changed to