This chapter focuses on data management and data sharing. Strong emphasis is given to Git, which is highly recommended as a way to keep data under version control. Git is also so widely used for sharing data that a basic understanding of it is essential for historians wishing to use work shared by others in public Git repositories. The use of Git and Git repositories is covered in some depth.
This chapter outlines the history of digital history and of digital humanities more broadly. This historical narrative is intertwined with coverage of the technological changes which have made certain types of digital history feasible or even popular, and of the economic drivers that have led to certain types of material being preferentially digitised. The effect of the digital on the way historians approach reading, writing, collaboration, discovery (search) and citation is also discussed.
This book offers a practical introduction to digital history with a focus on working with text. It will benefit anyone who is considering carrying out research in history that has a digital or data element and will also be of interest to researchers in related fields within digital humanities, such as literary or classical studies. It offers advice on scoping a project and evaluating existing digital history resources, a detailed introduction to working with large text resources, and guidance on managing digital data and approaching data visualisation. After placing digital history in its historiographical context and discussing the importance of understanding the history of the subject, this guide covers the life-cycle of a digital project from conception to digital outputs. It assumes no prior knowledge of digital techniques and shows you how much you can do without writing any code. It will give you the skills to use common formats such as plain text and XML with confidence. A key message of the book is that data preparation is a central part of most digital history projects, but that work becomes much easier and faster with a few essential tools.
This chapter provides a survey of the landscape of contemporary digital history, with coverage of the way individual research projects have built upon each other. An understanding of what is available and how it can be used is vital to choosing a viable research project, and this chapter covers technologies and resources such as optical character recognition (OCR), handwritten archives, crowdsourcing, big data and web archives. The chapter concludes with a discussion of publication, broadly conceived to include more than the final outputs of a project.
This chapter gives a description of the life-cycle of a digital history project, from digitisation of source material onwards, with advice on the practicalities and costs of different approaches to producing machine-readable text. There is introductory coverage of data cleaning and version control using Git, although these are covered more fully in later chapters.
The Introduction provides a summary of the aims and intended audience of the book, and a justification of the choice of tools to be used: the book recommends well-tested, free tools for working with large amounts of text. The Introduction also draws attention to the importance of data cleaning – the preparation of data for use in a project. A precis of the following chapters and appendices is given.
This chapter offers a guide to visualising historical data, with two case studies centred on the Post Office directory data used throughout the book. The first visualisation consists of two stacked bar charts, which compare women in the most common female professions with men in the same professions and break down professions by married and unmarried women. The second visualisation is a map of one London street in 1879, with discussion of the process and the thinking that led to the finished visualisation.
This chapter looks at likely trends for digital history over the next few years, with predictions about the impact of historical material increasingly being available solely or additionally in digital form. There is a discussion of the ethics of digital history projects, both in terms of their environmental impact and in terms of the unprecedented ways in which they can uncover and make public information about individuals.
The first of two chapters on working with text, this chapter covers the difference between plain text formats and proprietary formats, the pattern-matching technique known as ‘regular expressions’, and the command line as an interface for working with large amounts of text, with particular attention to the grep command. All of the examples use a specific historical text, a Post Office directory for late nineteenth-century London.
The second of two chapters on working with text, this chapter covers structured text and, in particular, the markup language XML, with a short passage on the Text Encoding Initiative (TEI) guidelines. As with the previous chapter, the Post Office directory is used throughout as an example historical text.