A beginner’s guide to working with text as data

This book offers a practical introduction to digital history with a focus on working with text. It will benefit anyone who is considering carrying out research in history that has a digital or data element and will also be of interest to researchers in related fields within digital humanities, such as literary or classical studies. It offers advice on the scoping of a project, evaluation of existing digital history resources, a detailed introduction on how to work with large text resources, how to manage digital data and how to approach data visualisation. After placing digital history in its historiographical context and discussing the importance of understanding the history of the subject, this guide covers the life-cycle of a digital project from conception to digital outputs. It assumes no prior knowledge of digital techniques and shows you how much you can do without writing any code. It will give you the skills to use common formats such as plain text and XML with confidence. A key message of the book is that data preparation is a central part of most digital history projects, but that work becomes much easier and faster with a few essential tools.

Structured text
Jonathan Blaney, Sarah Milligan, Marty Steer, and Jane Winters

In the last chapter we had a close look at a transcription of Balls Pond Road in an unstructured format. We were able to differentiate some features of the text with regular expressions, but things like cross streets and map references added noise to the data that was cumbersome to deal with. In this chapter we will explore the same data structured in the format XML and see what advantages that brings, and what difficulties remain. As a historian, the format of structured data you are likeliest to come across is XML, because it is well suited to

in Doing digital history
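
The advantage of structure described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`street`, `entry`, `number`, `name`, `trade`, `crossStreet`) and the sample records are invented for the example and are not taken from the book's actual XML. The point is that once cross streets have their own element, they can be skipped by name instead of being filtered out with regular expressions.

```python
import xml.etree.ElementTree as ET

# Invented sample data: element names and records are hypothetical,
# chosen only to mimic a street directory with a cross street in it.
SAMPLE = """<street name="Balls Pond Road">
  <entry><number>1</number><name>Smith John</name><trade>baker</trade></entry>
  <crossStreet>Kingsland Road</crossStreet>
  <entry><number>3</number><name>Jones Mary</name><trade>milliner</trade></entry>
</street>"""


def list_entries(xml_text):
    """Return (number, name, trade) tuples for every <entry> element."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter("entry") visits only <entry> elements, so <crossStreet>
    # never reaches the output and needs no special handling.
    for entry in root.iter("entry"):
        rows.append(
            (
                entry.findtext("number"),
                entry.findtext("name"),
                entry.findtext("trade"),
            )
        )
    return rows


entries = list_entries(SAMPLE)
for row in entries:
    print(row)
```

With unstructured text the same task would need a pattern that distinguishes directory entries from cross streets; here the markup already carries that distinction.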
Jonathan Blaney, Sarah Milligan, Marty Steer, and Jane Winters

-readable text in plain text format Machine-readable text in XML format If we have done our work well, we have already produced many digital assets that are of interest not only to other academics but also to a much broader community, from family historians to the person researching the history of their house, from the local borough council to the researcher in Australia trying to track down the movements of a particular nineteenth-century individual. Managing the digital material we have produced is an added challenge – another thing to consider and learn how to do. But

in Doing digital history
Jonathan Blaney, Sarah Milligan, Marty Steer, and Jane Winters

XML files, but a decision was taken to make those files openly available and downloadable. We will cover working with XML in Chapter 5. Historians and other humanities researchers were quick to take advantage of digital Hansard, and a number of new research projects were funded to exploit the potential of this rich dataset. The online interface developed by the parliamentary team was fairly rudimentary, but the openness of the data meant that researchers could download it, manipulate and enhance it, and then make the enhanced data available for reuse and

in Doing digital history
Jonathan Blaney, Sarah Milligan, Marty Steer, and Jane Winters

own data before opening it in Excel. This is because your data is unlikely to have tabs in it: the separator should ideally be something which is not used for anything else in the file. Commas in historical texts are, of course, highly likely to be used as ordinary commas. As you will have seen, the Post Office listings may contain multiple commas per line. If we want to do some calculations in Excel on the data we have been working on, how might we go about it? We can use grep to extract some text from a collection of XML files, add tabs using regex and paste

in Doing digital history
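
The excerpt describes using grep and regular expressions for this step. As a rough equivalent, the following Python sketch pulls fields out of XML lines with a regular expression and joins them with tabs, so that commas inside a field stay intact when the result is pasted into a spreadsheet. The element names and the sample lines are assumptions made up for the illustration, not the book's data, and the pattern assumes each entry sits on a single line.

```python
import re

# Invented sample lines: the markup and the records are hypothetical.
SAMPLE_LINES = [
    "<entry><number>1</number><name>Smith, John</name>"
    "<trade>baker, confectioner</trade></entry>",
    "<crossStreet>Kingsland Road</crossStreet>",
    "<entry><number>3</number><name>Jones, Mary</name>"
    "<trade>milliner</trade></entry>",
]

# Capture the text of the three fields; non-greedy groups stop at the
# first closing tag they meet.
ENTRY = re.compile(
    r"<number>(.*?)</number><name>(.*?)</name><trade>(.*?)</trade>"
)


def to_tsv(lines):
    """Keep only lines that match ENTRY and join their fields with tabs."""
    rows = []
    for line in lines:
        match = ENTRY.search(line)
        if match:  # lines such as <crossStreet> are skipped
            rows.append("\t".join(match.groups()))
    return "\n".join(rows)


tsv = to_tsv(SAMPLE_LINES)
print(tsv)
```

Because the tab is the only separator, the commas in "Smith, John" and "baker, confectioner" remain part of their fields.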
Jonathan Blaney, Sarah Milligan, Marty Steer, and Jane Winters

text in Chapter 5. We will show that plain text is harder to deal with, although perhaps easier to get hold of, and that structured text is preferable when available, even if at first glance its appearance may be more forbidding. For structured text we concentrate on XML (Extensible Markup Language), but the approaches we take should transfer reasonably easily to other formats. Chapter 6, ‘Caring for your digital history project’, covers the practicalities of managing your data and sharing it effectively. Our section on research data management spends a fair

in Doing digital history
Ben Cohen and Eve Garrard

.hrw.org/backgrounder/mena/iraq1217bg.htm#1 * www.telegraph.co.uk/news/main.jhtml?xml=%2Fnews%2F2003%2F04%2F16%2Fwshort16.xml&secureRefresh=true&_requestid=35171 † www.guardian.co.uk/g2/story/0,3604,934300,00.html ‡ www.guardian.co.uk/comment/story/0,3604,980363,00.html * www.guardian.co.uk/comment/story/0,3604,980363,00.html

in The Norman Geras Reader
Author: Sara De Vido

The book explores the relationship between violence against women on one hand, and the rights to health and reproductive health on the other. It argues that violation of the right to health is a consequence of violence, and that (state) health policies might be a cause of – or create the conditions for – violence against women. It significantly contributes to feminist and international human rights legal scholarship by conceptualising a new ground-breaking idea, violence against women’s health (VAWH), using the Hippocratic paradigm as the backbone of the analysis. The two dimensions of violence at the core of the book – the horizontal, ‘interpersonal’ dimension and the vertical ‘state policies’ dimension – are investigated through around 70 decisions of domestic, regional and international judicial or quasi-judicial bodies (the anamnesis). The concept of VAWH, drawn from the anamnesis, enriches the traditional concept of violence against women with a human rights-based approach to autonomy and a reflection on the pervasiveness of patterns of discrimination (diagnosis). VAWH as theorised in the book allows the reconceptualisation of states’ obligations in an innovative way, by identifying for both dimensions obligations of result, due diligence obligations, and obligations to progressively take steps (treatment). The book eventually asks whether it is not international law itself that is the ultimate cause of VAWH (prognosis).

Katy Hayward

2008. 11 ‘The EU Treaty paves the way for a more effective EU which can serve the needs of Europe’, Brian Lenihan, Minister for Finance, Bray, 21 April 2008, www.fiannafail.ie/article.phpx?topic=123&id=8883&nav=Local%20News. 12 ‘EU summits: Supplementary questions’, Dáil Debates, 656(4), http://debates.oireachtas.ie/DDebate.aspx?F=DAL20080617.xml&Node=H41&Page=3.

in Irish nationalism and European integration
Unstructured text
Jonathan Blaney, Sarah Milligan, Marty Steer, and Jane Winters

can be read by any text editor, and text that can only be read by specific software, such as Microsoft Word or Pages on a Mac. Compared to these latter programs, text editors look a bit different, seeming comparatively unadorned because they omit typographic features such as font changes.

Table 4.1 Common file formats

File extension    Format                                     Plain text?
.doc, .docx       word processing                            no
.xls, .xlsx       spreadsheets                               no
.txt              any text                                   yes
.xml              XML                                        yes
.csv, .tsv        comma-separated or tab-separated values

in Doing digital history