Blog Archives

The Proceedings of the Old Bailey as a linguistic corpus

One of the suggestions given after I delivered my paper on the history of multiple negation at the Perspectives on Prescriptivism symposium in Ragusa earlier this year was that I might find some useful spoken language data in this database. So I decided to spend some time searching the Old Bailey records. At first sight it looks like a wonderful resource, but I soon discovered a number of problems. To begin with, the database is not searchable for high-frequency words such as no or not. Neither proved less frequent, which allowed me to search for the kind of construction that I know was still regularly used in the eighteenth century. I thus found instances like: "but not so drunk neither" (1727) and "but the Money was not ready then neither" (1733). One instance was particularly useful, as the speaker identified himself as a servant (this was the kind of information I was actually looking for), and it is moreover possible to identify the sex of the speaker. But other than that I found there was little I could do with the information I found. There do seem to be increasing numbers of instances as the century progresses, but there is no way I can relate the absolute numbers to the amount of text, so I cannot tell whether the construction really became more frequent or whether there is simply more text for the later years. I would say therefore that the database is of limited use, other than to discover that a particular form or construction is indeed used in reported speech, by men as well as women and by people accused of having committed a crime (which does not of course assign them to any particular social class).
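What is missing here is normalisation: a raw count only becomes comparable across periods once it is divided by the amount of text for each period. A minimal sketch in Python of what that calculation would look like, assuming per-period word totals were available (the function name and the figures are hypothetical placeholders for illustration, not Old Bailey data):

```python
def per_million(hits, total_words):
    """Normalise a raw hit count to a rate per million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return hits * 1_000_000 / total_words


# Hypothetical placeholder figures, NOT taken from the Old Bailey records:
# (number of hits, total number of words in that period)
example_counts = {
    "period A": (20, 2_000_000),
    "period B": (30, 6_000_000),
}

for label, (hits, words) in example_counts.items():
    rate = per_million(hits, words)
    print(f"{label}: {hits} hits, {rate:.1f} per million words")
```

In this invented example the raw count rises from 20 to 30, yet the normalised rate falls, which is exactly the kind of distinction that absolute numbers alone cannot show.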

But I also found that parts of the text were scanned but not subsequently corrected, so that long <s> at times occurs as <f>, and nor as not. I have the impression that things got better the further I progressed into the eighteenth century, but it is worrying all the same.
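For anyone working with a local copy of such uncorrected text, one possible workaround is to search for both spellings at once. The sketch below is only an illustration under that assumption: the helper name long_s_pattern is made up, and it simply lets every non-final s in the search word also match f or ſ, since long s was not used at the end of a word. It does nothing about other scanning errors such as nor being read as not.

```python
import re


def long_s_pattern(word):
    """Build a regex for `word` that tolerates long s misread as f.

    Every 's' except a word-final one may also appear as 'f' or 'ſ'.
    """
    parts = []
    for i, ch in enumerate(word):
        if ch.lower() == "s" and i < len(word) - 1:
            parts.append("[sfſ]")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


pattern = long_s_pattern("master")
for sample in ["my Master was there", "my Mafter was there", "no matter"]:
    print(sample, "->", bool(pattern.search(sample)))
```

The obvious cost is false positives: a pattern built this way will also match genuine words that happen to have f where the search word has s, so the hits would still need checking by hand.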

I’d be interested in learning about other people’s experiences with this database.

Eighteenth-century Corpora

I hope somebody can answer the following question:

How many English (historical) corpora are there at the moment?