The Corpus of Late Modern English Texts, version 3.1

Textual collection

The Corpus of Late Modern English Texts, version 3.1

Title

eng The Corpus of Late Modern English Texts, version 3.1

Description

eng CLMET3.1 is a principled collection of public domain texts drawn from various online archiving projects. In total, the corpus contains some 34 million words of running text. It incorporates CLMET, CLMETEV, and CLMET3.0, and has been compiled following roughly the same principles: (1) the corpus covers the period 1710-1920, divided into three 70-year sub-periods; (2) all texts in the corpus were written by British and Irish authors who are native speakers of English; (3) the corpus never contains more than three texts by the same author; (4) the texts within each sub-period were written by authors born within a correspondingly restricted sub-period. Compared to the earlier versions, CLMET3.1 is substantially bigger and comes with a number of important improvements: (1) it has an explicit genre classification; (2) it is approximately genre-balanced; (3) it is part-of-speech tagged; (4) the corpus files have standardized text headers containing descriptive metadata; (5) explicit information on text provenance is provided for each text; (6) the corpus architecture allows subsequent expansions; (7) the corpus is CWB compatible.

Language

English

Size

34 million words

License

public

entityId

d9b063c0-7dcd-49fe-8e6a-6f0a28279d1a

sourceId

c1c9b626-0a08-4962-9a02-04fd60f7cd5f

Fulltext available

available