eng The Corpus of Late Modern English Texts, version 3.1
eng CLMET3.1 is a principled collection of public domain texts drawn from various online archiving projects. In total, the corpus contains some 34 million words of running text. It incorporates CLMET, CLMETEV, and CLMET3.0, and has been compiled following roughly the same principles: (1) the corpus covers the period 1710-1920, divided into three 70-year sub-periods; (2) the texts making up the corpus have all been written by British and Irish authors who are native speakers of English; (3) the corpus never contains more than three texts by the same author; (4) the texts within each sub-period have been written by authors born within a correspondingly restricted sub-period. Compared to the earlier versions, and in addition to being substantially bigger, CLMET3.1 comes with a number of important improvements: (1) an explicit genre classification; (2) approximate genre balance; (3) part-of-speech tagging; (4) standardized text headers containing descriptive metadata; (5) explicit information on the provenance of each text; (6) an architecture that allows subsequent expansions; (7) CWB compatibility.
English
34 million words
public
d9b063c0-7dcd-49fe-8e6a-6f0a28279d1a
c1c9b626-0a08-4962-9a02-04fd60f7cd5f
available
CLARIND-UdS: Repositorium für Sprachressourcen an der Universität des Saarlandes
corpus
Linguistics
written