20 of 1432 resources
Textual collection
The Collection of Eighteenth-Century French Novels 1751-1800 is a corpus of French prose built within the project ‘Mining and Modeling Text’ (2019-2023) at the Trier Center for Digital Humanities.
Textual collection
A collection of American drama texts focusing on the structural markup.
Textual collection
The corpus was compiled as part of the project "The Beginnings of Modern Poetry," which uses digital methods to study German-language literature from about 1850 to 1920. It consists of texts in German-language poetry anthologies published in the second half of the 19th century and the early 20th century. The selected anthologies focus on poetry that was contemporary at the time, and, in the case of the anthologies published around 1900, on poems that the anthologists considered "modern". In total, the corpus consists of more than 20 anthologies containing more than 6000 poems.
Textual collection
The 42 texts from the Folger Digital Texts, accessible in the TextGrid Repository.
Textual collection
Distant Reading for European Literary History (COST Action CA16204) is a project aiming to create a vibrant and diverse network of researchers jointly developing the resources and methods necessary to change the way European literary history is written. Grounded in the Distant Reading paradigm (i.e. using computational methods of analysis for large collections of literary texts), the Action will create a shared theoretical and practical framework to enable innovative, sophisticated, data-driven, computational methods of literary text analysis across at least 10 European languages. Fostering insight into cross-national, large-scale patterns and evolutions across European literary traditions, the Action will facilitate the creation of a broader, more inclusive and better-grounded account of European literary history and cultural identity.
Textual collection
The corpus contains novels written by Spanish authors and published between 1880 and 1939. The original corpus contains 358 prose texts in total; however, due to copyright issues, only 219 can currently be published. The corpus design is based on data from two authoritative histories of literature, and each text is annotated with several types of metadata. Further details on the corpus can be found below.
Textual collection
This printed loose-leaf collection records persons from Bohemia and Moravia whose German citizenship was revoked between 1933 and 1941 under the Law on the Revocation of Naturalizations and the Deprivation of German Citizenship (Gesetz über den Widerruf von Einbürgerungen und die Aberkennung der deutschen Staatsangehörigkeit) of 14 July 1933. The card index comprises instalment 1 of 11 May 1938 through instalment 212 of 25 April 1944. Title: "Verzeichnis der Personen, denen die deutsche Staatsangehörigkeit aberkannt worden ist" (List of persons whose German citizenship has been revoked).
Textual collection
The aim of this project is to collect folkloric texts from the oral repertoires of the many ethnic groups and languages represented in the Caucasus and to make them accessible in accordance with the FAIR data principles.
Textual collection
A multilingual parallel corpus created from translations of the Bible.
Textual collection
The CLiGS textbox contains several corpora of literary texts in Romance languages. It was made available by the CLiGS junior research group.
Textual collection
This collection consists of 26 mythological poems in Spanish dating from the 16th and 17th centuries, written by the most representative authors of the period (Lope de Vega, Luis de Góngora, Jáuregui, Villamediana, etc.).
Textual collection
Arabic news corpus (United Kingdom) based on material crawled in 2018 created in the project Deutscher Wortschatz or Leipzig Corpora Collection. The project regularly collects and processes available documents from the Internet (typically in an annual cycle) and other sources. The results are corpora and corpora-based dictionaries for more than 250 languages, which provide statistical information about almost each word, example sentences and links to related words. Because of the huge amount of used text material containing several million sentences, information about almost every word can be provided. The service ranks among the most comprehensive information systems about the German language and provides the largest freely available amounts of data for many other languages. For copyright reasons, the data are provided as derived text formats that do not allow reconstruction of the original document structures.
Textual collection
Braunschweiger Zeitung 2012 is part of the German Reference Corpus DeReKo (Deutsches Referenzkorpus). The IDS corpora of contemporary written language form the world's largest linguistically motivated collection of electronic corpora of written German-language texts from the present and the recent past. They contain literary, scientific, and popular-science texts, a large number of newspaper texts, and a broad range of other text types, and they are continuously being developed. Current status: https://www.ids-mannheim.de/digspra/kl/projekte/korpora/archiv-1/ Available via KorAP: https://korap.ids-mannheim.de/ Available via Cosmas II: https://cosmas2.ids-mannheim.de/cosmas2-web/ Further information: https://www.ids-mannheim.de/digspra/kl/projekte/korpora/
Textual collection
The MCScript corpus is a large dataset of narrative texts and questions about these texts, intended to be used in a machine comprehension task that requires reasoning using commonsense knowledge. Our dataset complements similar datasets in that we focus on stories about everyday activities, such as going to the movies or working in the garden, and that the questions require commonsense knowledge, or more specifically, script knowledge, to be answered. We show that our mode of data collection via crowdsourcing results in a substantial amount of such inference questions. The dataset forms the basis of a shared task on commonsense and script knowledge organized at SemEval 2018 and provides challenging test cases for the broader natural language understanding community.
Textual collection
The resource contains a semantic analysis of Spanish nonce-formations derived with potentially collective suffixes (based on a list of hapax legomena in esTenTen11). Metadata: List_hapaxlegomena_esTenTen11.xml Resource: Liste_coll_HL_SP.xlsx
Textual collection
The Dingler corpus contains the complete run of the "Polytechnisches Journal" (1820–1931), originally edited by J. G. Dingler. The resource, comprising more than 200,000 pages, has been captured in full text, fully annotated in TEI-P5, and made sustainably available for reuse as research data as "DinglerOnline" under the CC BY-SA 4.0 license.
Textual collection
This record comprises the digitized manuscript collected by Angelina Ivanovna Kuzmina (1924–2002) between 1962 and 1977, plus additional structured information. The attached dataset contains metadata on individuals and locations, as well as indexing and keywording with respect to content type and grammatical information.
Textual collection
The ePoetics corpus comprises a selection of works on poetics and aesthetics from the period 1770 to 1960, digitized in the BMBF-funded project "ePoetics – Korpuserschließung und Visualisierung deutschsprachiger Poetiken (1770–1960) für den ‚Algorithmic Criticism‘". All texts were annotated according to the DTA base format (DTA-Basisformat) in coordination with the DTA.