Corpus / text collection

SaCoCo Saarbrücken Cookbook Corpora

The Saarbrücken Cookbook Corpus is a diachronic corpus of cooking recipes organized into two collections: historical and contemporary. The historical component contains a selection of recipes from different works; the full list of sources is given in the metadata. These recipes were collected and transcribed by Andrea Wurm as part of her PhD. For more information see Wurm 2007. The contemporary component contains cooking recipes from rezeptewiki.org. The selection criteria were temporal (only the latest version of each recipe) and geographical (only recipes belonging to German-speaking regions). The address of the wiki dump is provided in the sources.
ANNOTATION: The corpus contains two types of annotation: structural and positional. Structural annotation is written in XML and describes the textual structure on the one hand, and metatextual information and shallow semantics on the other. Positional annotation is provided at token level and contains linguistic information.
STRUCTURAL ATTRIBUTES: metadata: id, collection, source, url, year, decade, period, language, ref; shallow semantics: type, course, cuisine, ingredient, method; structure: title, body, segment, paragraph, sentence.
POSITIONAL ATTRIBUTES: word form; POS (TreeTagger, STTS tagset); lemma (TreeTagger); normalized form (automatic normalization using the algorithm described in Amoia and Martínez Martínez 2013).
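The two annotation layers described above can be illustrated with a minimal sketch. It assumes a layout in which structural elements appear as XML tags on their own lines and each token occupies one tab-separated line with four columns (word form, POS, lemma, normalized form). This layout, the element names `text` and `sentence`, the `Token` class, the `parse` function, and all sample values are assumptions made for illustration; the actual corpus files may be organized differently.

```python
import re
from dataclasses import dataclass

# Hypothetical sample: attribute names follow the description above,
# but the file layout and every value here are invented for illustration.
SAMPLE = """\
<text id="r1" collection="contemporary" year="2013" language="de">
<title>
Apfelkuchen\tNN\tApfelkuchen\tApfelkuchen
</title>
<body>
<sentence>
Den\tART\tdie\tden
Teig\tNN\tTeig\tTeig
rühren\tVVINF\trühren\trühren
.\t$.\t.\t.
</sentence>
</body>
</text>
"""

TAG_RE = re.compile(r"<(/?)(\w+)([^>]*)>")   # opening or closing tag line
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')     # name="value" pairs


@dataclass
class Token:
    word: str     # word form
    pos: str      # part-of-speech tag (STTS)
    lemma: str    # lemma
    norm: str     # normalized form
    path: tuple   # enclosing structural elements, outermost first


def parse(text):
    """Return (metadata of the outermost element, list of tokens)."""
    metadata, stack, tokens = {}, [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = TAG_RE.fullmatch(line.strip())
        if match:
            closing, name, attrs = match.groups()
            if closing:
                stack.pop()
            else:
                stack.append(name)
                if len(stack) == 1:
                    # Structural attributes of the outermost element.
                    metadata = dict(ATTR_RE.findall(attrs))
            continue
        # Positional attributes: one token per line, four columns.
        word, pos, lemma, norm = line.split("\t")
        tokens.append(Token(word, pos, lemma, norm, tuple(stack)))
    return metadata, tokens


metadata, tokens = parse(SAMPLE)
nouns = [t.lemma for t in tokens if t.pos == "NN"]
```

Keeping the stack of open elements on each token makes it possible to filter by structure (for example, only tokens inside `title`) and by linguistic annotation (for example, only nouns) in the same pass.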

German

1.7 megabytes

public

931a66e2-532e-4d83-8d68-14d08e96d0ea

c1c9b626-0a08-4962-9a02-04fd60f7cd5f

available

CLARIND-UdS: Repository for Language Resources at Saarland University

corpus

Linguistics

written

No linked resources are available.