SaCoCo: Saarbrücken Cookbook Corpora
The Saarbrücken Cookbook Corpus is a diachronic corpus of cooking recipes organized into two collections: historical and contemporary.

The historical collection contains a selection of recipes from various works; the full titles of these works are listed under "sources" in the metadata. The recipes were collected and transcribed by Andrea Wurm as part of her doctoral research. For more information, see Wurm 2007.

The contemporary collection contains cooking recipes from rezeptewiki.org. Recipes were selected by two criteria: temporal (only the latest version of each recipe) and geographical (only recipes from German-speaking regions). The address of the wiki dump is given under "sources" in the metadata.

ANNOTATION: The corpus carries two types of annotation, structural and positional. Structural annotation is encoded in XML. It describes the textual structure on the one hand, and metatextual information and shallow semantics on the other.

STRUCTURAL ATTRIBUTES: metadata: id, collection, source, url, year, decade, period, language, ref; shallow semantics: type, course, cuisine, ingredient, method; structure: title, body, segment, paragraph, sentence.

Positional annotation is provided at the token level and encodes linguistic information.

POSITIONAL ATTRIBUTES: word form; POS (TreeTagger, STTS tagset); lemma (TreeTagger); normalized form (automatic normalization using the algorithm described in Amoia and Martínez Martínez 2013).
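The split into structural and positional attributes matches the one-token-per-line ("vertical") format used by tools such as the IMS Open Corpus Workbench. The description does not state how SaCoCo is distributed, so the following Python sketch assumes a layout:

- XML structural tags sit on their own lines.
- Each token occupies one line with four tab-separated columns: word form, POS, lemma, normalized form.

The column order, the use of a `</sentence>` tag to close sentences, and the helper names (`Token`, `read_sentences`) are hypothetical. The sample lines are invented for illustration and are not taken from the corpus.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# ASSUMPTION: positional columns appear in this order, tab-separated.
FIELDS = ("word", "pos", "lemma", "norm")


@dataclass
class Token:
    word: str   # word form as written in the source
    pos: str    # STTS part-of-speech tag
    lemma: str  # lemma
    norm: str   # normalized spelling


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield each sentence as a list of Tokens from vertical text.

    ASSUMPTION: structural markup occupies whole lines starting with '<',
    and a closing </sentence> tag ends the current sentence.
    """
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("<"):
            # Structural annotation: emit the sentence when it closes.
            if line.startswith("</sentence") and current:
                yield current
                current = []
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        current.append(Token(*cols))
    if current:  # flush a trailing sentence without a closing tag
        yield current


# Invented example: a historical spelling ("Thee") with its normalized form ("Tee").
sample = """<sentence>
Man\tPIS\tman\tMan
nehme\tVVFIN\tnehmen\tnehme
Thee\tNN\tThee\tTee
.\t$.\t.\t.
</sentence>
""".splitlines()

for sent in read_sentences(sample):
    print([t.norm for t in sent])  # ['Man', 'nehme', 'Tee', '.']
```

Keeping the normalized form as its own field lets a query match historical and contemporary spellings on `norm` while still reporting the original `word`.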
German
1.7 MB
public
931a66e2-532e-4d83-8d68-14d08e96d0ea
c1c9b626-0a08-4962-9a02-04fd60f7cd5f
available
CLARIND-UdS: Repository for Language Resources at Saarland University
corpus
Linguistics
written