eng N-gram Language Models based on the DeWaC German web corpus
eng The resource is a set of n-gram language models at the syllable and phone levels. The context length (n-gram length) ranges from 1 to 4 for the syllable-level models and from 1 to 6 for the phone-level models. For each n-gram length, two model versions are provided: a forward version, which contains the probability of a unit occurring given the preceding context, and a backward version, which contains the probability of a unit occurring given the following context. Each forward and backward model comes in a version that includes syllable boundary information and a version without syllable boundaries. The models were trained on the DeWaC German web corpus (Baroni and Kilgarriff 2006) using the SRILM language modeling toolkit (Stolcke 2002). Syllabification was performed using an HMM syllable tagger (Schmid, Möbius and Weidenkaff 2007).
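To make the forward/backward distinction concrete, here is a minimal Python sketch that estimates both kinds of bigram probability from a toy set of phone sequences. It is an illustration only: the function name `ngram_probs`, the toy sequences, and the unsmoothed maximum-likelihood estimate are assumptions made for this example. The released models were trained with SRILM, whose estimates will differ from these raw relative frequencies.

```python
from collections import Counter


def ngram_probs(sequences, n, backward=False):
    """Unsmoothed maximum-likelihood estimates of P(unit | context).

    Forward: the context is the n-1 preceding units.
    Backward: the context is the n-1 following units, obtained here by
    reversing each sequence before counting (an assumption of this sketch).
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        units = list(reversed(seq)) if backward else list(seq)
        for i in range(len(units) - n + 1):
            gram = tuple(units[i:i + n])
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    # Each key is (context..., unit); in backward mode the context is
    # stored nearest-following-unit first.
    return {g: c / context_counts[g[:-1]] for g, c in ngram_counts.items()}


# Hypothetical toy phone sequences, not taken from the DeWaC corpus.
toy = [["h", "a", "t"], ["h", "a", "l", "t"], ["m", "a", "t"]]

forward = ngram_probs(toy, n=2)
backward = ngram_probs(toy, n=2, backward=True)

# P("t" | preceding "a"): "a" is followed by "t" in 2 of its 3 occurrences.
print(forward[("a", "t")])
# P("h" | following "a"): "a" is preceded by "h" in 2 of its 3 occurrences.
print(backward[("a", "h")])
```

A variant with syllable boundaries could be sketched the same way by inserting a boundary symbol into each sequence before counting; the symbol actually used in the released models is not specified here.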
German
15 gigabytes
public
5640efea-43e8-47e5-8b95-98fef09d5215
c1c9b626-0a08-4962-9a02-04fd60f7cd5f
CLARIND-UdS: Repositorium für Sprachressourcen an der Universität des Saarlandes
structuredDataset
Linguistics
other
2022