N-gram Language Models Based on the DeWaC German Web Corpus

The resource is a set of language models at the syllable and phone level. The context length (n-gram length) of the models ranges from 1 to 4 at the syllable level and from 1 to 6 at the phone level. For each n-gram length, two model versions are provided: a forward version, which contains the probability of a unit occurring given the preceding context, and a backward version, which contains the probability of a unit occurring given the following context. Each forward and backward model comes in one version that includes syllable boundary information and one version without syllable boundaries. The models were trained on the DeWaC German web corpus (Baroni and Kilgarriff 2006) using the SRILM language modeling toolkit (Stolcke 2002). Syllabification was performed using an HMM syllable tagger (Schmid, Möbius and Weidenkaff 2007).
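The difference between the forward and backward versions can be illustrated with a minimal sketch. It uses plain maximum-likelihood counts without smoothing, which is an assumption made for brevity and not the estimation SRILM performs; the toy syllable sequence and the function name `ngram_probs` are hypothetical and not part of the resource.

```python
from collections import Counter


def ngram_probs(units, n, backward=False):
    """Unsmoothed MLE of P(unit | context) for n-grams of length n.

    forward:  context = the n-1 units preceding the unit
    backward: context = the n-1 units following the unit
    (hypothetical helper for illustration only)
    """
    # A backward model is a forward model over the reversed sequence.
    seq = list(reversed(units)) if backward else list(units)
    ngrams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

    # Count how often each context occurs as the first n-1 units of an n-gram.
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count

    probs = {}
    for gram, count in ngrams.items():
        context, unit = gram[:-1], gram[-1]
        p = count / contexts[context]
        if backward:
            # Restore the original left-to-right order of the context.
            context = tuple(reversed(context))
        probs[(context, unit)] = p
    return probs


# Toy syllable sequence (invented, not taken from DeWaC).
syllables = ["ge", "ben", "ge", "hen", "ge", "ben"]

forward = ngram_probs(syllables, n=2)
backward = ngram_probs(syllables, n=2, backward=True)

# P("ben" | preceding "ge") and P("ge" | following "ben")
print(forward[(("ge",), "ben")])
print(backward[(("ben",), "ge")])
```

In this toy sequence "ge" is followed by "ben" in two of its three occurrences as a context, while every "ben" is preceded by "ge", so the two directions yield different probabilities for the same pair of syllables.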

German

15 gigabytes

public

5640efea-43e8-47e5-8b95-98fef09d5215

c1c9b626-0a08-4962-9a02-04fd60f7cd5f

CLARIND-UdS: Repositorium für Sprachressourcen an der Universität des Saarlandes

structuredDataset

Linguistics

other

2022

No links found