Textual collection
Published
Content information
Title
Southern Sotho Web subcorpus (South Africa) from 2018 (sot-za_web_2018_10K)
eng
Description
Southern Sotho Web subcorpus (South Africa) based on material from 2018 (10,000 sentences), created in the project Deutscher Wortschatz (Leipzig Corpora Collection). The project regularly collects and processes documents available on the Internet (typically in an annual cycle) and from other sources. The results are corpora and corpus-based dictionaries for more than 250 languages, which provide statistical information, example sentences and links to related words for almost every word. Because the text material used is very large, containing several million sentences, information can be provided for almost every word. The service ranks among the most comprehensive information systems for the German language and provides the largest freely available amounts of data for many other languages. For copyright reasons, the data are provided in derived text formats that do not allow the original document structures to be reconstructed.
eng
Size
10,000 sentences, 213,418 tokens
License
public
License-URL
https://creativecommons.org/licenses/by-nc/4.0/
Modality
written
Language
Southern Sotho (sot)
Datatype
corpus
text
Creation date
Publication date
Temporal coverage
Fulltext available
not available
Annotation layer
Collection type
Genre
Discipline
Keywords
Technical information
PID
https://hdl.handle.net/hdl:11022/0000-0007-CA6B-E
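The PID is a Handle. A minimal Python sketch for turning it into a request to the Handle.net proxy's REST endpoint (`/api/handles/<handle>`) and reading the URL entries from the returned JSON record might look like this. It assumes the proxy's usual JSON layout (a `values` list whose items carry `type` and `data.value`). The function names are hypothetical, and stripping the `hdl:` segment that appears inside this PID URL is an assumption about how the registry writes its handles.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

HANDLE_PROXY = "https://hdl.handle.net"


def extract_handle(pid_url: str) -> str:
    """Return the bare 'prefix/suffix' handle from a hdl.handle.net URL."""
    path = urllib.parse.urlparse(pid_url).path.lstrip("/")
    # Assumption: the registry embeds an extra "hdl:" prefix in the path; drop it.
    if path.startswith("hdl:"):
        path = path[len("hdl:"):]
    return path


def handle_api_url(handle: str) -> str:
    """Build the proxy's REST API URL for a handle."""
    return f"{HANDLE_PROXY}/api/handles/{handle}"


def url_values(record: dict) -> list[str]:
    """Collect the values of all URL-type entries in a handle JSON record."""
    return [
        entry["data"]["value"]
        for entry in record.get("values", [])
        if entry.get("type") == "URL"
    ]


def resolve_urls(handle: str, timeout: float = 10.0) -> list[str]:
    """Fetch the handle record over the network and return its URL entries."""
    with urllib.request.urlopen(handle_api_url(handle), timeout=timeout) as resp:
        record = json.load(resp)
    return url_values(record)
```

Only `resolve_urls(extract_handle(pid))` needs network access; the other helpers work offline.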
Access
https://repo.data.saw-leipzig.de/resources?identifier=lcc/corpora/1102200000007CA6BE
https://fcs.data.saw-leipzig.de/lcc
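The second access URL points to a Federated Content Search (FCS) endpoint. CLARIN FCS builds on the SRU/CQL search protocol, so a `searchRetrieve` request can be sketched as follows. This assumes the endpoint accepts SRU version 1.2 with a plain CQL term query. The helper names, the default record limit, and any example term are illustrative only. The record states neither which SRU/FCS version the endpoint implements nor how this particular corpus is selected among the endpoint's resources.

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the record's Access field.
FCS_ENDPOINT = "https://fcs.data.saw-leipzig.de/lcc"


def sru_search_url(query: str, max_records: int = 10, version: str = "1.2") -> str:
    """Build an SRU searchRetrieve URL (assumed: SRU 1.2, CQL query)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": query,
        "maximumRecords": str(max_records),
    }
    return f"{FCS_ENDPOINT}?{urlencode(params)}"


def number_of_records(xml_body: bytes) -> int | None:
    """Return the hit count from an SRU response, or None if absent.

    SRU 1.2 and 2.0 use different XML namespaces, so match on the local name.
    """
    root = ET.fromstring(xml_body)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "numberOfRecords":
            return int(elem.text)
    return None


def search(query: str, timeout: float = 10.0) -> int | None:
    """Send the request and report how many records the server says match."""
    with urllib.request.urlopen(sru_search_url(query), timeout=timeout) as resp:
        return number_of_records(resp.read())
```

Matching on the local tag name keeps the parser independent of the SRU version the server actually answers with. Only `search(...)` performs a network request.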
Relations to other textual collections
Files and data streams
Technical documentation
Organizational information
Persons
Institutions
Institution
Sächsische Akademie der Wissenschaften zu Leipzig; https://d-nb.info/gnd/37863-X; scholarly academy in Leipzig
Relation
Responsible institution
Comment
Contact
Funding body ID
Project title
Registry Metadata
Resource (latest version)
dae3fad8-037b-4821-8fb5-a9ca035a35c7
Displayed version
6814a9b1db0dfd74bb48593d
Version timestamp
May 2, 2025, 1:17:05 PM
Creator of the version
0704566
Versions
1
Resource created
May 2, 2025, 1:17:05 PM
Creator of the resource
0704566