Lexical Resource

en-comcom: English Compounds Dataset for Compositionality Tests


The ENglish COMpositionality dataset containing COMpounds (en-comcom) was constructed from two existing compound datasets, the Tratz (2011) dataset and the Ó Séaghdha (2008) dataset, and a selection of the nominal compounds in the WordNet database.

The Tratz (2011) dataset contains 19158 compounds and is part of the semantically-enriched parser described in Tratz (2011), available at http://www.isi.edu/publications/licensed-sw/fanseparser/

The Ó Séaghdha (2008) dataset contains 1443 compounds and is available at http://www.cl.cam.ac.uk/~do242/Resources/1443_Compounds.tar.gz

Additional compounds were collected from the WordNet 3.1 (Fellbaum, 1998) 'data.noun' file. The extracted list contained 18775 compounds.

The compounds from the three sources were combined, then additionally pre-processed and frequency-filtered; see Dima (2019) for details. The final dataset has 27220 compounds. The train, test and dev splits contain 19054, 5444 and 2722 compounds, respectively.

The train/test/dev files have the following format: modifier head compound (e.g. police car police_car).

For results of compositionality models evaluated on this dataset, see Dima (2016) and Dima (2019).

- Dima, C. 2015. Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds. In Proceedings of EMNLP 2015, pages 1637–1642, Lisbon, Portugal. https://aclweb.org/anthology/D/D15/D15-1188.pdf
- Dima, C. 2016. On the Compositionality and Semantic Interpretation of English Noun Compounds. In Proceedings of the 1st Workshop on Representation Learning for NLP @ ACL 2016, pages 27–39, Berlin, Germany.
- Dima, C. 2019. Composition Models for the Representation and Semantic Interpretation of Nominal Compounds. PhD thesis, University of Tübingen.
- Fellbaum, C. 1998. WordNet. Wiley Online Library.
- Ó Séaghdha, D. 2008. Learning compound noun semantics. PhD thesis, Computer Laboratory, University of Cambridge. Published as University of Cambridge Computer Laboratory Technical Report 735.
- Tratz, S. 2011. Semantically-enriched parsing for natural language understanding. PhD thesis, University of Southern California.
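The "modifier head compound" line format of the train/test/dev files described above could be read roughly as follows. This is only a minimal sketch: it assumes the three fields are separated by whitespace (the record does not name the delimiter), and the names `Compound` and `parse_compound_lines` are hypothetical helpers, not part of the dataset distribution.

```python
from typing import Iterable, Iterator, NamedTuple


class Compound(NamedTuple):
    """One dataset entry: modifier, head, and the joined compound form."""
    modifier: str
    head: str
    compound: str


def parse_compound_lines(lines: Iterable[str]) -> Iterator[Compound]:
    """Yield one Compound per non-empty line of a train/test/dev file."""
    for line_number, line in enumerate(lines, start=1):
        # Assumption: fields are whitespace-separated (delimiter not documented).
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        yield Compound(*fields)


# The example line given in the dataset description.
sample = ["police car police_car\n"]
parsed = list(parse_compound_lines(sample))
print(parsed[0].modifier, parsed[0].head, parsed[0].compound)
```

An open file handle can be passed in place of `sample`, since file objects iterate over lines. If the description's figures hold, parsing the train, test and dev files this way should yield 19054, 5444 and 2722 entries.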

2017-03-14

1

d5e97d3d-2114-4746-b737-2c8f7b15be96

8cefa5dd-f5fb-4527-8acb-88cc6824eb48

27220 items

No linked resources are available.