---------------------------------------------------------------------------------------------------
Data sets by Iryna Gurevych and Torsten Zesch, Darmstadt University of Technology
---------------------------------------------------------------------------------------------------

Gur65 dataset
This dataset contains 65 word pairs along with their similarity scores, assigned on a discrete 0-4 scale by 24 subjects. The inter-annotator agreement is 0.81.
This dataset is a German translation of the Rubenstein/Goodenough dataset [1]. The judgment values were not adopted from their work, but newly annotated.
The dataset is described in
"Using the Structure of a Conceptual Network in Computing Semantic Relatedness"
In: Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP’2005), Jeju Island, Republic of Korea, October 11 - 13. (to appear), 2005.

Gur350 dataset
This dataset contains 350 word pairs along with their relatedness scores, assigned on a discrete 0-4 scale by 8 subjects. The inter-annotator agreement is 0.69.

ZG222 dataset
This dataset contains 222 word pairs along with their relatedness scores, assigned on a discrete 0-4 scale by 21 subjects. The inter-annotator agreement is 0.49.
The dataset is described in
"Automatically creating datasets for measures of semantic relatedness"
In: COLING/ACL 2006 Workshop on Linguistic Distances. pp. 16-24, 2006.

*-NN datasets
DERIVED from the original data sets by Gurevych and Zesch, retaining only pairs that consist of two nouns.

---------------------------------------------------------------------------------------------------
Found on http://www.ukp.tu-darmstadt.de/data/semRelDatasets
Retrieved from http://www.ukp.tu-darmstadt.de/sites/www.ukp.tu-darmstadt.de/files/datasets.zip
---------------------------------------------------------------------------------------------------