---------------------------------------------------------------------------------------------------
Datasets by Iryna Gurevych and Torsten Zesch,
Darmstadt University of Technology
----------------------------------------------------------------------------------------------------

Gur65 dataset

This dataset contains 65 word pairs along with their similarity scores
assigned on a discrete 0-4 scale by 24 subjects.
The inter-annotator agreement is 0.81.
This dataset is a German translation of the Rubenstein/Goodenough dataset [1].
The similarity scores were not adopted from the original study; they were newly annotated for the German word pairs.
The dataset is described in:

"Using the Structure of a Conceptual Network in Computing Semantic Relatedness"
In: Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP 2005),
Jeju Island, Republic of Korea, October 11-13, 2005 (to appear).


Gur350 dataset

This dataset contains 350 word pairs along with their relatedness scores
assigned on a discrete 0-4 scale by 8 subjects.
The inter-annotator agreement is 0.69.
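Each dataset reports an inter-annotator agreement value, but this README does not say how that value was computed. A common choice for graded ratings is the average pairwise Pearson correlation between subjects. The following is a minimal sketch that assumes this measure and also derives a per-pair gold score as the mean rating on the 0-4 scale. The function names and the subjects-by-pairs list layout are hypothetical, and the sketch is not necessarily how the reported values (0.81, 0.69, 0.49) were obtained.

```python
from itertools import combinations
from math import sqrt

# Hypothetical layout (assumption): ratings[s][p] is subject s's score
# for word pair p, on the discrete 0-4 scale used by the datasets.

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a subject gave the same score to every pair")
    return cov / sqrt(vx * vy)

def mean_scores(ratings):
    """Gold score per pair: mean of all subjects' ratings (assumption)."""
    for row in ratings:
        if any(not 0 <= r <= 4 for r in row):
            raise ValueError("scores must lie on the 0-4 scale")
    return [sum(col) / len(ratings) for col in zip(*ratings)]

def pairwise_agreement(ratings):
    """Average Pearson correlation over all pairs of subjects (assumed measure)."""
    subject_pairs = list(combinations(ratings, 2))
    return sum(pearson(a, b) for a, b in subject_pairs) / len(subject_pairs)

if __name__ == "__main__":
    # Toy data invented for illustration: 3 subjects rating 4 word pairs.
    toy = [
        [0, 1, 3, 4],
        [1, 1, 3, 4],
        [0, 2, 4, 4],
    ]
    print(mean_scores(toy))
    print(round(pairwise_agreement(toy), 3))
```

Averaging pairwise correlations weights every subject equally. Other agreement measures, such as correlating each subject with the mean of the remaining subjects, can give different numbers for the same ratings.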


ZG222 dataset

This dataset contains 222 word pairs along with their relatedness scores
assigned on a discrete 0-4 scale by 21 subjects.
The inter-annotator agreement is 0.49.
The dataset is described in:

"Automatically creating datasets for measures of semantic relatedness"
In: COLING/ACL 2006 Workshop on Linguistic Distances. pp. 16-24, 2006.


*-NN datasets

DERIVED from the original datasets by Gurevych and Zesch, retaining
only the pairs in which both words are nouns.
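This README does not state how the part of speech of each word was determined for the *-NN subsets. The Gur65 pairs are German, and assuming the other datasets are German as well, one cheap approximation is the orthographic rule that German nouns are capitalized. The following minimal sketch uses that heuristic, which is an assumption and not the authors' procedure. It also assumes a hypothetical tab-separated `word1<TAB>word2<TAB>score` line format that may not match the files in the archive.

```python
# Hypothetical input format (assumption): one pair per line,
# "word1<TAB>word2<TAB>score", with lines starting with "#" treated as comments.

def parse_pairs(lines):
    """Parse (word1, word2, score) triples from tab-separated lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        w1, w2, score = line.split("\t")
        pairs.append((w1, w2, float(score)))
    return pairs

def looks_like_noun(word):
    """Heuristic: German nouns (including proper nouns) are capitalized."""
    return word[:1].isupper()

def noun_noun_pairs(pairs):
    """Keep only pairs in which both words look like nouns."""
    return [(w1, w2, s) for w1, w2, s in pairs
            if looks_like_noun(w1) and looks_like_noun(w2)]

if __name__ == "__main__":
    # Sample lines and scores invented for illustration.
    sample = [
        "# word1\tword2\tscore",
        "Auto\tFahrzeug\t3.9",
        "schnell\tAuto\t1.2",
        "Haus\tGebäude\t3.5",
    ]
    for w1, w2, s in noun_noun_pairs(parse_pairs(sample)):
        print(w1, w2, s)
```

Capitalization misses nouns that were lowercased during preprocessing and lets through any other capitalized token. If the files carry no part-of-speech information, a POS tagger or a lexicon lookup would be more reliable.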

-----------------------------------------------------------------------------------------------------
Found on http://www.ukp.tu-darmstadt.de/data/semRelDatasets
Retrieved from http://www.ukp.tu-darmstadt.de/sites/www.ukp.tu-darmstadt.de/files/datasets.zip

-----------------------------------------------------------------------------------------------------