This resource consists of four folders:

The gold_standard folder contains files of manually evaluated triples.
The files were exported from ANNit and have the following columns:
LEFT*: the source URI of the triple.
RIGHT*: the target URI of the triple.
UserChioce: the choice the user made during manual evaluation.
Decision*: the final decision made by the annotator; one of unknown, remove, or remain.
Comment: the annotator's comment, if any.

Only the three columns marked with * are relevant for evaluating the removed triples in this project.
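The snippet below is a minimal sketch of reading one gold-standard file and keeping only the starred columns. It assumes the ANNit export is comma-separated with a header row naming the columns LEFT, RIGHT, UserChioce, Decision and Comment; the delimiter and exact header spelling are assumptions, and `load_gold_standard` is a hypothetical helper, not part of the dataset.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO, Tuple

# Assumption: comma-separated export whose header row names the columns
# LEFT, RIGHT, UserChioce, Decision and Comment.
VALID_DECISIONS = {"unknown", "remove", "remain"}


def load_gold_standard(stream: TextIO) -> Dict[Tuple[str, str], str]:
    """Map each (source URI, target URI) pair to the annotator's decision."""
    gold = {}
    for row in csv.DictReader(stream):
        # Only the starred columns are kept; UserChioce and Comment are ignored.
        decision = row["Decision"].strip().lower()
        if decision not in VALID_DECISIONS:
            raise ValueError(f"unexpected decision: {decision!r}")
        gold[(row["LEFT"].strip(), row["RIGHT"].strip())] = decision
    return gold


if __name__ == "__main__":
    # Tiny in-memory sample with made-up URIs, for illustration only.
    sample = io.StringIO(
        "LEFT,RIGHT,UserChioce,Decision,Comment\n"
        "http://ex.org/a,http://ex.org/b,remove,remove,\n"
        "http://ex.org/b,http://ex.org/c,remain,unknown,unsure\n"
    )
    gold = load_gold_standard(sample)
    print(Counter(gold.values()))
```

Keying the result on the (LEFT, RIGHT) pair makes it straightforward to look up the decision for any triple an algorithm chose to remove.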

The graph_file folder contains the unweighted graphs as well as two sets of weighted graphs: graphs with counted weights and graphs with inferred weights (in the subdirectories counted_weights and inferred_weights, respectively).
The files are gzip-compressed (*.gz). Each file consists of two columns of integers, the source and the target of each edge. The integers correspond to URIs; the mapping files are in the mapping directory.
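A minimal sketch of loading one graph file and its URI mapping follows. It assumes the two integer columns are whitespace-separated, and it ignores any further columns on a line. The mapping layout (one integer ID and one URI per line, separated by a tab) is a guess, so `load_mapping` should be treated as hypothetical and adjusted to the actual files in mapping.

```python
import gzip
from typing import Dict, Iterator, Tuple


def read_edges(path: str) -> Iterator[Tuple[int, int]]:
    """Yield (source, target) integer pairs from a gzip-compressed edge list."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Assumption: whitespace-separated columns; extra columns are ignored.
            yield int(fields[0]), int(fields[1])


def load_mapping(path: str) -> Dict[int, str]:
    """Hypothetical reader: assumes one 'id<TAB>URI' pair per line."""
    opener = gzip.open if path.endswith(".gz") else open
    mapping = {}
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            node_id, uri = line.rstrip("\n").split("\t", 1)
            mapping[int(node_id)] = uri
    return mapping
```

With both loaded, each edge can be translated back to a triple's source and target URIs as `(mapping[s], mapping[t])` for every `s, t` yielded by `read_edges`.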

Finally, the unweighted graphs are also provided in WebGraph format. These files were used to evaluate our algorithm against an existing web-scale feedback arc set algorithm.
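Whatever algorithm produces it, a feedback arc set is a set of edges whose removal leaves the graph acyclic. The sketch below checks that property on an in-memory list of (source, target) integer pairs, such as the edges in graph_file, using Kahn's topological sort. It only verifies the definition; it is not the authors' algorithm or the web-scale WebGraph-based method, and `is_feedback_arc_set` is a hypothetical helper suited to graphs that fit in memory.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def is_feedback_arc_set(edges: Iterable[Edge], removed: Set[Edge]) -> bool:
    """Return True if deleting `removed` from `edges` leaves an acyclic graph.

    Kahn's algorithm: the remaining graph is a DAG exactly when every node
    can be popped from the queue of zero in-degree nodes.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for s, t in edges:
        nodes.update((s, t))
        if (s, t) in removed:
            continue  # edge deleted by the candidate feedback arc set
        successors[s].append(t)
        indegree[t] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Nodes on a remaining cycle never reach in-degree 0 and are never visited.
    return visited == len(nodes)
```

For example, in the cycle 0 → 1 → 2 → 0, removing the single edge (2, 0) is enough to make the graph acyclic.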


If you encounter any problems with these datasets, please report them to us at the following email address:

Date made available: 2020
