Graph neural networks and other machine learning models offer a promising direction for interpretable machine learning on relational and multimodal data. Until now, however, progress in this area has been difficult to gauge, primarily due to the limited number of datasets with (a) enough labeled nodes in the test set for precise measurement of performance, and (b) a rich enough variety of multimodal information to learn from. Here, we introduce a set of new benchmark tasks for node classification on knowledge graphs. We focus primarily on node classification, since this setting cannot be solved purely by node embedding models and instead requires the model to pool information from several steps away in the graph; however, the datasets may also be used for link prediction. For each dataset, we provide test and validation sets of at least 1000 instances, with some containing more than 10,000 instances. Each task can be performed either in a purely relational manner, to evaluate the performance of a relational graph model in isolation, or with multimodal information, to evaluate the performance of multimodal relational graph models. All datasets are packaged in a CSV format that is easily consumable in any machine learning environment, together with the original source data in RDF and preprocessing code for full provenance. We provide code for loading the data into numpy and pytorch, and we report performance for several baseline models.
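As a minimal sketch of how CSV-packaged graph data of this kind can be consumed with nothing but the standard library (the column names and file layout below are hypothetical illustrations, not the actual schema shipped with these datasets):

```python
import csv
import io

# Hypothetical layouts: an edge list of integer-encoded triples and a
# node-label file. The real datasets define their own column schemas.
TRIPLES_CSV = """subject,predicate,object
0,12,3
1,7,3
2,12,5
"""

LABELS_CSV = """node,class
0,1
1,0
2,1
"""

def load_csv(text):
    """Parse a CSV string into a list of integer rows, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [[int(field) for field in row] for row in reader]

triples = load_csv(TRIPLES_CSV)
labels = load_csv(LABELS_CSV)

print(triples[0])   # [0, 12, 3]
print(len(labels))  # 3
```

Such integer row lists convert directly to array form, e.g. `numpy.array(triples)` or `torch.tensor(triples)`, which is all that is needed to feed the data to most graph learning pipelines.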
Date made available: 19 Dec 2020
