Biomedical dataset recommendation

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review


Dataset search is a special application of information retrieval that aims to help scientists find the datasets they need. Current dataset search engines are query-driven, so their results are limited by the user's ability to formulate an appropriate query. In this paper we address this limitation by framing dataset search as a recommendation task: given a dataset supplied by the user, the search engine recommends similar datasets. We solve this dataset recommendation task using a similarity approach, and we provide a simple benchmark task for evaluating different approaches to it. We evaluate the recommendation task in the biomedical domain, benchmarking 8 similarity metrics between datasets, including both ontology-based techniques and techniques from machine learning. Our results show that recommending scientific datasets based on metadata as it occurs in realistic dataset collections is a hard task. None of the ontology-based methods performs well on this task, and they are outscored by the majority of the machine-learning methods. Of these machine-learning methods, only one performs reasonably well, and even then it reaches only 70% accuracy.
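To make the recommendation framing concrete, the following is a minimal sketch of similarity-based dataset recommendation over metadata text. It is not the method evaluated in the paper: the bag-of-words cosine similarity, the function names (`tokenize`, `cosine`, `recommend`), and the toy metadata records are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a metadata string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(query_id, metadata, top_k=3):
    """Rank every other dataset by similarity to the query dataset."""
    vectors = {ds_id: Counter(tokenize(text)) for ds_id, text in metadata.items()}
    query_vec = vectors[query_id]
    scored = [
        (ds_id, cosine(query_vec, vec))
        for ds_id, vec in vectors.items()
        if ds_id != query_id  # never recommend the query dataset itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy metadata, invented for illustration only.
metadata = {
    "ds1": "RNA-seq gene expression in human breast cancer tissue",
    "ds2": "Gene expression profiling of breast cancer cell lines",
    "ds3": "Mouse gut microbiome 16S sequencing",
}

if __name__ == "__main__":
    for ds_id, score in recommend("ds1", metadata):
        print(ds_id, round(score, 3))
```

The input here is a dataset rather than a query string, which is the shift the abstract describes. Any of the ontology-based or machine-learning similarity measures could in principle be substituted for `cosine` without changing the ranking logic.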
Original language: English
Title of host publication: Proceedings of the 10th International Conference on Data Science, Technology and Applications
Editors: Christoph Quix, Slimane Hammoudi, Wil van der Aalst
Number of pages: 8
ISBN (Electronic): 9789897585210
Publication status: Published - 2021
Event: 10th International Conference on Data Science, Technology and Applications, DATA 2021 - Virtual, Online
Duration: 6 Jul 2021 to 8 Jul 2021




