ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues

Guojun Yan, Jiahuan Pei, Pengjie Ren, Zhaochun Ren, Xin Xin, Huasheng Liang, Maarten De Rijke, Zhumin Chen

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Medical dialogue systems (MDSs) aim to assist doctors and patients with a range of professional medical services, i.e., diagnosis, treatment and consultation. The development of MDSs is hindered because of a lack of resources. In particular, (1) there is no dataset with large-scale medical dialogues that covers multiple medical services and contains fine-grained medical labels (i.e., intents, actions, slots, values), and (2) there is no set of established benchmarks for MDSs for multi-domain, multi-service medical dialogues. In this paper, we present ReMeDi, a set of resources for medical dialogues. ReMeDi consists of two parts, the ReMeDi dataset and the ReMeDi benchmarks. The ReMeDi dataset contains 96,965 conversations between doctors and patients, including 1,557 conversations with fine-grained labels. It covers 843 types of diseases, 5,228 medical entities, and 3 specialties of medical services across 40 domains. To the best of our knowledge, the ReMeDi dataset is the only medical dialogue dataset that covers multiple domains and services, and has fine-grained medical labels. The second part of the ReMeDi resources consists of a set of state-of-the-art models for (medical) dialogue generation. The ReMeDi benchmark has the following methods: (1) pretrained models (i.e., BERT-WWM, BERT-MED, GPT2, and MT5) trained, validated, and tested on the ReMeDi dataset, and (2) a self-supervised contrastive learning (SCL) method to expand the ReMeDi dataset and enhance the training of the state-of-the-art pretrained models. We describe the creation of the ReMeDi dataset and the ReMeDi benchmarking methods, and establish experimental results using the ReMeDi benchmarking methods on the ReMeDi dataset for future research to compare against. With this paper, we share the dataset, implementations of the benchmarks, and evaluation scripts.
Original language: English
Title of host publication: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 3013-3024
ISBN (Electronic): 9781450387323
DOIs
Publication status: Published - 6 Jul 2022
Externally published: Yes
Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022 - Madrid, Spain
Duration: 11 Jul 2022 to 15 Jul 2022

Conference

Conference: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022
Country/Territory: Spain
City: Madrid
Period: 11/07/22 to 15/07/22

Funding

This work is supported by the Natural Science Foundation of China (61902219, 61972234, 62072279, 62102234), the Natural Science Foundation of Shandong Province (ZR2021QF129), the Key Scientific and Technological Innovation Program of Shandong Province (2019JZ ZY010129), Shandong University multidisciplinary research and innovation team of young scholars (No. 2020QNQT017), the Hybrid Intelligence Center (https://hybrid-intelligence-centre.nl), a 10-year program funded by the Dutch Ministry of Education, Culture and Science through the Netherlands Organisation for Scientific Research, the Tencent WeChat Rhino-Bird Focused Research Program (JR-WXG-2021411), and Meituan. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Funders and funder numbers
Shandong University multidisciplinary research and innovation team of young scholars: 2020QNQT017
Tencent WeChat Rhino-Bird Focused Research Program: JR-WXG-2021411
National Natural Science Foundation of China: 61972234, 61902219, 62072279, 62102234
Ministerie van Onderwijs, Cultuur en Wetenschap
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
Natural Science Foundation of Shandong Province: ZR2021QF129
Major Scientific and Technological Innovation Project of Shandong Province: 2019JZ ZY010129
