Identifying disease-centric subdomains in very large medical ontologies: A case-study on breast cancer concepts in SNOMED CT. Or: Finding 2500 out of 300.000

K. Milian, Z. Aleksovski, R. Vdovjak, A.C.M. ten Teije, F.A.H. van Harmelen

Research output: Chapter in Book / Report / Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Modern medical vocabularies can contain up to hundreds of thousands of concepts. In any particular use-case only a small fraction of these will be needed. In this paper we first define two notions of a disease-centric subdomain of a large ontology. We then explore two methods for identifying disease-centric subdomains of such large medical vocabularies. The first method is based on lexically querying the ontology with an iteratively extended set of seed queries. The second method is based on manual mapping between concepts from a medical guideline document and ontology concepts. Both methods include concept-expansion over subsumption and equality relations. We use both methods to determine a breast-cancer-centric subdomain of the SNOMED CT ontology. Our experiments show that the two methods produce a considerable overlap, but they also yield a large degree of complementarity, with interesting differences between the sets of concepts that they return. Analysis of the results reveals strengths and weaknesses of the different methods.
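
The first method described in the abstract (lexical seed queries followed by concept expansion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy ontology, its identifiers and labels, the substring matching, and the choice to expand downward to subsumed concepts only. Expansion over equality relations is omitted.

```python
from collections import deque

# Hypothetical toy ontology: concept id -> (label, set of parent ids).
# These entries are invented for illustration; they are not SNOMED CT content.
TOY_ONTOLOGY = {
    "c1": ("disorder of breast", set()),
    "c2": ("breast cancer", {"c1"}),
    "c3": ("ductal carcinoma in situ", {"c2"}),
    "c4": ("mastectomy", set()),
    "c5": ("radical mastectomy", {"c4"}),
    "c6": ("fracture of femur", set()),
}


def lexical_matches(ontology, query):
    """Return ids of concepts whose label contains the query (case-insensitive)."""
    needle = query.lower()
    return {cid for cid, (label, _) in ontology.items() if needle in label.lower()}


def descendants(ontology, roots):
    """Return the roots plus every concept they subsume (breadth-first walk)."""
    children = {}
    for cid, (_, parents) in ontology.items():
        for parent in parents:
            children.setdefault(parent, set()).add(cid)
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def subdomain(ontology, seed_queries):
    """Union the lexical hits of all seed queries, then expand over subsumption."""
    selected = set()
    for query in seed_queries:
        selected |= lexical_matches(ontology, query)
    return descendants(ontology, selected)


# Iteratively extending the seed set grows the subdomain.
first_pass = subdomain(TOY_ONTOLOGY, ["breast"])
second_pass = subdomain(TOY_ONTOLOGY, ["breast", "mastectomy"])
```

In this sketch the concept "c3" is never matched lexically; it enters the result only through subsumption expansion, which is the kind of gain the expansion step is meant to provide.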

Original language: English
Title of host publication: Knowledge Representation for Health-Care: Data, Processes and Guidelines, AIME 2009, Workshop KR4HC 2009, Revised Selected and Invited Papers
Editors: D. Riano, A.C.M. ten Teije, S. Miksch, M. Peleg
Place of Publication: Verona, Italy
Pages: 50-63
Number of pages: 14
DOIs: https://doi.org/10.1007/978-3-642-11808-1_5
Publication status: Published - 2010
Event: Workshop on Knowledge Representation for Health-Care: Data, Processes and Guidelines, KR4HC 2009. Held in Conjunction with the 12th Conference on Artificial Intelligence in Medicine, AIME 2009 - Verona, Italy
Duration: 19 Jul 2009 - 19 Jul 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5943 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshop on Knowledge Representation for Health-Care: Data, Processes and Guidelines, KR4HC 2009. Held in Conjunction with the 12th Conference on Artificial Intelligence in Medicine, AIME 2009
Country: Italy
City: Verona
Period: 19/07/09 - 19/07/09


Keywords

  • Disease related concepts
  • Identifying ontology subdomain
  • Mapping medical terminologies
  • Medical guidelines
  • Ontology subsetting
  • Seed queries

Cite this

Milian, K., Aleksovski, Z., Vdovjak, R., ten Teije, A. C. M., & van Harmelen, F. A. H. (2010). Identifying disease-centric subdomains in very large medical ontologies: A case-study on breast cancer concepts in SNOMED CT. Or: Finding 2500 out of 300.000. In D. Riano, A. C. M. ten Teije, S. Miksch, & M. Peleg (Eds.), Knowledge Representation for Health-Care: Data, Processes and Guidelines, AIME 2009, Workshop KR4HC 2009, Revised Selected and Invited Papers (pp. 50-63). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5943 LNAI). Verona, Italy. https://doi.org/10.1007/978-3-642-11808-1_5
@inproceedings{8634b9e67293479b90d66b5c54bd1110,
title = "Identifying disease-centric subdomains in very large medical ontologies: A case-study on breast cancer concepts in SNOMED CT. Or: Finding 2500 out of 300.000",
abstract = "Modern medical vocabularies can contain up to hundreds of thousands of concepts. In any particular use-case only a small fraction of these will be needed. In this paper we first define two notions of a disease-centric subdomain of a large ontology. We then explore two methods for identifying disease-centric subdomains of such large medical vocabularies. The first method is based on lexically querying the ontology with an iteratively extended set of seed queries. The second method is based on manual mapping between concepts from a medical guideline document and ontology concepts. Both methods include concept-expansion over subsumption and equality relations. We use both methods to determine a breast-cancer-centric subdomain of the SNOMED CT ontology. Our experiments show that the two methods produce a considerable overlap, but they also yield a large degree of complementarity, with interesting differences between the sets of concepts that they return. Analysis of the results reveals strengths and weaknesses of the different methods.",
keywords = "Disease related concepts, Identifying ontology subdomain, Mapping medical terminologies, Medical guidelines, Ontology subsetting, Seed queries",
author = "K. Milian and Z. Aleksovski and R. Vdovjak and {ten Teije}, A.C.M. and {van Harmelen}, F.A.H.",
year = "2010",
doi = "10.1007/978-3-642-11808-1_5",
language = "English",
isbn = "3642118070",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "50--63",
editor = "D. Riano and {ten Teije}, A.C.M. and S. Miksch and M. Peleg",
booktitle = "Knowledge Representation for Health-Care: Data, Processes and Guidelines, AIME 2009, Workshop KR4HC 2009, Revised Selected and Invited Papers",

}


TY - GEN
T1 - Identifying disease-centric subdomains in very large medical ontologies
T2 - A case-study on breast cancer concepts in SNOMED CT. Or: Finding 2500 out of 300.000
AU - Milian, K.
AU - Aleksovski, Z.
AU - Vdovjak, R.
AU - ten Teije, A.C.M.
AU - van Harmelen, F.A.H.
PY - 2010
Y1 - 2010
AB - Modern medical vocabularies can contain up to hundreds of thousands of concepts. In any particular use-case only a small fraction of these will be needed. In this paper we first define two notions of a disease-centric subdomain of a large ontology. We then explore two methods for identifying disease-centric subdomains of such large medical vocabularies. The first method is based on lexically querying the ontology with an iteratively extended set of seed queries. The second method is based on manual mapping between concepts from a medical guideline document and ontology concepts. Both methods include concept-expansion over subsumption and equality relations. We use both methods to determine a breast-cancer-centric subdomain of the SNOMED CT ontology. Our experiments show that the two methods produce a considerable overlap, but they also yield a large degree of complementarity, with interesting differences between the sets of concepts that they return. Analysis of the results reveals strengths and weaknesses of the different methods.
KW - Disease related concepts
KW - Identifying ontology subdomain
KW - Mapping medical terminologies
KW - Medical guidelines
KW - Ontology subsetting
KW - Seed queries
UR - http://www.scopus.com/inward/record.url?scp=77951133677&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951133677&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-11808-1_5
DO - 10.1007/978-3-642-11808-1_5
M3 - Conference contribution
SN - 3642118070
SN - 9783642118074
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 50
EP - 63
BT - Knowledge Representation for Health-Care: Data, Processes and Guidelines, AIME 2009, Workshop KR4HC 2009, Revised Selected and Invited Papers
A2 - Riano, D.
A2 - ten Teije, A.C.M.
A2 - Miksch, S.
A2 - Peleg, M.
CY - Verona, Italy
ER -
