TY - GEN
T1 - Extraction of semantic relations from medical literature based on semantic predicates and SVM
AU - Zhao, Xiaoli
AU - Lin, Shaofu
AU - Huang, Zhisheng
PY - 2018
Y1 - 2018
AB - Relationships between biomedical entities are the cornerstone of acquiring biomedical knowledge. They are of great significance to the construction of related databases in the biomedical field and to the management of medical literature. How to quickly and accurately extract the required relationships between biomedical entities from massive unstructured literature is an important research problem. To improve accuracy, we use a support vector machine (SVM), a machine learning algorithm based on feature vectors, to extract relationships between entities. We extract the five main relationships in medical literature: ISA, PART_OF, CAUSES, TREATS and DIAGNOSES. First, related topics, such as disease-drug and cause-disease, are used to search for medical literature in the PubMed database. The retrieved documents are used as experimental data and are processed to form a corpus. For feature selection, information gain is used to select the most influential features of the entities themselves and of their contexts. On this basis, semantic predicates are added as a feature to improve accuracy. The experimental results show that the accuracy of extraction increases by 5%–10%. Finally, the Resource Description Framework (RDF) is used to store the relationships extracted from the corresponding documents, which supports the subsequent retrieval of related documents.
KW - Multi-classification
KW - RDF
KW - Relation extraction
KW - Semantic technology
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=85057331568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057331568&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01078-2_2
DO - 10.1007/978-3-030-01078-2_2
M3 - Conference contribution
AN - SCOPUS:85057331568
SN - 9783030010775
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 17
EP - 24
BT - Health Information Science
A2 - Siuly, Siuly
A2 - Lee, Ickjai
A2 - Huang, Zhisheng
A2 - Zhou, Rui
A2 - Wang, Hua
A2 - Xiang, Wei
PB - Springer-Verlag
T2 - 7th International Conference on Health Information Science, HIS 2018
Y2 - 5 October 2018 through 7 October 2018
ER -