Syntactic parsing of clause constituents for statistical machine translation

Jianjun Ma, Jiahuan Pei, Degen Huang, Dingxin Song

Research output: Contribution to Journal › Article › Academic › Peer-review

Abstract

In linguistics, the clause is considered the basic unit of grammar, a structure intermediate between a chunk and a sentence. Clause constituents are therefore an important kind of linguistically valid syntactic phrase. This paper adopts a conditional random field (CRF) model to recognise English clause constituents together with their syntactic functions, and tests their effect on machine translation by applying this syntactic information to an English-Chinese phrase-based statistical machine translation (PBSMT) system, evaluated on a business-domain corpus. Clause constituents are classified into six main kinds: subject, predicate, complement, adjunct, residues of predicate, and residues of complement. Results show that our rich-feature CRF model achieves an F-measure of 93.31%, a precision of 93.26%, and a recall of 93.04%. Combining this source-language syntactic knowledge with the NiuTrans phrase-based SMT system slightly improves English-Chinese translation accuracy.
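The abstract does not say how clause constituents are represented for the CRF or exactly how the F-measure is computed. A common choice for chunk-like sequence labelling is BIO tagging, in which each token is tagged as beginning (B-), inside (I-) or outside (O) a labelled span, with results scored by exact-match span precision, recall and F-measure. The Python sketch below illustrates that encoding and scoring under those assumptions only. The short label names (SUB, PRED, COMP, ADJ, PRED_RES, COMP_RES), the helper functions `spans_to_bio`, `bio_to_spans` and `span_prf`, and the example sentence are all hypothetical; the sketch does not train a CRF and is not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical short names for the six constituent classes in the abstract:
# subject, predicate, complement, adjunct, residues of predicate, residues of complement.
LABELS = {"SUB", "PRED", "COMP", "ADJ", "PRED_RES", "COMP_RES"}

# A labelled constituent: (label, first token index, end index exclusive).
Span = Tuple[str, int, int]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping labelled spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for label, start, end in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags (e.g. a tagger's output) back into labelled spans.

    An I- tag that does not continue the currently open span starts a new span.
    """
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last open span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-") or (tag.startswith("I-") and not continues):
            label, start = tag[2:], i
    return spans


def span_prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 (harmonic mean of P and R)."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Hypothetical example sentence: "The company signed the contract yesterday"
    gold = [("SUB", 0, 2), ("PRED", 2, 3), ("COMP", 3, 5), ("ADJ", 5, 6)]
    print(spans_to_bio(6, gold))
    # ['B-SUB', 'I-SUB', 'B-PRED', 'B-COMP', 'I-COMP', 'B-ADJ']

    # A hypothetical prediction in which the complement wrongly absorbs the adjunct.
    pred_tags = ["B-SUB", "I-SUB", "B-PRED", "B-COMP", "I-COMP", "I-COMP"]
    pred = bio_to_spans(pred_tags)
    print(span_prf(gold, pred))  # p = 2/3, r = 1/2, f = 4/7
```

Exact-match scoring credits a predicted constituent only when both its boundaries and its syntactic-function label match a gold span, so in the example the single boundary error costs both precision (one wrong prediction) and recall (two gold spans missed).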
Original language: English
Pages (from-to): 126-132
Journal: International Journal of Computational Science and Engineering
Volume: 17
Issue number: 1
Publication status: Published, 2018
Externally published: Yes

Funding

This work was supported by the Humanities and Social Science Research Projects of the Ministry of Education, China (No. 13YJAZH062).

Biographical notes

Jianjun Ma is a Professor in the School of Foreign Languages, Dalian University of Technology (DUT). She graduated from DUT with a BA in English for Science and Technology in 1993, received her MA in Applied Linguistics from Cardiff University, UK, in 2002, and received her PhD in Applied Computer Technology from DUT in 2012. She has taught English at DUT since 1993. Her research interests include parsing and machine translation; her current work focuses on noun phrase chunking, clause identification and automatic analysis of clause complexes. Her research projects are funded by the National Social Science Foundation of China and the Humanities and Social Science Research Projects of the Ministry of Education, China.

Jiahuan Pei is a PhD candidate in the School of Computer Science and Technology, Dalian University of Technology (DUT). She received her BSc from Dalian Maritime University in 2014 and is currently enrolled in an integrated master's-doctoral program at DUT. Her research interests include natural language processing and machine translation; more recently, her main interests have been sentence similarity computation, temporal intent understanding, keyword extraction and English-Chinese machine translation. She has participated in several projects funded by the National Natural Science Foundation of China (NSFC) and the National Social Science Foundation of China.

Dingxin Song is a PhD candidate in the School of Computer Science and Technology, Dalian University of Technology (DUT). He received his BSc from Shaanxi Normal University in 2009 and is currently enrolled in an integrated master's-doctoral program at DUT. His research interests include machine learning and machine translation; he currently focuses on neural network algorithms for natural language processing applications and statistical machine translation. He has participated in several projects funded by the National Natural Science Foundation of China (NSFC), the National Social Science Foundation of China, and the Humanities and Social Science Research Projects of the Ministry of Education, China.

Funders (with funder number where given):
National Natural Science Foundation of China
Ministry of Education of the People's Republic of China: 13YJAZH062
National Office for Philosophy and Social Sciences
