Cic-IPN@INLi2018: Indian native language identification

I. Markov, G. Sidorov

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

© 2018 CEUR-WS. All Rights Reserved.

In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines (SVM) algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.
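To make the log-entropy weighting concrete, here is a minimal sketch assuming the standard formulation: a local weight of log(1 + tf_ij) multiplied by a global weight g_i = 1 + Σ_j p_ij log(p_ij) / log(n), where p_ij = tf_ij / gf_i and n is the number of documents. The exact variant used in the paper may differ. The helper names `char_ngrams` and `log_entropy_weights` are hypothetical, and only character n-grams are shown; the other feature types and the SVM classifier itself are not reproduced here.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Hypothetical helper: overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight raw term counts with the standard log-entropy scheme (assumed variant).

    local weight:  log(1 + tf_ij)
    global weight: 1 + sum_j p_ij * log(p_ij) / log(n),  p_ij = tf_ij / gf_i
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]

    # gf_i: total frequency of each term across the whole collection
    global_freq: Counter = Counter()
    for c in counts:
        global_freq.update(c)

    global_weight: dict[str, float] = {}
    for term, gf in global_freq.items():
        if n < 2:
            # Normalised entropy is undefined for a single document
            global_weight[term] = 1.0
            continue
        entropy = 0.0
        for c in counts:
            tf = c[term]  # Counter returns 0 for missing keys
            if tf > 0:
                p = tf / gf
                entropy += p * math.log(p)
        # Terms spread evenly over all documents get a weight near 0;
        # terms concentrated in one document get a weight of 1
        global_weight[term] = 1.0 + entropy / math.log(n)

    return [
        {term: global_weight[term] * math.log1p(tf) for term, tf in c.items()}
        for c in counts
    ]
```

For example, with two documents `[["a", "b"], ["a", "c"]]`, the term `"a"` occurs evenly in both and receives weight 0, while `"b"` occurs in only one document and keeps its full local weight log(2). The resulting per-document dictionaries could then be vectorised (for instance with scikit-learn's `DictVectorizer`) and passed to a linear SVM such as `LinearSVC`; this pipeline is an illustrative assumption, not the authors' published setup.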
Original language: English
Title of host publication: FIRE-WN 2018 - Working Notes of FIRE 2018 - Forum for Information Retrieval Evaluation
Editors: P. Rosso, P. Mehta, P. Majumder, M. Mitra
Publisher: CEUR-WS
Pages: 82-88
Volume: 2266
Publication status: Published - 2018
Externally published: Yes
Event: 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018 - Gandhinagar, India
Duration: 6 Dec 2018 to 9 Dec 2018

Publication series

Name: CEUR Workshop Proceedings
ISSN (Print): 1613-0073

Conference

Conference: 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018
Country/Territory: India
City: Gandhinagar
Period: 6/12/18 to 9/12/18
