Language- and subtask-dependent feature selection and classifier parameter tuning for author profiling: Notebook for PAN at CLEF 2017

I. Markov, H. Gómez-Adorno, G. Sidorov

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

We present the CIC's approach to the Author Profiling (AP) task at PAN 2017. This year's task consists of two subtasks: gender and language variety identification in English, Spanish, Portuguese, and Arabic. We use typed and untyped character n-grams, word n-grams, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, log-entropy weighting, tf-idf), machine-learning algorithms (liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, ensemble classifier, meta-classifiers), and frequency threshold values. We adjusted system configurations for each of the languages and subtasks.
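
The feature representations named in the abstract can be illustrated with a minimal sketch. It covers only untyped character n-grams with binary, raw-frequency, and normalized-frequency weighting, plus a frequency threshold. The function names, the choice to apply the threshold to total corpus frequency, and the normalization by the number of n-grams in the document are all assumptions made for illustration; they are not taken from the authors' system.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping (untyped) character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(docs, n, min_freq):
    """Keep n-grams whose total corpus frequency is at least `min_freq`.

    Assumption: the threshold applies to corpus-wide counts; a
    document-frequency threshold would be an equally plausible reading.
    """
    totals = Counter()
    for doc in docs:
        totals.update(char_ngrams(doc, n))
    return sorted(gram for gram, count in totals.items() if count >= min_freq)


def vectorize(doc, vocab, n, scheme="raw"):
    """Map one document to a feature vector over `vocab`.

    scheme: "binary", "raw", or "normalized" (raw count divided by the
    number of n-grams in the document; this normalization is an assumption).
    """
    counts = Counter(char_ngrams(doc, n))
    total = sum(counts.values())
    if scheme == "binary":
        return [1 if counts[gram] > 0 else 0 for gram in vocab]
    if scheme == "raw":
        return [counts[gram] for gram in vocab]
    if scheme == "normalized":
        return [counts[gram] / total if total else 0.0 for gram in vocab]
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    corpus = ["abab", "abc"]  # toy documents, not PAN data
    vocab = build_vocabulary(corpus, n=2, min_freq=1)
    for scheme in ("binary", "raw", "normalized"):
        print(scheme, vectorize(corpus[0], vocab, 2, scheme))
```

Raising `min_freq` shrinks the vocabulary, which is one way a frequency threshold could be tuned separately for each language and subtask. The resulting vectors would then be passed to a classifier such as an SVM.
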
Original language: English
Title of host publication: CLEF 2017 - Working Notes of CLEF 2017 Conference and Labs of the Evaluation Forum
Editors: T. Mandl, N. Ferro, L. Goeuriot, L. Cappellato
Publisher: CEUR-WS
Volume: 1866
Publication status: Published - 2017
Externally published: Yes
Event: 18th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2017 - Dublin, Ireland
Duration: 11 Sept 2017 to 14 Sept 2017

Publication series

Name: CEUR Workshop Proceedings
ISSN (Print): 1613-0073

Conference

Conference: 18th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2017
Country/Territory: Ireland
City: Dublin
Period: 11/09/17 to 14/09/17

Funding

This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20162204, 20162064, 20171813, 20171344, and 20172008).

Funders and funder numbers:
Mexican Government
Consejo Nacional de Ciencia y Tecnología: 240844
Sistema Nacional de Investigadores: SIP-IPN 20162204, 20171344, 20172008, 20162064, 20171813
