TY - GEN
T1 - Language- and subtask-dependent feature selection and classifier parameter tuning for author profiling
T2 - 18th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2017
AU - Markov, I.
AU - Gómez-Adorno, H.
AU - Sidorov, G.
PY - 2017
Y1 - 2017
AB - We present the CIC's approach to the Author Profiling (AP) task at PAN 2017. This year's task consists of two subtasks: gender and language variety identification in English, Spanish, Portuguese, and Arabic. We use typed and untyped character n-grams, word n-grams, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, log-entropy weighting, tf-idf), machine-learning algorithms (the liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, an ensemble classifier, and meta-classifiers), and frequency threshold values. We adjusted the system configuration for each language and subtask.
M3 - Conference contribution
VL - 1866
T3 - CEUR Workshop Proceedings
BT - CLEF 2017 - Working Notes of CLEF 2017 Conference and Labs of the Evaluation Forum
A2 - Mandl, T.
A2 - Ferro, N.
A2 - Goeuriot, L.
A2 - Cappellato, L.
PB - CEUR-WS
Y2 - 11 September 2017 through 14 September 2017
ER -