TY - GEN
T1 - Adapting cross-genre author profiling to language and corpus
AU - Markov, I.
AU - Gómez-Adorno, H.
AU - Sidorov, G.
AU - Gelbukh, A.
PY - 2016
Y1 - 2016
N2 - This paper presents our approach to the Author Profiling (AP) task at PAN 2016. The task aims at identifying an author's age and gender under cross-genre AP conditions in three languages: English, Spanish, and Dutch. Our preprocessing stage includes reducing non-textual features to their corresponding semantic classes. We exploit typed character n-grams, lexical features, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, second order attributes (SOA), tf-idf) and machine learning algorithms (liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, logistic regression). For textual feature selection, we applied the transition point technique, except when SOA was used. We found that the optimal configuration at each stage differed across languages.
AB - This paper presents our approach to the Author Profiling (AP) task at PAN 2016. The task aims at identifying an author's age and gender under cross-genre AP conditions in three languages: English, Spanish, and Dutch. Our preprocessing stage includes reducing non-textual features to their corresponding semantic classes. We exploit typed character n-grams, lexical features, and non-textual features (domain names). We experimented with various feature representations (binary, raw frequency, normalized frequency, second order attributes (SOA), tf-idf) and machine learning algorithms (liblinear and libSVM implementations of Support Vector Machines (SVM), multinomial naive Bayes, logistic regression). For textual feature selection, we applied the transition point technique, except when SOA was used. We found that the optimal configuration at each stage differed across languages.
M3 - Conference contribution
T3 - CEUR Workshop Proceedings
SP - 947
EP - 955
BT - CLEF 2016 - Working Notes of CLEF 2016 - Conference and Labs of the Evaluation Forum
A2 - Cappellato, L.
A2 - Ferro, N.
A2 - Macdonald, C.
A2 - Balog, K.
PB - CEUR-WS
T2 - 2016 Working Notes of Conference and Labs of the Evaluation Forum, CLEF 2016
Y2 - 5 September 2016 through 8 September 2016
ER -