Improving cross-topic authorship attribution: The role of pre-processing

I. Markov, E. Stamatatos, G. Sidorov

Research output: Chapter in Book / Report / Conference proceeding; Conference contribution; Academic; peer-review

Abstract

© Springer Nature Switzerland AG 2018. The effectiveness of character n-gram features for representing the stylistic properties of a text has been demonstrated in various independent Authorship Attribution (AA) studies. Moreover, it has been shown that some categories of character n-grams perform better than others under both single-topic and cross-topic AA conditions. In this work, we present an improved algorithm for cross-topic AA. We demonstrate that the effectiveness of character n-gram representations can be significantly enhanced by performing simple pre-processing steps and appropriately tuning the number of features, especially in cross-topic conditions.
Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 18th International Conference, CICLing 2017, Revised Selected Papers
Editors: A. Gelbukh
Publisher: Springer Verlag
Pages: 289-302
ISBN (Print): 9783319771151
Publication status: Published - 2018
Externally published: Yes
Event: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 - Budapest, Hungary
Duration: 17 Apr 2017 to 23 Apr 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017
Country/Territory: Hungary
City: Budapest
Period: 17/04/17 to 23/04/17

Funding

Acknowledgments. This work was partially supported by the Mexican Government (CONACYT projects 240844, SNI, COFAA-IPN, SIP-IPN 20161947, 20161958, 20162204, 20162064, 20171813, 20171344, and 20172008).

Funders (with funder number, where listed)
Mexican Government
Instituto Politécnico Nacional: 20161947
Consejo Nacional de Ciencia y Tecnología: 240844
Secretaría de Investigación y Posgrado, Instituto Politécnico Nacional
Comisión de Operación y Fomento de Actividades Académicas, Instituto Politécnico Nacional
Sistema Nacional de Investigadores: 20162204, 20171344, 20172008, 20162064, 20171813, 20161958, SIP-IPN 20161947
