Overview of the cross-domain authorship verification task at PAN 2021

  • M. Kestemont
  • E. Manjavacas
  • I. Markov
  • J. Bevendorff
  • M. Wiegmann
  • E. Stamatatos
  • B. Stein
  • M. Potthast

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Idiosyncrasies in human writing styles make it difficult to develop systems for authorship identification that scale well across individuals. In this year's edition of PAN, the authorship identification track focused on open-set authorship verification, so that systems are applied to unknown documents by previously unseen authors in a new domain. As in the previous year, the sizable materials for this campaign were sampled from English-language fanfiction. The calibration materials handed out to the participants were the same as last year, but a new test set was compiled with authors and fandom domains not present in any of the previous datasets. The general setup of the task did not change, i.e., systems still had to estimate the probability of a pair of documents being authored by the same person. We attracted 13 submissions by 10 international teams, which were compared to three complementary baselines, using five diverse evaluation metrics. Post-hoc analyses show that systems benefitted from the abundant calibration materials and were well-equipped to handle the open-set scenario: Both the top-performing approach and the highly competitive cohort of runner-ups presented surprisingly strong verifiers. We conclude that, at least within this specific text variety, (large-scale) open-set authorship verification is not necessarily or inherently more difficult than a closed-set setup, which offers encouraging perspectives for the future of the field.
Original language: English
Title of host publication: CLEF-WN 2021 - CLEF 2021 Working Notes
Subtitle of host publication: Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum. Bucharest, Romania, September 21st to 24th, 2021
Editors: G. Faggioli, N. Ferro, A. Joly, M. Maistro, F. Piroi
Publisher: CEUR Workshop Proceedings
Pages: 1743-1759
Number of pages: 17
Publication status: Published - 2021
Externally published: Yes
Event: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021 - Virtual, Bucharest, Romania
Duration: 21 Sept 2021 to 24 Sept 2021

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 2936
ISSN (Print): 1613-0073

Conference

Conference: 2021 Working Notes of CLEF - Conference and Labs of the Evaluation Forum, CLEF-WN 2021
Country/Territory: Romania
City: Virtual, Bucharest
Period: 21/09/21 to 24/09/21

Bibliographical note

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Funding

As in previous years, this initiative would not have been possible without the generous contributions of the participating teams, whose patience and enthusiasm we wish to acknowledge in what has been an unusually trying edition. Our thanks also go to the CLEF organizers for the continuation of their hard annual work. Finally, we would like to extend our appreciation to Sebastian Bischoff, Niklas Deckers, Marcel Schliebs, and Ben Thies for assembling the fanfiction.net corpus.
