Abstract
Users searching for different topics in a collection may show distinct search patterns. To analyze the search behavior of users searching for a specific topic, we first need to retrieve the sessions that contain this topic. In this paper, we compare different topic representations and approaches for finding topic-specific sessions. We conduct our research as a double case study of two topics, World War II and feminism, using the search logs of a historical newspaper collection. We evaluate the results against manually created ground truths of over 600 sessions per topic. The two case studies show similar results: the query-based methods yield high precision at the expense of recall, whereas the document-based methods find more sessions at the expense of precision. In both approaches, precision improves significantly when the topic representations are manually curated. This study demonstrates how digital humanities scholars and practitioners can apply different methods to find sessions containing specific topics.
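As a rough illustration of the trade-off the abstract describes, the following is a minimal sketch, not the paper's method. It flags a session as topic-related either when one of its queries contains a topic term (query-based) or when it clicked a document from a topic document set (document-based). It then scores each result set against a ground truth using precision and recall. The `Session` fields, the token-overlap and click-overlap matching rules, and the toy data are all hypothetical assumptions. The resulting numbers reflect only this toy data and are not the study's results.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical session record: the queries issued and the documents clicked.
    session_id: str
    queries: list[str] = field(default_factory=list)
    clicked_docs: set[str] = field(default_factory=set)


def query_based(sessions, topic_terms):
    """Flag a session if any query shares a whitespace token with the topic terms (assumed rule)."""
    terms = {t.lower() for t in topic_terms}
    found = set()
    for s in sessions:
        if any(terms & set(q.lower().split()) for q in s.queries):
            found.add(s.session_id)
    return found


def document_based(sessions, topic_docs):
    """Flag a session if it clicked at least one document in the topic document set (assumed rule)."""
    return {s.session_id for s in sessions if s.clicked_docs & topic_docs}


def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall against a ground truth."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data for a "feminism" topic.
TOPIC_TERMS = {"feminism", "suffrage"}
TOPIC_DOCS = {"d1", "d2"}
SESSIONS = [
    Session("s1", ["feminism history"], {"d1"}),
    Session("s2", ["women voting 1919"], {"d2"}),
    Session("s3", ["suffrage"], {"d1"}),
    Session("s4", ["football results"], {"d2"}),  # clicked a topic doc, but not on topic
]
GROUND_TRUTH = {"s1", "s2", "s3"}

if __name__ == "__main__":
    for name, found in [("query-based", query_based(SESSIONS, TOPIC_TERMS)),
                        ("document-based", document_based(SESSIONS, TOPIC_DOCS))]:
        p, r = precision_recall(found, GROUND_TRUTH)
        print(f"{name}: {sorted(found)} precision={p:.2f} recall={r:.2f}")
```

On this toy data the query-based rule misses `s2` but returns no false positives. The document-based rule finds every relevant session but also picks up `s4`. Manually curating `TOPIC_TERMS` or `TOPIC_DOCS` would change these sets directly.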
Original language | English |
---|---|
Title of host publication | Linking Theory and Practice of Digital Libraries |
Subtitle of host publication | 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Virtual Event, September 13–17, 2021, Proceedings |
Editors | Gerd Berget, Mark Michael Hall, Daniel Brenn, Sanna Kumpulainen |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 189-201 |
Number of pages | 13 |
ISBN (Electronic) | 9783030863241 |
ISBN (Print) | 9783030863234 |
DOIs | |
Publication status | Published - 2021 |
Event | 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021 - Virtual, Online. Duration: 13 Sept 2021 → 17 Sept 2021 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12866 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021 |
---|---|
City | Virtual, Online |
Period | 13/09/21 → 17/09/21 |
Bibliographical note
Funding Information: We would like to thank the National Library of the Netherlands, and Lynda Hardman (Centrum Wiskunde & Informatica) for their support. The Wikipedia articles related to WWII were assessed by Kees Ribbens with the assistance of Caroline Schoofs and Koen Smilde. This research is partially supported by the VRE4EIC project, a project that has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 676247.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
Keywords
- Digital libraries
- Log analysis
- User interests