TY - JOUR
T1 - Googling Politics? Comparing Five Computational Methods to Identify Political and News-related Searches from Web Browser Histories
AU - van Hoof, Marieke
AU - Trilling, Damian
AU - Meppelink, Corine
AU - Möller, Judith
AU - Loecherbach, Felicia
N1 - Publisher Copyright: © 2024 The Author(s). Published with license by Taylor & Francis Group, LLC.
PY - 2025
Y1 - 2025
AB - Search engines play a crucial role in today’s information environment. Yet, political and news-related (PNR) search engine use remains understudied, mainly due to the lack of suitable measurement methods to identify PNR searches. Existing research focuses on specific events, topics, or news articles, neglecting the broader scope of PNR search. Furthermore, self-reporting issues have led researchers to use browsing history data, but scalable methods for analyzing such data are limited. This paper addresses these gaps by comparing five computational methods to identify PNR searches in browsing data: browsing sequences, context-enhanced dictionary, traditional supervised machine learning (SML), transformer-based SML, and zero-shot classification. Taking Dutch Google searches as a test case, we use Dutch browsing history data obtained via data donations in May 2022 linked to surveys (N_users = 315; N_records = 9,868,209; N_searches = 697,359), along with 35.5k manually annotated search terms. The findings highlight substantial variation in accuracy, with some methods being more suited for narrower topics. We recommend a two-step approach, applying zero-shot classification followed by human evaluation. This methodology can inform future empirical research on PNR search engine use.
UR - http://www.scopus.com/inward/record.url?scp=86000426306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=86000426306&partnerID=8YFLogxK
U2 - 10.1080/19312458.2024.2363776
DO - 10.1080/19312458.2024.2363776
M3 - Article
AN - SCOPUS:86000426306
SN - 1931-2458
VL - 19
SP - 63
EP - 89
JO - Communication Methods and Measures
JF - Communication Methods and Measures
IS - 1
ER -