Age Inference on Twitter using SAGE and TF-IGM

Joran Cornelisse, Reshmi Gopalakrishna Pillai

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Social media is increasingly influential in day-to-day life. More than ever, people share, post, like, and follow activities across disparate social media platforms. Deriving specific attributes of users from their online behavior is a growing research field. In this study, a novel methodology is proposed for determining the age of Twitter users. We classify users into three age groups: 18-24, 25-54, and 55 and over. We compute numerous linguistic features from users' tweets, obtain significant terms extracted by the SAGE algorithm, and retrieve relevant metadata by extracting information on the interests users follow on Twitter using TF-IGM. The final logistic regression model obtains a macro F1-score of 78%. In this way, the approach effectively combines NLP and IR techniques for attribute inference on social media.
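The macro F1-score reported above is the unweighted mean of the per-class F1 scores, so each of the three age groups counts equally regardless of how many users it contains. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1`, the label strings, and the toy labels are assumptions made for illustration.

```python
from collections import Counter

# Label strings are illustrative; the paper's groups are 18-24, 25-54, and 55 and over.
AGE_GROUPS = ["18-24", "25-54", "55+"]


def macro_f1(y_true, y_pred, labels=AGE_GROUPS):
    """Return the unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy labels (made up, not data from the study).
y_true = ["18-24", "18-24", "25-54", "25-54", "55+", "55+"]
y_pred = ["18-24", "25-54", "25-54", "25-54", "55+", "18-24"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes the same weight, a model that performs poorly on a small age group is penalized as much as one that performs poorly on a large group, which is why macro averaging is a common choice for imbalanced class distributions.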
Original language: English
Title of host publication: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2020
Publisher: Association for Computing Machinery
Pages: 24-30
ISBN (Electronic): 9781450377607
Publication status: Published - 18 Dec 2020
Externally published: Yes
Event: 4th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2020 - Virtual, Online, Korea, Republic of
Duration: 18 Dec 2020 → 20 Dec 2020

Conference

Conference: 4th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2020
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 18/12/20 → 20/12/20
