Abstract
Social media is increasingly influential in day-to-day life. More than ever, people are sharing, posting, liking, and following activities across disparate social media platforms. Deriving specific attributes of users from their online behavior is a growing research field. In this study, a novel methodology is proposed for determining the age of Twitter users. We classify users into three age groups: 18-24, 25-54, and 55 and older. We compute numerous linguistic features from users' tweets, extract significant terms with the SAGE algorithm, and retrieve relevant user metadata by extracting information on the interests they follow on Twitter using TF-IGM. The final logistic regression model obtains a macro F1-score of 78%. The approach thus effectively combines NLP and IR techniques for attribute inference on social media.
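For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1-score over three age groups can be computed: the per-class F1 scores are averaged with equal weight, regardless of class size. The function name `macro_f1`, the label strings in `AGE_GROUPS`, and the toy labels are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the authors' evaluation pipeline.

```python
from collections import Counter

# Hypothetical label strings for the three age groups named in the abstract.
AGE_GROUPS = ("18-24", "25-54", "55+")


def macro_f1(y_true, y_pred, labels=AGE_GROUPS):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy example with made-up labels (not data from the study).
y_true = ["18-24", "18-24", "25-54", "25-54", "55+", "55+"]
y_pred = ["18-24", "25-54", "25-54", "25-54", "55+", "18-24"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a model that performs poorly on a small age group is penalized as much as one that performs poorly on a large group.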
Original language | English |
---|---|
Title of host publication | Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2020 |
Publisher | Association for Computing Machinery |
Pages | 24-30 |
ISBN (Electronic) | 9781450377607 |
Publication status | Published - 18 Dec 2020 |
Externally published | Yes |
Event | 4th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2020, Virtual, Online, Korea, Republic of. Duration: 18 Dec 2020 → 20 Dec 2020 |
Conference
Conference | 4th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2020 |
---|---|
Country/Territory | Korea, Republic of |
City | Virtual, Online |
Period | 18/12/20 → 20/12/20 |