TY - JOUR
T1 - Articles with impact
T2 - Insights into 10 years of research with machine learning
AU - van der Zwaard, Stephan
AU - de Leeuw, Arie Willem
AU - Meerhoff, L. A.
AU - Bodine, Sue C.
AU - Knobbe, Arno
PY - 2020/10
Y1 - 2020/10
N2 - Worldwide scientific output is growing faster and faster. Academics should not only publish much and fast, but also publish research with impact. The aim of this study is to use machine learning to investigate characteristics of articles that were published in the Journal of Applied Physiology between 2009 and 2018, and characterize high-impact articles. Article impact was assessed for 4,531 publications by three common impact metrics: the Altmetric Attention Scores, downloads, and citations. Additionally, a broad collection of (more than 200) characteristics was collected from the article's title, abstract, authors, keywords, publication, and article engagement. We constructed random forest (RF) regression models to predict article impact and articles with the highest impact (top-25% and top-10% for each impact metric), which were compared with a naive baseline method. RF models outperformed the baseline models when predicting the impact of unseen articles (P < 0.001 for each impact metric). Also, RF models predicted top-25% and top-10% high-impact articles with a high accuracy. Moreover, RF models revealed important article characteristics. Higher impact was observed for articles about exercise, training, performance and VO2max, reviews, human studies, articles from large collaborations, longer articles with many references and high engagement by scientists, practitioners and public or via news outlets and videos. Lower impact was shown for articles about respiratory physiology or sleep apnea, editorials, animal studies, and titles with a question mark or a reference to places or individuals. In summary, research impact can be predicted and better understood using a combination of article characteristics and machine learning.
AB - Worldwide scientific output is growing faster and faster. Academics should not only publish much and fast, but also publish research with impact. The aim of this study is to use machine learning to investigate characteristics of articles that were published in the Journal of Applied Physiology between 2009 and 2018, and characterize high-impact articles. Article impact was assessed for 4,531 publications by three common impact metrics: the Altmetric Attention Scores, downloads, and citations. Additionally, a broad collection of (more than 200) characteristics was collected from the article's title, abstract, authors, keywords, publication, and article engagement. We constructed random forest (RF) regression models to predict article impact and articles with the highest impact (top-25% and top-10% for each impact metric), which were compared with a naive baseline method. RF models outperformed the baseline models when predicting the impact of unseen articles (P < 0.001 for each impact metric). Also, RF models predicted top-25% and top-10% high-impact articles with a high accuracy. Moreover, RF models revealed important article characteristics. Higher impact was observed for articles about exercise, training, performance and VO2max, reviews, human studies, articles from large collaborations, longer articles with many references and high engagement by scientists, practitioners and public or via news outlets and videos. Lower impact was shown for articles about respiratory physiology or sleep apnea, editorials, animal studies, and titles with a question mark or a reference to places or individuals. In summary, research impact can be predicted and better understood using a combination of article characteristics and machine learning.
KW - Altmetrics
KW - Bibliometrics
KW - Machine learning
KW - Natural language processing
KW - Scientometrics
UR - http://www.scopus.com/inward/record.url?scp=85092802763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092802763&partnerID=8YFLogxK
U2 - 10.1152/japplphysiol.00489.2020
DO - 10.1152/japplphysiol.00489.2020
M3 - Article
C2 - 32790596
AN - SCOPUS:85092802763
SN - 8750-7587
VL - 129
SP - 967
EP - 979
JO - Journal of Applied Physiology
JF - Journal of Applied Physiology
IS - 4
ER -