TY - JOUR
T1 - Large-scale empirical study of important features indicative of discovered vulnerabilities to assess application security
AU - Zhang, Mengyuan
AU - De Carne De Carnavalet, Xavier
AU - Wang, Lingyu
AU - Ragab, Ahmed
PY - 2019/9/1
Y1 - 2019/9/1
N2 - Existing research on vulnerability discovery models shows that the existence of vulnerabilities inside an application may be linked to certain features, e.g., size or complexity, of that application. However, the applicability of such features to demonstrate the relative security between two applications is not well studied, which may depend on multiple factors in a complex way. In this paper, we perform the first large-scale empirical study of the correlation between various features of applications and the abundance of vulnerabilities. Unlike existing work, which typically focuses on one particular application, resulting in limited successes, we focus on the more realistic issue of assessing the relative security level among different applications. To the best of our knowledge, this is the most comprehensive study of 780 real-world applications involving 6498 vulnerabilities. We apply seven feature selection methods to nine feature subsets selected among 34 collected features, which are then fed into six types of machine learning models, producing 523 estimations. The predictive power of important features is evaluated using four different performance measures. This paper reflects that the complexity of applications is not the only factor in vulnerability discovery and the human-related factors contribute to explaining the number of discovered vulnerabilities in an application.
AB - Existing research on vulnerability discovery models shows that the existence of vulnerabilities inside an application may be linked to certain features, e.g., size or complexity, of that application. However, the applicability of such features to demonstrate the relative security between two applications is not well studied, which may depend on multiple factors in a complex way. In this paper, we perform the first large-scale empirical study of the correlation between various features of applications and the abundance of vulnerabilities. Unlike existing work, which typically focuses on one particular application, resulting in limited successes, we focus on the more realistic issue of assessing the relative security level among different applications. To the best of our knowledge, this is the most comprehensive study of 780 real-world applications involving 6498 vulnerabilities. We apply seven feature selection methods to nine feature subsets selected among 34 collected features, which are then fed into six types of machine learning models, producing 523 estimations. The predictive power of important features is evaluated using four different performance measures. This paper reflects that the complexity of applications is not the only factor in vulnerability discovery and the human-related factors contribute to explaining the number of discovered vulnerabilities in an application.
UR - http://www.scopus.com/inward/record.url?scp=85066497081&partnerID=8YFLogxK
U2 - 10.1109/TIFS.2019.2895963
DO - 10.1109/TIFS.2019.2895963
M3 - Article
SN - 1556-6013
VL - 14
SP - 2315
EP - 2330
JO - IEEE Transactions on Information Forensics and Security
JF - IEEE Transactions on Information Forensics and Security
IS - 9
M1 - 8629314
ER -