Large-scale empirical study of important features indicative of discovered vulnerabilities to assess application security

Mengyuan Zhang, Xavier De Carne De Carnavalet, Lingyu Wang, Ahmed Ragab

Research output: Contribution to Journal, Article, Academic, peer-review

Abstract

Existing research on vulnerability discovery models shows that the existence of vulnerabilities inside an application may be linked to certain features, e.g., size or complexity, of that application. However, whether such features can demonstrate the relative security of two applications is not well studied, and the answer may depend on multiple factors in a complex way. In this paper, we perform the first large-scale empirical study of the correlation between various features of applications and the abundance of vulnerabilities. Unlike existing work, which typically focuses on one particular application, resulting in limited success, we focus on the more realistic issue of assessing the relative security level among different applications. To the best of our knowledge, this is the most comprehensive such study, covering 780 real-world applications and 6498 vulnerabilities. We apply seven feature selection methods to nine feature subsets selected among 34 collected features, which are then fed into six types of machine learning models, producing 523 estimations. The predictive power of important features is evaluated using four different performance measures. Our results indicate that application complexity is not the only factor in vulnerability discovery and that human-related factors help explain the number of vulnerabilities discovered in an application.
Original language: English
Article number: 8629314
Pages (from-to): 2315-2330
Journal: IEEE Transactions on Information Forensics and Security
Volume: 14
Issue number: 9
Publication status: Published - 1 Sept 2019
Externally published: Yes

Funding

Manuscript received February 20, 2018; revised July 24, 2018, October 31, 2018, and January 7, 2019; accepted January 9, 2019. Date of publication January 29, 2019; date of current version May 24, 2019. The work of M. Zhang and L. Wang was supported in part by the Natural Sciences and Engineering Research Council of Canada under Discovery Grant N01035. The work of X. de Carné de Carnavalet was supported in part by the Vanier Canada Graduate Scholarship. Our dataset including the extracted features for the 780 GitHub repositories is available upon request to the first author. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Issa Traore. (Corresponding author: Mengyuan Zhang.) M. Zhang was with the Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1M8, Canada. She is now with Ericsson Research, Montreal, QC H4S 0B6, Canada.

Funders | Funder number
Vanier Canada Graduate Scholarship |
Natural Sciences and Engineering Research Council of Canada | N01035
