TY - JOUR
T1 - Within-project and cross-project software defect prediction based on improved transfer Naive Bayes algorithm
AU - Zhu, Kun
AU - Zhang, Nana
AU - Ying, Shi
AU - Wang, Xu
PY - 2020/5/1
Y1 - 2020/5/1
AB - With the continuous growth of software scale, software updates and maintenance have become increasingly important. However, frequent code updates make software more likely to introduce new defects, so predicting defects in software changes quickly and accurately has become an important problem for software developers. Current defect prediction methods often fail to capture the feature information of defects comprehensively, and their detection performance is not ideal. Therefore, in this paper we propose a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayesian algorithm, which mainly considers the following two aspects: (1) since the edge data of the test set may affect the similarity calculation and the final prediction result, we remove the edge data of the test set when calculating the data similarity between the training set and the test set; (2) since each feature dimension affects defect prediction differently, we construct a formula for the training data weight based on feature dimension weights and data gravity, and then calculate the prior probability and the conditional probabilities of the training data from this weight information, so as to construct a weighted Bayesian classifier for software defect prediction. To evaluate the performance of the ITNB model, we use six datasets from large open-source projects, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL, and compare the ITNB model with the transfer Naive Bayesian (TNB) model. The experimental results show that ITNB achieves better accuracy, precision and pd (probability of detection) than TNB for both within-project and cross-project defect prediction.
KW - Cross-project defect prediction
KW - Edge data
KW - Feature dimension weight
KW - Similarity calculation
KW - Transfer Naive Bayesian algorithm
UR - http://www.scopus.com/inward/record.url?scp=85090939598&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090939598&partnerID=8YFLogxK
U2 - 10.32604/cmc.2020.08096
DO - 10.32604/cmc.2020.08096
M3 - Article
AN - SCOPUS:85090939598
SN - 1546-2218
VL - 63
SP - 891
EP - 910
JO - Computers, Materials and Continua
JF - Computers, Materials and Continua
IS - 2
ER -