Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction

Maciej Zięba*, Sebastian K. Tomczak, Jakub M. Tomczak

*Corresponding author for this work

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Bankruptcy prediction has been a subject of interest for almost a century, and it still ranks among the hottest topics in economics. The aim of predicting financial distress is to develop a predictive model that combines various econometric measures and allows one to foresee the financial condition of a firm. In this domain, various methods have been proposed, based on statistical hypothesis testing, statistical modeling (e.g., generalized linear models), and, more recently, artificial intelligence (e.g., neural networks, Support Vector Machines, decision trees). In this paper, we propose a novel approach to bankruptcy prediction that uses Extreme Gradient Boosting to learn an ensemble of decision trees. Additionally, in order to reflect higher-order statistics in the data and impose prior knowledge about data representation, we introduce a new concept that we refer to as synthetic features. A synthetic feature is a combination of the econometric measures using arithmetic operations (addition, subtraction, multiplication, division). Each synthetic feature can be seen as a single regression model that is developed in an evolutionary manner. We evaluate our solution using data collected on Polish companies in five tasks corresponding to bankruptcy prediction in the 1st, 2nd, 3rd, 4th, and 5th year. We compare our approach with the reference methods.
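As a rough illustration of the synthetic-feature idea described in the abstract, the following minimal sketch combines two columns of a small data table with one of the four arithmetic operations. It is not the authors' implementation: the function names, the zero-denominator guard, the toy numbers, and the uniform random choice of columns and operation are all assumptions made here for illustration. The abstract only states that features are developed "in an evolutionary manner", and that procedure is not reproduced.

```python
import operator
import random


def safe_div(a, b):
    # Assumption: a zero denominator yields 0.0; the source does not say
    # how division by zero is handled.
    return a / b if b != 0 else 0.0


# The four arithmetic operations named in the abstract.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": safe_div,
}


def make_synthetic_feature(rows, i, j, op):
    """Combine columns i and j of every row with the arithmetic operation op."""
    func = OPERATIONS[op]
    return [func(row[i], row[j]) for row in rows]


def random_synthetic_feature(rows, rng):
    """Pick two distinct columns and an operation uniformly at random.

    Uniform sampling is a stand-in (an assumption), not the selection
    scheme used in the paper.
    """
    i, j = rng.sample(range(len(rows[0])), 2)
    op = rng.choice(sorted(OPERATIONS))
    return (i, op, j), make_synthetic_feature(rows, i, j, op)


# Toy data: each row holds three made-up econometric measures for one firm.
firms = [
    [2.0, 4.0, 0.5],
    [1.0, 0.0, 3.0],
]

# New column: measure 0 divided by measure 1, computed per firm.
ratio = make_synthetic_feature(firms, 0, 1, "/")
```

In a full pipeline, a column such as `ratio` would be appended to the original measures before the boosted-tree ensemble is trained.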

Original language: English
Pages (from-to): 93-101
Number of pages: 9
Journal: Expert Systems with Applications
Volume: 58
Publication status: Published - 1 Oct 2016
Externally published: Yes

Funding

The research conducted by the authors has been partially co-financed by the Ministry of Science and Higher Education, Republic of Poland, namely, Maciej Zięba: grant No. B50083/W8/K3.

Keywords

  • Bankruptcy prediction
  • Extreme gradient boosting
  • Imbalanced data
  • Synthetic features generation
