Bankruptcy prediction has been a subject of interest for almost a century, and it still ranks among the most active topics in economics. The aim of predicting financial distress is to develop a predictive model that combines various econometric measures and makes it possible to foresee the financial condition of a firm. Various methods have been proposed in this domain, based on statistical hypothesis testing, statistical modeling (e.g., generalized linear models), and, more recently, artificial intelligence (e.g., neural networks, Support Vector Machines, decision trees). In this paper, we propose a novel approach to bankruptcy prediction that uses Extreme Gradient Boosting to learn an ensemble of decision trees. Additionally, in order to reflect higher-order statistics in the data and to impose prior knowledge about the data representation, we introduce a new concept that we refer to as synthetic features. A synthetic feature is a combination of econometric measures formed using arithmetic operations (addition, subtraction, multiplication, division). Each synthetic feature can be seen as a single regression model that is developed in an evolutionary manner. We evaluate our solution on collected data about Polish companies in five tasks, corresponding to bankruptcy prediction in the 1st, 2nd, 3rd, 4th, and 5th year. We compare our approach with reference methods.
- Bankruptcy prediction
- Extreme gradient boosting
- Imbalanced data
- Synthetic feature generation
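
To make the notion of a synthetic feature concrete, the following is a minimal sketch, assuming the data is held as a list of numeric rows. The function names, the zero-guarded division, and the `weights` argument (standing in for some per-feature selection score) are illustrative assumptions, not the procedure used in the paper.

```python
import operator
import random


def safe_div(a, b):
    # Assumption: division by zero yields 0.0; the abstract does not say how this is handled.
    return a / b if b != 0 else 0.0


# The four arithmetic operations named in the abstract.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": safe_div}


def make_synthetic_feature(rows, i, j, op):
    """Combine columns i and j of every row with the arithmetic operation op."""
    f = OPS[op]
    return [f(row[i], row[j]) for row in rows]


def sample_synthetic_feature(rows, weights, rng):
    """Draw two columns with probability proportional to weights (hypothetical
    selection scores) and a uniformly random operation, then build the feature."""
    i, j = rng.choices(range(len(weights)), weights=weights, k=2)
    op = rng.choice(list(OPS))
    return (i, op, j), make_synthetic_feature(rows, i, j, op)
```

For example, `make_synthetic_feature(rows, 0, 1, "/")` produces a new column holding the ratio of the first two measures for each company. An evolutionary scheme could call `sample_synthetic_feature` repeatedly, append the new columns to the data, and retain those that prove useful to the learned ensemble.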