Abstract
In recent years, applying supervised Machine Learning (ML) algorithms in Requirements Engineering (RE) has improved the performance (e.g. accuracy, precision) and scalability of automatic requirements classification. However, the lack of publicly labeled datasets remains a concern when conducting ML experiments. Few publicly labeled datasets for non-functional requirements classification are available, and even fewer in Spanish. Moreover, most of the available datasets have limitations such as imbalanced classes (e.g. PROMISE NFR). This study aims to create a FAIR (Findable, Accessible, Interoperable, Reusable) dataset of non-functional requirements in Spanish to facilitate reuse in ML classification experiments. A total of 109 non-functional requirements were collected from final degree projects at the University of A Coruña. We conducted a pilot quasi-experiment in which these requirements were labeled with the categories and subcategories of the ISO/IEC 25010 quality model. Seven annotators performed the labeling. Inter-annotator agreement, measured with Fleiss' kappa, was substantial at the category level (0.78) and moderate at the subcategory level (0.48).
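For readers unfamiliar with the agreement statistic mentioned above, the following is a minimal sketch of Fleiss' kappa in plain Python. It assumes the labels have already been aggregated into a count matrix (one row per requirement, one column per category, each row summing to the number of annotators); the function names `fleiss_kappa` and `landis_koch` are hypothetical, and the verbal scale follows Landis and Koch, which is an assumption about how the "substantial" and "moderate" labels were assigned. This is not the authors' own implementation.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a count matrix.

    counts[i][j] is the number of annotators who assigned item i
    (e.g. one requirement) to category j. Every row must sum to the
    same number of annotators n >= 2.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item must be rated by the same number (>= 2) of annotators")
    k = len(counts[0])

    # p_j: share of all N*n assignments that went to category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]

    # P_i: proportion of agreeing annotator pairs for item i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N              # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all labels fall in one category")
    return (P_bar - P_e) / (1 - P_e)


def landis_koch(kappa: float) -> str:
    """Map kappa to the Landis and Koch verbal scale (assumed interpretation)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical toy data: 3 requirements, 7 annotators, 2 categories,
    # with perfect agreement on every item.
    toy = [[7, 0], [0, 7], [7, 0]]
    print(fleiss_kappa(toy), landis_koch(fleiss_kappa(toy)))
```

With this scale, the reported values map as 0.78 to "substantial" and 0.48 to "moderate". The count-matrix input was chosen because it is independent of annotator identity, which is what Fleiss' kappa assumes.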
Original language | English |
---|---|
Title of host publication | SAC '23 |
Subtitle of host publication | Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing |
Editors | Jiman Hong |
Publisher | Association for Computing Machinery |
Pages | 1414-1421 |
Number of pages | 8 |
ISBN (Electronic) | 9781450395175 |
Publication status | Published - 2023 |
Event | 38th Annual ACM Symposium on Applied Computing, SAC 2023, Tallinn, Estonia (27 Mar 2023 → 31 Mar 2023) |
Publication series
Name | Proceedings of the ACM Symposium on Applied Computing |
---|---|
Conference
Conference | 38th Annual ACM Symposium on Applied Computing, SAC 2023 |
---|---|
Country/Territory | Estonia |
City | Tallinn |
Period | 27/03/23 → 31/03/23 |
Bibliographical note
Funding Information: This research was partially funded by Xunta de Galicia/FEDER-UE ED413C 2021/53 (Database Lab, UDC) and ED431G 2019/04 (CITIUS, USC). The authors also wish to thank all the researchers and practitioners who participated as annotators in the study.
Publisher Copyright:
© 2023 ACM.
Keywords
- data labeling
- FAIR principles
- non-functional requirements
- Spanish dataset