Abstract
Quantitative studies on the border between Mining Software Repositories (MSR) and Empirical Software Engineering (ESE) apply data analysis methods, such as regression modeling, statistical tests, or correlation analysis, to commits or pull requests to better understand the software development process. Such studies assure the validity of their reported results by following a sound methodology. However, as complexity increases, parts of the methodology can still go wrong, which may result in MSR/ESE studies with undetected threats to validity. In this paper, we propose to systematically protect against threats by operationalizing their treatment using simulations. A simulation substitutes observed and unobserved data related to an MSR/ESE scenario with synthetic data, carefully defined according to plausible assumptions about the scenario. Within a simulation, unobserved data becomes transparent; this is the key difference from a real study and is necessary to detect threats to an analysis methodology. Running an analysis methodology on synthetic data can detect basic technical bugs and misinterpretations, and it also improves trust in the methodology. The contribution of a simulation is to operationalize testing the impact of important assumptions; the assumptions themselves still need to be rated for plausibility. We evaluate simulation-based testing by operationalizing undetected threats in the context of four published MSR/ESE studies. We recommend that future research use such a more systematic treatment of threats, as a contribution against the reproducibility crisis.
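To make the idea in the abstract concrete, the following is a minimal sketch of a simulation-based test. It is not taken from the paper or from any of the four studies it evaluates. The scenario (developer experience as an unobserved confounder of commit size and defect count), the coefficients, and all names such as `simulate`, `ols_slope`, and `TRUE_EFFECT` are illustrative assumptions. Because the data is synthetic, the true effect and the normally unobserved variable are both known, so the analysis can be checked against them.

```python
import random


def ols_slope(x, y):
    """Slope b of the least-squares line y ~ a + b*x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = ols_slope(x, y)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n, true_effect, seed):
    """Generate synthetic commits under assumed (hypothetical) relationships.

    experience: unobserved in a real study, but visible in the simulation.
    size:       experienced developers are assumed to write smaller commits.
    defects:    depends on size (the effect under study) and on experience.
    """
    rng = random.Random(seed)
    experience = [rng.gauss(0, 1) for _ in range(n)]
    size = [-0.8 * e + rng.gauss(0, 1) for e in experience]
    defects = [
        true_effect * s - 1.0 * e + rng.gauss(0, 1)
        for s, e in zip(size, experience)
    ]
    return experience, size, defects


TRUE_EFFECT = 0.5  # assumed ground truth, chosen only for illustration
experience, size, defects = simulate(20000, TRUE_EFFECT, seed=1)

# Methodology under test: regress defects on commit size only.
naive = ols_slope(size, defects)

# Reference analysis that uses the normally unobserved variable:
# partial out experience from both variables, then regress the residuals.
adjusted = ols_slope(residuals(experience, size), residuals(experience, defects))

print(f"true effect:       {TRUE_EFFECT}")
print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

Under these assumptions, the naive estimate lands well above the true effect while the adjusted estimate stays close to it, which is how a simulation can expose an omitted-variable threat that would go unnoticed on real data. Whether the assumed relationships are plausible for a given study still has to be argued separately.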
| Original language | English |
|---|---|
| Title of host publication | MSR 2022 |
| Subtitle of host publication | Proceedings of the 19th International Conference on Mining Software Repositories |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 86-97 |
| Number of pages | 12 |
| ISBN (Electronic) | 9781450393034 |
| Publication status | Published - 2022 |
| Externally published | Yes |
| Event | 2022 Mining Software Repositories Conference, MSR 2022, Pittsburgh, United States (23 May 2022 → 24 May 2022) |
Conference
| Conference | 2022 Mining Software Repositories Conference, MSR 2022 |
|---|---|
| Country/Territory | United States |
| City | Pittsburgh |
| Period | 23/05/22 → 24/05/22 |
Paper title: Operationalizing Threats to MSR Studies by Simulation-Based Testing