Fault injection campaigns have been used extensively to characterize the behavior of systems under errors. Traditional characterization studies, however, focus only on analyzing fail-stop behavior, incorrect test results, and other obvious failures observed during the experiment. More research is needed to evaluate the impact of silent failures, a relevant and insidious class of real-world failures, and to do so in a fully automated way in a fault injection setting. This paper presents a new methodology to identify fault injection-induced silent failures and assess their impact in a fully automated way. Drawing inspiration from system call-based anomaly detection, we detect silent failures by comparing faulty and fault-free execution runs and pinpointing behavioral differences that result in externally visible changes not reported to the user. Our investigation across several different programs demonstrates that the impact of silent failures is relevant, consistent with field data, and should be carefully considered to avoid compromising the soundness of fault injection results.
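The comparison idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `Run` record, the `EXTERNALLY_VISIBLE` set, the `classify` function, and the trace format (pairs of system call name and argument summary) are all assumptions made for illustration. The sketch treats a faulty run as a silent failure when its externally visible system calls differ from those of the fault-free run while nothing was reported to the user.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a hypothetical set of system calls whose effects are
# visible outside the process. The paper may use a different criterion.
EXTERNALLY_VISIBLE = {"write", "send", "sendto", "unlink", "rename"}

# Assumption: each trace entry is (syscall name, argument summary).
Syscall = Tuple[str, str]


@dataclass
class Run:
    """Hypothetical record of one execution run."""
    trace: List[Syscall]
    exit_code: int
    reported_error: bool  # True if an error was surfaced to the user


def visible_effects(trace: List[Syscall]) -> List[Syscall]:
    """Keep only the calls assumed to have externally visible effects."""
    return [call for call in trace if call[0] in EXTERNALLY_VISIBLE]


def classify(golden: Run, faulty: Run) -> str:
    """Compare a faulty run against the fault-free (golden) run."""
    # A changed exit code or an explicit error counts as a reported failure.
    if faulty.exit_code != golden.exit_code or faulty.reported_error:
        return "reported failure"
    # Same reported outcome but different visible behavior: silent failure.
    if visible_effects(faulty.trace) != visible_effects(golden.trace):
        return "silent failure"
    return "no visible effect"


if __name__ == "__main__":
    golden = Run([("open", "/tmp/out"), ("write", "fd=3 data=abc")], 0, False)
    faulty = Run([("open", "/tmp/out"), ("write", "fd=3 data=abX")], 0, False)
    print(classify(golden, faulty))
```

Comparing only the filtered calls keeps internal differences, such as extra memory management calls, from being flagged. A real setup would also need to normalize nondeterministic arguments such as timestamps and process IDs, which this sketch does not attempt.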
|Title of host publication||Proceedings of the 10th European Dependable Computing Conference (EDCC 2014)|
|Publication status||Published - 2014|
|Event||10th European Dependable Computing Conference (EDCC 2014)|
|Duration||13 May 2014 → 16 May 2014|