Abstract—Software failures are still a major concern in mission- and
enterprise-critical contexts, despite significant efforts spent in
software testing. In fact, while software testing is effective against
easily reproducible bugs (Bohrbugs), it is considerably less suitable
for dealing with bugs that lead to hard-to-reproduce failures (Mandelbugs).
On the positive side, the elusive nature of Mandelbugs
provides opportunities for failure recovery, which are investigated
in this paper. Based on real cases of Mandelbugs in eleven Information
Technology (IT) systems running in production, the paper proposes
a model that describes the recovery processes in IT systems.
It then presents closed-form expressions and a numerical analysis
of the mean time to recovery and of software (un)availability.
This analysis allows the designer to compare recovery strategies,
as well as to determine the parameters having a high influence on
the efficacy of recovery from failures caused by Mandelbugs.