Service technology has become one of the mainstream
technologies in today’s software development because it enables
rapid and flexible development and integration of software systems.
The current Web services technology builds software upon basic
building blocks called Web services. They are software units
that provide certain functionalities over the Web and involve a
set of interface and protocol standards, e.g. Web Services
Description Language (WSDL) for describing service interfaces,
SOAP as a messaging protocol, and Web Services Business Process
Execution Language (WS-BPEL) for describing business processes of
collaborating services [1]. Like other software, services may suffer
from communication problems or contain faults themselves, and
hence service consumers may experience service interruption.
Different types of faults have been classified for services [2],
[3], [4], and can be grouped roughly into three categories: (1) Logic
faults comprise calculation faults, data content faults, and other
logic-related faults thrown specifically by the service. Web service
consumers can detect logic faults through WSDL fault messages or
by their own means of checking the correctness of service responses.
(2) System and network faults are those that the execution
environment can detect, for example through HTTP status codes, e.g.,
communication timeout, server error, and service unavailable.
(3) SLA faults are raised when services violate their service level
agreements (SLAs), e.g., response time requirements, even though
functional requirements are fulfilled. For service providers, one of the main
goals of service provision is service reliability. Services should
be provided in a reliable execution environment and prepared for
various faults so that failures can be made as transparent as
possible to service consumers. Service designers should therefore
design services with a fault tolerance mindset, expecting the
unexpected and preparing to prevent and handle potential failures.
There are many fault tolerance patterns or exception handling
strategies that can be applied to make software and systems
more reliable. Common patterns involve how to handle or recover
from failures, such as communication retry or the use of redundant
system nodes. In a distributed services context, the question is
which fault tolerance pattern should be applied to a
particular service. We argue that not all patterns are equally
appropriate for every service, because services differ in their
characteristics, including service semantics and the environment of
service provision. In this paper, we propose a mathematical model
that can assist service designers in designing fault tolerant
versions of services. The model helps recommend which fault
tolerance patterns are suitable for particular services. With a
supporting tool, service designers can choose a recommended
pattern and have fault tolerant versions of the services generated
as WS-BPEL services.
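As an illustration of the three fault categories described above, and of communication retry as one simple pattern, the following Python sketch classifies the outcome of a single service invocation and retries only when a system or network fault is observed. It is a minimal sketch under stated assumptions, not the model proposed in this paper: the names `InvocationResult`, `classify`, and `invoke_with_retry` are hypothetical, and treating any HTTP status code of 400 or above without a declared fault message as a system or network fault is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FaultCategory(Enum):
    LOGIC = "logic"
    SYSTEM_NETWORK = "system/network"
    SLA = "sla"


@dataclass
class InvocationResult:
    """Hypothetical record of one service call (illustrative only)."""
    http_status: Optional[int]   # None if no HTTP response arrived, e.g. a timeout
    has_fault_message: bool      # True if the response carries a declared fault message
    response_time_s: float       # observed response time in seconds


def classify(result: InvocationResult,
             sla_max_response_time_s: float) -> Optional[FaultCategory]:
    """Map an invocation result to a fault category, or None if no fault."""
    if result.http_status is None:
        # No response at all: communication timeout or unreachable service.
        return FaultCategory.SYSTEM_NETWORK
    if result.has_fault_message:
        # Checked before the status code, since a declared fault may be
        # delivered together with an HTTP error status.
        return FaultCategory.LOGIC
    if result.http_status >= 400:
        # Assumption: any remaining HTTP error is a system/network fault.
        return FaultCategory.SYSTEM_NETWORK
    if result.response_time_s > sla_max_response_time_s:
        # Functionally correct response, but too slow for the SLA.
        return FaultCategory.SLA
    return None


def invoke_with_retry(call: Callable[[], InvocationResult],
                      sla_max_response_time_s: float,
                      max_attempts: int = 3) -> InvocationResult:
    """Retry sketch: re-invoke only while a system/network fault is observed."""
    result = call()
    for _ in range(max_attempts - 1):
        if classify(result, sla_max_response_time_s) is not FaultCategory.SYSTEM_NETWORK:
            break
        result = call()
    return result
```

The sketch retries only on system and network faults because repeating a call is unlikely to change a logic fault; whether retry is suitable for a given service at all depends on the service characteristics discussed later in the paper.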
Section 2 discusses related work in Web services fault
tolerance. Section 3 lists fault tolerance patterns that are
considered in our work. Section 4 gives the characteristics of
services and the conditions of service provision that we use as
criteria for pattern recommendation. Section 5 presents
how the pattern recommendation model can assist service
designers. The paper concludes in Section 6 with an outlook on
future work.