I. Introduction
The advent and subsequent popularity of Service Oriented Architecture (SOA) has prompted the development of efficient techniques for integrating heterogeneous systems over the Web in the form of Web services. Web service technology exposes an application's functionality as interoperable and reusable service instances. The widespread adoption of Web services for integrating multi-platform, cross-vendor business applications can be attributed to the concerted efforts of the W3C and industry leaders such as Microsoft, IBM and SAP in developing Web services standards. A Web service is described by means of the Web Services Description Language (WSDL)¹ and is invoked through Simple Object Access Protocol (SOAP)² messages. A third component of the Web service scenario was a public UDDI³ (Universal Description, Discovery and Integration) registry that allowed service providers to register their service advertisements in order to facilitate categorization and third-party discovery. However, owing to the initiative's limited adoption, the public Universal Business Registry (UBR) was discontinued in 2005.
¹ http://www.w3.org/TR/wsdl/
² http://www.w3.org/TR/soap/
Moreover, because a relatively large search space must be processed to serve each query, it is also beneficial to apply data mining techniques such as clustering to service descriptions. Clustering can enhance service discovery by narrowing the search space: a user query is served by a domain-specific search over the relevant cluster rather than over the entire repository. Automatic tagging of service descriptions facilitates categorization and management, and improves the performance of Web service search engines. Clustering and tagging of Web services could play a vital role in filtering out irrelevant services, thereby speeding up subsequent tasks, such as service composition, that are otherwise time consuming.
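Clustering narrows the search space because a query need only be compared against the services in the most relevant cluster, not against the whole repository. The Python sketch below illustrates that routing step under simplifying assumptions made for illustration, not taken from the paper: services are already grouped into clusters, each service is represented by a short plain-text description, similarity is bag-of-words cosine similarity, and a query is routed to the cluster whose averaged term vector (centroid) is closest. The cluster labels, service names, and helper functions are hypothetical; the paper's own clustering procedure is the subject of Section IV.

```python
import math
from collections import Counter

def vectorize(text):
    """Term-frequency vector over lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dicts of term -> weight)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_centroids(clusters):
    """Average the term vectors of each cluster's service descriptions."""
    centroids = {}
    for label, services in clusters.items():
        total = Counter()
        for description in services.values():
            total.update(vectorize(description))  # adds term counts
        centroids[label] = {t: c / len(services) for t, c in total.items()}
    return centroids

def search(query, clusters, centroids, top_k=3):
    """Route the query to the closest cluster, then rank only that cluster's services."""
    q = vectorize(query)
    best = max(centroids, key=lambda label: cosine(q, centroids[label]))
    services = clusters[best]
    ranked = sorted(services, key=lambda name: cosine(q, vectorize(services[name])),
                    reverse=True)
    return best, ranked[:top_k]

# Hypothetical, pre-clustered service descriptions.
clusters = {
    "weather": {
        "GetForecast": "get weather forecast for city",
        "GetTemperature": "get current temperature for city",
    },
    "finance": {
        "GetStockQuote": "get stock quote price for symbol",
        "ConvertCurrency": "convert currency amount exchange rate",
    },
}
centroids = build_centroids(clusters)
print(search("weather forecast city", clusters, centroids))
# ('weather', ['GetForecast', 'GetTemperature'])
```

Only the two weather services are scored for this query. With k clusters of roughly equal size, the per-query ranking cost drops by about a factor of k, at the price of missing relevant services that were assigned to a different cluster.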
In this paper, we present a system for facilitating efficient Web service discovery based on conventional information retrieval (IR) methods and natural language processing (NLP) techniques. The proposed work first develops a technique to find and retrieve service descriptions published on the Web in order to build a scalable service repository. The challenges in this phase are identifying only valid service descriptions and avoiding duplicates caused by repeated crawler runs. We then use NLP techniques to analyze and tag service descriptions, and cluster services based on their functionalities, expressed in the form of features extracted from WSDL documents.
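The crawling phase must reject fetched pages that are not service descriptions and skip documents already collected in earlier runs, and clustering then needs features from each WSDL file. The following is a minimal Python sketch of those three steps under stated assumptions, not the system's actual implementation. It treats a document as valid if it parses as XML with a WSDL 1.1 `definitions` or WSDL 2.0 `description` root element. It detects duplicates by a SHA-256 hash of whitespace-normalized content, and it takes features to be tokens split from abstract operation names. The `Repository` class, the helper function names, and the example URLs are hypothetical.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Standard namespaces of WSDL 1.1 and WSDL 2.0 documents.
WSDL11_NS = "http://schemas.xmlsoap.org/wsdl/"
WSDL20_NS = "http://www.w3.org/ns/wsdl"
VALID_ROOTS = {f"{{{WSDL11_NS}}}definitions", f"{{{WSDL20_NS}}}description"}

def parse_wsdl(text):
    """Return the root element if text is well-formed XML with a WSDL root, else None."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    return root if root.tag in VALID_ROOTS else None

def fingerprint(text):
    """SHA-256 of whitespace-collapsed text, so copies differing only in layout collide."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def split_identifier(name):
    """Split camelCase, PascalCase, snake_case and acronyms into lowercase tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]

def extract_features(root):
    """Tokens from abstract operation names (1.1 portType, 2.0 interface), skipping bindings."""
    tokens = []
    for ns, container in ((WSDL11_NS, "portType"), (WSDL20_NS, "interface")):
        for parent in root.iter(f"{{{ns}}}{container}"):
            for op in parent.findall(f"{{{ns}}}operation"):
                tokens.extend(split_identifier(op.get("name", "")))
    return tokens

class Repository:
    """Hypothetical in-memory store of crawled, validated, de-duplicated services."""

    def __init__(self):
        self.seen = set()   # fingerprints from all crawler runs so far
        self.services = {}  # source URL -> feature tokens

    def add(self, url, text):
        root = parse_wsdl(text)
        if root is None:
            return "invalid"
        key = fingerprint(text)
        if key in self.seen:
            return "duplicate"
        self.seen.add(key)
        self.services[url] = extract_features(root)
        return "added"

# Example: a tiny WSDL 1.1 document fetched twice, plus a non-WSDL page.
doc = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:weather" name="Weather">
  <portType name="WeatherPort">
    <operation name="GetWeatherForecast"/>
  </portType>
  <binding name="WeatherBinding" type="tns:WeatherPort">
    <operation name="GetWeatherForecast"/>
  </binding>
</definitions>"""

repo = Repository()
print(repo.add("http://example.com/weather?wsdl", doc))                    # added
print(repo.add("http://example.org/mirror.wsdl", doc.replace("\n", " ")))  # duplicate
print(repo.add("http://example.com/about.html", "<html><body>Hi</body></html>"))  # invalid
print(repo.services)  # {'http://example.com/weather?wsdl': ['get', 'weather', 'forecast']}
```

Hashing the normalized text catches verbatim re-crawls and pure formatting differences, but not documents that are semantically identical yet differ in, for example, attribute order or namespace prefixes; XML canonicalization before hashing would be needed for those.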
The paper is organized as follows. Section II discusses existing work in the area of Web service discovery, specifically clustering for service discovery. Section III describes the proposed system and its components, and Section IV details the process of clustering available services. Section V discusses the tagging of services and clusters using NLP techniques, and Section VI presents the results obtained, followed by the conclusion and future work.