According to Dey et al. [10], a good web crawler should have the following features: a high-performance system
architecture that can retrieve a large number of web pages at the same time, the ability to handle memory stack
overflow caused by large web page contents, a policy for deciding which page to download next (ranking algorithms),
and robustness against crashes when working with existing resources and web servers.
The web crawler used within our mobilizer does not focus on web page links. Instead, it focuses on web page content, such as images and layouts. Since the web page mobilizer is designed to be used for a single client organization, the URLs accessed will typically fall under a single domain, which means that the web crawler needs to crawl only within that domain. Our observations showed that a typical company’s web site usually contains a web page hierarchy of at most three levels. Given this, we heuristically limit our crawler to three levels of the web page hierarchy (our tests also showed that crawling beyond three levels retrieves too many duplicates and too few new links).
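
The same-domain, three-level restriction described above can be illustrated with a minimal sketch. This is not the mobilizer's actual crawler: the names `crawl`, `LinkExtractor`, and `fetch` are hypothetical, the page retrieval function is supplied by the caller, and we assume that the start page counts as level one and that "same domain" means an identical host name.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of anchor tags in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=3):
    """Breadth-first crawl restricted to the start URL's host.

    fetch is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML text, or None if the page is unavailable.
    The start page is treated as level 1. Returns a dict that maps each
    discovered URL to the level at which it was first seen.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url: 1}          # doubles as the duplicate filter
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        html = fetch(url)          # content is retrieved at every level
        if html is None or depth >= max_depth:
            continue               # pages at the last level are not expanded
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != domain:
                continue           # stay within the client's domain
            if link in seen:
                continue           # skip duplicates
            seen[link] = depth + 1
            queue.append(link)
    return seen
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library; for a quick check it can be a dictionary lookup over a handful of in-memory pages, for example `crawl("http://example.com/", pages.get)`.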