II. OVERVIEW OF VIPS ALGORITHM
The VIPS algorithm makes full use of the layout features of
a webpage. First, it extracts all suitable blocks from the
HTML DOM tree structure. Second, it finds the separators
between these extracted blocks. Here, separators denote the
horizontal or vertical lines in a webpage that visually
cross no blocks. Finally, based on these separators, it
constructs the semantic structure of the webpage and
divides the webpage into independent blocks. The VIPS
algorithm employs a top-down approach, which is very
effective.
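To make the notion of a separator concrete, the following is a minimal sketch of the separator-finding step for the horizontal case only. It assumes that each extracted block is given as a bounding box (left, top, right, bottom) in page pixels; the names Box and horizontal_separators are hypothetical and are not part of the VIPS description. It reports only the gaps between blocks and does not attempt the block extraction or structure construction steps.

```python
from typing import List, Tuple

# Hypothetical block representation: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def horizontal_separators(blocks: List[Box]) -> List[Tuple[int, int]]:
    """Return (start_y, end_y) bands between blocks that no block crosses.

    Assumption: a horizontal separator is a vertical gap in the projection
    of all block boxes onto the y-axis. Page margins are not reported.
    """
    # Project every block onto the y-axis and sort the spans by top edge.
    spans = sorted((top, bottom) for _, top, _, bottom in blocks)
    separators: List[Tuple[int, int]] = []
    if not spans:
        return separators

    # 'cursor' is the lowest bottom edge covered by the spans seen so far.
    cursor = spans[0][1]
    for top, bottom in spans[1:]:
        if top > cursor:
            # No block covers the band [cursor, top): it is a separator.
            separators.append((cursor, top))
        cursor = max(cursor, bottom)
    return separators


# A header, two side-by-side columns, and a footer.
example_blocks: List[Box] = [
    (0, 0, 100, 20),
    (0, 30, 50, 80),
    (50, 30, 100, 80),
    (0, 90, 100, 120),
]
print(horizontal_separators(example_blocks))  # [(20, 30), (80, 90)]
```

In this example the two columns share the same vertical extent, so no horizontal line passes between them; a vertical separator at x = 50 would be found by applying the same projection to the x-axis.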
The basic model of VIPS is described below.
A web page W is represented as a triple: