In fact, most studies show that when a page is
presented to the user, spatial and visual cues play a very
important role: they help the user unconsciously divide the
webpage into several semantic parts. If we can make use
of this information, it will help us extract the body content of
the page much more precisely. Detecting the semantic
content structure of a webpage could therefore improve the
performance of webpage content extraction. The VIPS [9]
algorithm is well suited to this task: it divides the
webpage into independent semantic blocks
and also provides the coordinate information of each block,
which can assist webpage content extraction. Based on the VIPS
algorithm, we can easily recall the lost sentences.
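
As an illustration of how block coordinates might help recall lost sentences, the following is a minimal sketch, not the VIPS algorithm itself. It assumes the segmentation step has already produced rectangular blocks, and that each sentence carries a position on the rendered page. The names `Block`, `Sentence`, `block_of`, and `recall_lost_sentences` are hypothetical, as is the rule used here: a sentence missed by the extractor is recalled if it lies in a block that already contains extracted body text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical semantic block: a bounding box in page pixels."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Inclusive bounds; a point on a shared edge matches both blocks.
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass(frozen=True)
class Sentence:
    """Hypothetical sentence with its position on the rendered page."""
    text: str
    x: float
    y: float


def block_of(sentence: Sentence, blocks: List[Block]) -> Optional[int]:
    """Return the index of the first block containing the sentence, if any."""
    for index, block in enumerate(blocks):
        if block.contains(sentence.x, sentence.y):
            return index
    return None


def recall_lost_sentences(extracted: List[Sentence],
                          candidates: List[Sentence],
                          blocks: List[Block]) -> List[Sentence]:
    """Recall candidates that share a block with already extracted body text."""
    body_blocks = {block_of(s, blocks) for s in extracted} - {None}
    return [s for s in candidates
            if s not in extracted and block_of(s, blocks) in body_blocks]


# Assumed page layout: header, main column, sidebar.
blocks = [
    Block(0, 0, 800, 100),
    Block(0, 100, 600, 900),
    Block(600, 100, 200, 900),
]
extracted = [Sentence("First paragraph of the article.", 50, 150)]
candidates = [
    Sentence("Short caption.", 50, 400),   # main column, missed earlier
    Sentence("Related links", 650, 200),   # sidebar
    Sentence("Site title", 50, 20),        # header
]
recalled = recall_lost_sentences(extracted, candidates, blocks)
print([s.text for s in recalled])
```

In this sketch only the sentence in the main column is recalled, because it is the only candidate that shares a block with extracted body text. A real system would take the block rectangles from the segmentation output rather than hard-coding them.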