detailed aspects, such as the tag tree method [11] and the
ontology method [12], it still has some problems. The main
problem is that it often discards sentences that belong to
the main body content. Because it relies on local judgments
over the DOM tree, it cannot obtain a global view of the
page. On the other hand, the DOM tree was originally introduced
for presentation in the browser rather than for describing the
semantic structure of the webpage, so the semantic relations
between different sentences cannot be read from it directly. It
is therefore unsurprising that this method sometimes loses part
of the content.
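
The effect of a purely local judgment can be illustrated with a small sketch. The code below is a hypothetical example, not the method of [11] or [12]: the page, the `MIN_CHARS` threshold, and the names `ParagraphCollector` and `extract_locally` are all assumptions made for illustration. Each paragraph node is kept or dropped using only its own text length, so a short sentence of the main body is discarded even though its neighbours clearly belong to the content.

```python
from html.parser import HTMLParser

# Hypothetical page: the second <p> is a short sentence of the main body.
PAGE = """
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="body">
  <p>The council approved the new budget after a long debate on Tuesday night.</p>
  <p>Nobody objected.</p>
  <p>The plan takes effect next month and covers schools, roads and parks.</p>
</div>
"""

MIN_CHARS = 40  # assumed local threshold: shorter blocks are treated as noise


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element as a separate block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.blocks[-1] += data


def extract_locally(html, min_chars=MIN_CHARS):
    """Keep or drop each block using only that block's own length."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    texts = [block.strip() for block in parser.blocks]
    kept = [t for t in texts if len(t) >= min_chars]
    dropped = [t for t in texts if len(t) < min_chars]
    return kept, dropped


kept, dropped = extract_locally(PAGE)
print(dropped)
```

In this sketch the rule never looks at sibling or parent nodes, which is the sense in which the judgment is local. A rule that also considered the surrounding blocks could retain the short sentence.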