Design of HTML parallel parser with semantic-based input splitting
Abstract:
HTML is a widely used markup language that underlies innumerable web pages. Parallelizing an HTML parser could yield a significant performance improvement and a better user experience. However, parallelizing an HTML parser is challenging because of a strong cyclic dependence in the parser model. In this paper, we propose a semantic-based parallel HTML parser design that splits the input HTML document at 'div' tags and processes the resulting independent partial inputs with multiple parser threads. We evaluated the proposed parallel HTML parser on benchmarks selected from the top 500 web pages and achieved a maximum speedup of 1.49x.
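The splitting idea can be illustrated with a minimal sketch. It is not the design evaluated in the paper. It assumes that every balanced top-level 'div' element can be treated as an independent partial input, finds the boundaries with a simple regular expression, and hands each chunk to a worker thread running Python's standard html.parser. The names split_at_top_level_divs, TagCollector, parse_chunk, and parallel_parse are hypothetical and chosen only for this illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Matches opening and closing div tags; group 1 is "/" for a closing tag.
# Assumption: no div-like text hides inside comments, scripts, or attributes.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def split_at_top_level_divs(html):
    """Cut html so that each balanced top-level div is its own chunk."""
    chunks, depth, cursor = [], 0, 0
    for m in DIV_TAG.finditer(html):
        if m.group(1) != "/":  # opening tag
            if depth == 0 and m.start() > cursor:
                # Text between top-level divs becomes a separate chunk.
                chunks.append(html[cursor:m.start()])
                cursor = m.start()
            depth += 1
        elif depth > 0:  # closing tag that matches an open div
            depth -= 1
            if depth == 0:
                chunks.append(html[cursor:m.end()])
                cursor = m.end()
    if cursor < len(html):
        # Trailing text, including any div that was never closed.
        chunks.append(html[cursor:])
    return chunks


class TagCollector(HTMLParser):
    """Toy stand-in for a real parser: records start tags in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_chunk(chunk):
    parser = TagCollector()
    parser.feed(chunk)
    parser.close()
    return parser.tags


def parallel_parse(html, workers=4):
    chunks = split_at_top_level_divs(html)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the merge is a
        # plain concatenation of the per-chunk results.
        results = list(pool.map(parse_chunk, chunks))
    return [tag for tags in results for tag in tags]
```

Because of CPython's global interpreter lock, this sketch shows only the structure of split, parse, and merge. It is not expected to reproduce the reported speedup, and a real parser would also have to handle 'div' tags that appear inside scripts or comments.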