To extract information from the HTML document generated by crawling the KNIME Forum, we start
from the root item and split it into two sub-items.
After ungrouping these two item collections, the individual items are extracted to get the
full list of threads on each page as string documents. This sub-workflow has been encapsulated in the
meta-node named “TOC of Forum”.
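
Outside KNIME, the same idea (start at the root, descend into a sub-item, pull out one string document per thread) can be sketched with XPath-style queries in Python. This is only an illustration, not the workflow's actual configuration: the sample page, the tag names (`html`, `body`, `tr`, `a`), the `class="thread"` attribute, and the helper names `list_threads` and `thread_titles` are all assumptions made for the example, and a real forum page would likely need a lenient HTML parser rather than a strict XML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one crawled forum page.
# The element names and the class attribute are assumptions for this sketch.
PAGE = """
<html>
  <head><title>Forum</title></head>
  <body>
    <table>
      <tr class="thread"><td><a href="/thread/1">First thread</a></td></tr>
      <tr class="thread"><td><a href="/thread/2">Second thread</a></td></tr>
    </table>
  </body>
</html>
"""


def list_threads(page):
    """Return each thread row of the page as its own string document."""
    root = ET.fromstring(page)          # root item of the document
    body = root.find("body")            # one of the root's sub-items
    rows = body.findall(".//tr[@class='thread']")
    # Serialize every matched row back to a string, one per thread.
    return [ET.tostring(row, encoding="unicode").strip() for row in rows]


def thread_titles(page):
    """Return the link text of every thread row."""
    root = ET.fromstring(page)
    return [a.text for a in root.findall(".//tr[@class='thread']//a")]


if __name__ == "__main__":
    for doc in list_threads(PAGE):
        print(doc)
    print(thread_titles(PAGE))
```

Each returned string plays the role of one row in the ungrouped collection: it can be parsed again on its own, which is what lets later XPath steps work thread by thread.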
The upcoming sequence of XPath nodes, contained in the “Category Page Count” meta-node, rescues
all pieces of the same threads spread across different forum pag