5. Conclusion
Compared with the other strategies proposed, applying MeSH together
with a vector-based strategy for classifying texts by their content
(InDeCS) gave the best accuracy in classifying web pages as inside
or outside the field of healthcare aimed at the lay public (0.94 for
sensitivity, specificity and AUC). Furthermore, bootstrapping the
dataset showed that InDeCS can also handle the non-deterministic
aspects of the classification problem posed.
Because of the good results achieved by InDeCS in classifying web
pages in the field of healthcare aimed at the lay public, this
classifier has been used to improve the quality of the results
returned by the Brazilian search portal Busca Saúde. This portal
aims to support the lay public in retrieving health-related
information available on the Internet.
The results presented and discussed in this paper are important
particularly because they show that MeSH can be applied in studies
on text classification and indexing when the main focus is on the
lay public. This study also suggests that further studies could
investigate the use of controlled vocabularies as a basis for better
mapping the mutable, non-deterministic characteristics of the
Internet.