Introduction: Internet users are increasingly turning to the World Wide Web to search for health-related
information. This trend calls for specialized tools that can support users in their searches.
Objective: To apply and compare strategies for using the Portuguese version of Medical Subject Headings
(MeSH) to build an automated classifier that determines whether Brazilian Portuguese-language web content
aimed at the lay public falls within or outside the field of healthcare.
Methods: A total of 3658 Brazilian web pages were used to train the classifier and 606 to validate it. The
proposed strategies were built on content-based vector methods for text classification: each strategy
derived a feature vector for every page, and Naive Bayes was used to classify these vectors.
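As an illustration of the general technique only, the sketch below turns each page into a vector of term counts and classifies it with multinomial Naive Bayes (Laplace smoothing). The vocabulary `VOCAB`, the helper `to_vector`, the class `MultinomialNB`, and the training pages are all hypothetical stand-ins. They are not the Portuguese MeSH descriptors, the InDeCS feature extraction, or the study's data.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a MeSH-derived vocabulary (NOT the InDeCS feature set).
VOCAB = ["diabetes", "insulina", "febre", "vacina", "futebol", "receita", "viagem"]


def to_vector(text: str) -> list[int]:
    """Count occurrences of each vocabulary term in the lower-cased page text."""
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return [tokens[term] for term in VOCAB]


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / n)
            # Total count of each term over all training pages of class c.
            totals = [sum(col) for col in zip(*rows)]
            denom = sum(totals) + len(totals)
            self.log_likelihood[c] = [math.log((t + 1) / denom) for t in totals]
        return self

    def predict(self, vector):
        # Choose the class maximising log P(c) + sum_i x_i * log P(term_i | c).
        def score(c):
            return self.log_prior[c] + sum(
                x * lp for x, lp in zip(vector, self.log_likelihood[c])
            )
        return max(self.classes, key=score)


# Toy, made-up training pages labelled "health" or "other".
pages = [
    ("diabetes e insulina no tratamento", "health"),
    ("febre depois da vacina", "health"),
    ("resultado do futebol de domingo", "other"),
    ("receita de bolo para viagem", "other"),
]
model = MultinomialNB().fit([to_vector(t) for t, _ in pages], [y for _, y in pages])
print(model.predict(to_vector("sintomas de febre e diabetes")))  # prints: health
```

Working in log space avoids numeric underflow when many term probabilities are multiplied, and add-one smoothing keeps a term unseen in one class from forcing that class's probability to zero.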
Results: A strategy named InDeCS was developed specifically to adapt MeSH to this problem. It outperformed
the other strategies on this pattern classification task, reaching 0.94 for sensitivity, specificity, and
area under the ROC curve.
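The three reported figures are standard binary-classification metrics: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and the area under the ROC curve, which equals the probability that a randomly chosen positive page scores higher than a randomly chosen negative one. The sketch below computes them on made-up toy data, not the study's validation results. It assumes "health" is the positive class, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive="health"):
    """Return (TP / (TP + FN), TN / (TN + FP)) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores, positive="health"):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels, hard predictions, and classifier scores (illustrative only).
y_true = ["health", "health", "health", "other", "other", "other"]
y_pred = ["health", "health", "other", "other", "other", "health"]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(sens, 3), round(spec, 3), round(auc, 3))  # prints: 0.667 0.667 0.889
```

Sensitivity and specificity depend on a fixed decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of figure are usually reported together.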
Conclusions: Given the strong results achieved by InDeCS, the tool has been successfully applied to the
Brazilian healthcare search portal Busca Saúde. The study further showed that MeSH performs well when used
to classify web content aimed at the lay public, and that it is able to capture the mutable,
non-deterministic characteristics of the web.