Terminology use, as a means of information retrieval or document indexing, plays an important role in health literacy.
A specific group of users, namely patients with diabetes, needs access to various online resources (in their native and/or a foreign
language) when searching for information on self-education in basic diabetes knowledge, on self-care activities such as the importance
of diet, medication and physical exercise, and on self-management of insulin pumps. Automatic extraction
of corpus-based terminology from online texts, manuals or professional papers can help in building terminology
lists or lists of »browsing phrases« useful in information retrieval or document indexing. Specific terminology lists represent
an intermediate step between free-text search and controlled vocabulary, and between users’ needs and existing online
resources in native and foreign languages. The research, which aims to determine the role of terminology in online resources,
is conducted on English and Croatian manuals and Croatian online texts and is divided into three interrelated parts: i)
comparison of professional and popular terminology use; ii) evaluation of automatic, statistically based terminology extraction
on English and Croatian texts; and iii) comparison and evaluation of terminology extracted from the English
manual using statistical and hybrid approaches. Extracted terminology candidates are evaluated by comparison with
three types of reference lists: a list created by a medical professional, a list of highly specialized professional vocabulary
contained in MeSH, and a list created by non-medical persons, constructed as the intersection of 15 individual lists. The results
report on the use of popular and professional terminology in online diabetes resources, on the evaluation of automatically
extracted terminology candidates in English and Croatian texts, and on the comparison of statistical and hybrid extraction
methods on the English text. Automatic and semi-automatic terminology extraction methods are evaluated using recall,
precision and F-measure.
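The evaluation step can be illustrated with a minimal Python sketch. It assumes exact matching of normalized term strings (lowercased, whitespace collapsed), treats the non-medical reference list as the set intersection of the individual lists, and uses the balanced F-measure (β = 1); the abstract does not specify the matching criterion or the weighting, so these are assumptions. The function names and the toy term lists are hypothetical and are not data from the study.

```python
from functools import reduce


def normalize(term: str) -> str:
    """Assumed normalization: lowercase and collapse internal whitespace."""
    return " ".join(term.lower().split())


def intersect_lists(lists: list[list[str]]) -> set[str]:
    """Reference list built as the intersection of several individual term lists."""
    sets = [{normalize(t) for t in lst} for lst in lists]
    return reduce(set.intersection, sets) if sets else set()


def evaluate(candidates: list[str], reference: set[str], beta: float = 1.0) -> dict[str, float]:
    """Precision, recall and F-measure of extracted candidates against a reference list."""
    extracted = {normalize(t) for t in candidates}
    ref = {normalize(t) for t in reference}
    tp = len(extracted & ref)  # candidates that also appear in the reference list
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        f = 0.0
    else:
        b2 = beta * beta  # beta = 1 gives the balanced F-measure (F1)
        f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return {"precision": precision, "recall": recall, "f_measure": f}


if __name__ == "__main__":
    # Hypothetical toy lists (three instead of the study's 15).
    annotator_lists = [
        ["insulin pump", "blood glucose", "diet"],
        ["Insulin pump", "blood glucose", "exercise"],
        ["insulin  pump", "blood glucose", "diet", "HbA1c"],
    ]
    reference = intersect_lists(annotator_lists)
    candidates = ["insulin pump", "blood glucose", "pump", "glucose meter"]
    scores = evaluate(candidates, reference)
    print({k: round(v, 3) for k, v in scores.items()})
    # prints {'precision': 0.5, 'recall': 1.0, 'f_measure': 0.667}
```

Using the intersection makes the non-medical reference list conservative: a term counts only if every contributor listed it, which tends to lower recall for extractors that find valid but less common terms.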