III. THE PROBLEM
During our preliminary investigation of Thai herb websites,
we found the following problems in extracting Thai herb
treatment information.
The first problem is that available HTML parser tools,
such as JSOUP, can extract the Thai herb treatment
information from some websites, but not from all of them. The
websites that they can process must have an explicit structure for
the different types of information. However, some websites do not
have this explicit structure, as illustrated in Figure 1 and Figure 2.
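Figure 1 shows an HTML source with an explicit structure, in
which each type of information is marked by a topic name. The
topic name can be the common name, scientific name, family
name, or medicinal use.
Hence, an HTML parser tool can extract information from this
type of website rather reliably.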
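
As a minimal sketch of why explicit structure makes extraction straightforward, the Python example below uses the standard-library html.parser module (not JSOUP) on a hypothetical page fragment. The markup, the class name TopicParser, and the sample herb entry are illustrative assumptions and are not taken from Figure 1.

```python
from html.parser import HTMLParser

# Hypothetical fragment (an assumption, not the page in Figure 1):
# every topic name sits in <h3> and its content in the following <p>.
EXPLICIT_HTML = """
<div class="herb">
  <h3>Common name</h3><p>Ginger</p>
  <h3>Scientific name</h3><p>Zingiber officinale</p>
  <h3>Family name</h3><p>Zingiberaceae</p>
  <h3>Medicinal use</h3><p>The rhizome relieves nausea.</p>
</div>
"""


class TopicParser(HTMLParser):
    """Collect {topic name: text} pairs from <h3>topic</h3><p>text</p> markup."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # tag we are currently inside (h3 or p), if any
        self._topic = None  # most recent topic name seen in an <h3>

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "h3":
            self._topic = text
        elif self._tag == "p" and self._topic is not None:
            self.fields[self._topic] = text


parser = TopicParser()
parser.feed(EXPLICIT_HTML)
fields = parser.fields
print(fields["Scientific name"])
```

Because the topic name and its content live in different tag types, the parser can pair them without inspecting the text itself.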
In contrast, Figure 2 shows an HTML source with an
inconsistent structure. For example, it embeds different types of
information in the same tag type, i.e., . Hence, the HTML
parser cannot reliably extract the needed information.
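
The difficulty can be illustrated with a small Python sketch, again using the standard-library html.parser module rather than JSOUP. The fragment is hypothetical: the tag name is not available in the text above, so the use of p here is purely an assumption, as are the class name TagTextCollector and the sample content.

```python
from html.parser import HTMLParser

# Hypothetical fragment (an assumption, not the page in Figure 2):
# every type of information is wrapped in the same tag type.
FLAT_HTML = """
<p>Ginger</p>
<p>Zingiber officinale</p>
<p>Zingiberaceae</p>
<p>The rhizome relieves nausea.</p>
"""


class TagTextCollector(HTMLParser):
    """Collect the text of every element with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.texts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        text = data.strip()
        if self._inside and text:
            self.texts.append(text)


collector = TagTextCollector("p")
collector.feed(FLAT_HTML)
texts = collector.texts
print(texts)
```

The parser returns four strings, but nothing in the markup says which one is the common name, the family name, or the medicinal use. Relying on position alone would break as soon as a page omits or reorders a field.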
The second problem is how to extract the parts used
and the symptom names, which are embedded inside the
medicinal use text, as shown in Figure 3.
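
To make the second problem concrete, the Python sketch below applies a naive lexicon lookup to a hypothetical English sentence. It is only an illustrative baseline and not the method proposed in this paper; the lexicons PARTS and SYMPTOMS, the function find_terms, and the sample sentence are all assumptions.

```python
import re

# Hypothetical lexicons (assumptions for illustration only).
PARTS = ("rhizome", "leaf", "root", "bark")
SYMPTOMS = ("nausea", "cough", "fever")


def find_terms(text, lexicon):
    """Return the lexicon terms that occur as whole words in text."""
    found = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical medicinal use text (not taken from Figure 3).
text = "The rhizome relieves nausea, and the leaf is used for fever."
parts = find_terms(text, PARTS)
symptoms = find_terms(text, SYMPTOMS)
print(parts, symptoms)
```

Even on this simple sentence the lookup finds the terms but does not say which part treats which symptom. Thai text is also written without spaces between words, so a word-boundary match like this one would likely need a word segmentation step first.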
