This research aims to develop an automatic knowledge discovery system for
semi-structured Thai text to support plant disease diagnosis. Plant disease diagnosis
is important because it enables farmers to cure infected plants before infections
become more severe. Prior to diagnosis, farmers need to acquire knowledge,
which is found primarily in text, including unstructured and semi-structured
documents. Because this knowledge is spread throughout the text, collecting all
the required knowledge is time-consuming. An alternative to the
manual approach is the use of automatic knowledge discovery processes to
acquire concise knowledge for plant disease diagnosis. Such a knowledge
discovery process consists of at least two main steps: knowledge extraction and
knowledge generalization. However, this research faces two major problems.
The first is the knowledge extraction problem, which is caused by linguistic
phenomena such as zero anaphora and ellipsis and can be addressed with NLP
techniques. The second is the knowledge generalization problem, because the general knowledge to be obtained is
intrinsically uncertain and incomplete. To solve these problems, we propose a
combination of three techniques. First, template-matching rules are used to extract
knowledge from agricultural documents on websites. Second, a Monte Carlo
simulation technique is applied to handle the incompleteness of the plant disease
symptom knowledge extracted from the texts. Third, fuzzy concepts are used to
determine the weighted average of the generality of each symptom for each
pathogen type or insect type. The results of knowledge generalization will then
be evaluated by experts, and knowledge extraction will be evaluated in terms of
precision and recall. Note that this work is part of ongoing research.
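The template-matching extraction step described above can be illustrated with a minimal Python 3.9+ sketch. It is not the authors' implementation: the field labels (`Disease:`, `Pathogen type:`, `Symptoms:`), the slot names, the comma-splitting of symptom lists, and the function `extract` are hypothetical, and English text stands in for Thai.

```python
import re

# Hypothetical templates: each slot name maps to a regular expression that
# captures the value following a field label in a semi-structured document.
# Real templates for Thai agricultural web pages would differ.
TEMPLATES = {
    "disease": re.compile(r"^Disease:\s*(?P<value>.+)$", re.MULTILINE),
    "pathogen_type": re.compile(r"^Pathogen type:\s*(?P<value>.+)$", re.MULTILINE),
    "symptom": re.compile(r"^Symptoms?:\s*(?P<value>.+)$", re.MULTILINE),
}


def extract(document: str) -> dict[str, list[str]]:
    """Apply every template to the document and collect the matched slot values."""
    record: dict[str, list[str]] = {}
    for slot, pattern in TEMPLATES.items():
        values: list[str] = []
        for match in pattern.finditer(document):
            raw = match.group("value")
            # Symptom fields are assumed to list several items separated by commas.
            parts = raw.split(",") if slot == "symptom" else [raw]
            values.extend(p.strip() for p in parts if p.strip())
        if values:
            record[slot] = values
    return record


page = """Disease: Disease X
Pathogen type: fungus
Symptoms: yellow leaves, brown leaf spots
"""
print(extract(page))
```

Each template fills one slot of a knowledge record; linguistic problems such as zero anaphora would need separate handling before or after this step.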
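The abstract does not specify how the Monte Carlo simulation handles incomplete symptom knowledge. One possible sketch, under an assumed model, treats a symptom that the text never mentions for a disease as unknown and repeatedly samples whether it is present, yielding a range of plausible symptom frequencies rather than a single value. The Beta prior, the record format, and the function `mc_symptom_frequency` are illustrative assumptions.

```python
import random
import statistics


def mc_symptom_frequency(records, symptom, n_trials=10_000, seed=0):
    """Monte Carlo estimate of how often `symptom` occurs among `records`.

    Each record maps symptom names to True (stated as present), False
    (stated as absent) or None / missing (not mentioned in the text).
    Assumed model: the occurrence rate p follows Beta(1 + present, 1 + absent),
    and each unknown entry is present with probability p.
    Returns (mean, 5th percentile, 95th percentile) of the simulated frequency.
    """
    if not records:
        raise ValueError("records must not be empty")
    states = [r.get(symptom) for r in records]
    present = sum(1 for s in states if s is True)
    absent = sum(1 for s in states if s is False)
    unknown = len(states) - present - absent
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_trials):
        p = rng.betavariate(1 + present, 1 + absent)
        # Fill each unknown entry by sampling presence with the drawn rate p.
        sampled = sum(1 for _ in range(unknown) if rng.random() < p)
        freqs.append((present + sampled) / len(states))
    freqs.sort()
    lo = freqs[int(0.05 * (n_trials - 1))]
    hi = freqs[int(0.95 * (n_trials - 1))]
    return statistics.fmean(freqs), lo, hi


records = [
    {"yellow leaves": True},
    {"yellow leaves": True},
    {"yellow leaves": False},
    {},  # symptom not mentioned for this disease
    {"yellow leaves": None},
]
print(mc_symptom_frequency(records, "yellow leaves"))
```

When no entry is unknown, every trial gives the same observed frequency; the spread between the percentiles grows with the number of unmentioned entries.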
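The fuzzy weighted average of symptom generality for a pathogen or insect type could take a form like the following sketch. The per-disease membership degrees, the weights (for example, how many documents describe each disease), the three triangular fuzzy sets, and all helper names are hypothetical choices made for illustration only.

```python
# Hypothetical fuzzy sets over generality in [0, 1], given as triangular
# (left foot, peak, right foot); the outer sets extend past [0, 1] so that
# 0.0 and 1.0 get full membership in "low" and "high".
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def weighted_generality(degrees, weights):
    """Weighted average of per-disease membership degrees of one symptom."""
    if not degrees or len(degrees) != len(weights):
        raise ValueError("degrees and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(degrees, weights)) / total


def generality_label(value):
    """Return the fuzzy label with the highest membership, and that membership."""
    memberships = {name: triangular(value, *abc) for name, abc in FUZZY_SETS.items()}
    best = max(memberships, key=memberships.get)
    return best, memberships[best]


# One symptom observed in three diseases caused by the same pathogen type.
g = weighted_generality([0.9, 0.6, 0.3], [3, 2, 1])
print(g, generality_label(g))
```

Here the weighted average is about 0.7, which belongs to "medium" with degree about 0.6 and to "high" with degree about 0.4.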
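For the extraction evaluation, precision is the fraction of extracted items that are correct, and recall is the fraction of reference items that were extracted. A minimal sketch, assuming exact matching of (disease, symptom) pairs against a manually annotated reference set; the data and the function `precision_recall` are hypothetical.

```python
def precision_recall(extracted, gold):
    """Precision and recall of extracted items against a reference (gold) set.

    Items are compared by exact match. A metric whose denominator is empty
    is reported as 0.0 (a chosen convention).
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


extracted = {("Disease X", "yellow leaves"), ("Disease X", "leaf spots"), ("Disease X", "wilting")}
gold = {("Disease X", "yellow leaves"), ("Disease X", "leaf spots"),
        ("Disease X", "stem rot"), ("Disease Y", "wilting")}
print(precision_recall(extracted, gold))
```

Two of the three extracted pairs are correct and two of the four reference pairs are found, so precision is 2/3 and recall is 1/2.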