3.1 Problem Statement
We have a database S of product descriptions, represented
as structured records. Every structured record s ∈ S consists
of a set of attribute ⟨name, value⟩ pairs. The attributes
can be numeric or categorical. We receive an unstructured
offer u as input, which is a concise free-text description that
specifies values for a subset of the attributes in S in an
arbitrary manner. The text may also contain additional
words. Our objective is to match u to one or more structured
records in S. We use precision and recall to judge the
quality of the matching system.
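To make the setup concrete, the following minimal Python sketch shows one way the inputs and the evaluation could be represented. It is an illustration under stated assumptions, not our implementation: the record contents, the offer strings, the identifiers, and the helper precision_recall are all hypothetical, and we assume here that precision is the fraction of predicted matches that are correct while recall is the fraction of ground-truth matches that are recovered.

```python
from typing import Dict, Optional, Tuple, Union

# An attribute value is categorical (str) or numeric (float).
AttributeValue = Union[str, float]
# A structured record s is a set of attribute <name, value> pairs.
Record = Dict[str, AttributeValue]

# Hypothetical toy database S, keyed by an illustrative record id.
S: Dict[str, Record] = {
    "s1": {"brand": "Acme", "model": "X100", "megapixels": 7.2, "color": "silver"},
    "s2": {"brand": "Acme", "model": "X200", "megapixels": 8.0, "color": "black"},
}

# Hypothetical unstructured offers: concise free text that mentions a subset
# of the attribute values in arbitrary order, possibly with additional words.
offers: Dict[str, str] = {
    "u1": "acme x100 silver 7.2mp digital camera",
    "u2": "Camera Acme X200 (black) free shipping",
    "u3": "acme camera 8 megapixel",
}


def precision_recall(
    predicted: Dict[str, Optional[str]],
    truth: Dict[str, str],
) -> Tuple[float, float]:
    """Score predicted offer -> record matches against ground truth.

    predicted maps an offer id to a record id, or to None when the matcher
    declines to match. truth maps an offer id to its correct record id.
    """
    made = {u: s for u, s in predicted.items() if s is not None}
    correct = sum(1 for u, s in made.items() if truth.get(u) == s)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Usage: two matches are predicted, one of them correctly, and the
# matcher abstains on the third offer.
truth = {"u1": "s1", "u2": "s2", "u3": "s2"}
predicted = {"u1": "s1", "u2": "s1", "u3": None}
precision, recall = precision_recall(predicted, truth)
```

Letting the matcher abstain (None) is a design choice in this sketch that allows precision and recall to differ; a matcher that always returns some record for every offer would have equal precision and recall under these definitions.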
We take a probabilistic approach and find the product s ∈
S that has the highest probability of matching the given offer
u. Our matcher is learned in an offline stage (Algorithm 1).
For this, we assume a small training set U of unstructured
offers. Each u ∈ U has been matched to one structured
record in S (set M). We also have mismatched records