The system described in [1, 2] does not leverage any semantic markup already present in the HTML; instead, it relies on users to manually define and label each data attribute. In addition, the extracted elements are limited to a small set of predefined labels within a single schema.
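To make concrete what "existing semantic markup" could refer to, the following is a minimal sketch that reads labels already embedded in a page, so no manual labeling is needed. It assumes the markup takes the form of microdata `itemprop` attributes, which the passage above does not specify. The names `ItempropCollector`, `extract_itemprops`, and `SAMPLE` are hypothetical and do not come from [1, 2]. The sketch handles only flat, non-nested annotations.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the schema.org microdata vocabulary is an assumption.
SAMPLE = """
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Desk lamp</span>
  <span itemprop="price">19.99</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collects (itemprop, text) pairs from elements that carry an itemprop."""

    def __init__(self):
        super().__init__()
        self._current = None  # itemprop of the most recently opened element
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self._current = dict(attrs).get("itemprop")

    def handle_endtag(self, tag):
        # Simplification: nested annotated elements are not tracked.
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if text and self._current:
            self.found.append((self._current, text))


def extract_itemprops(html):
    """Return the (label, value) pairs already annotated in the markup."""
    collector = ItempropCollector()
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    for label, value in extract_itemprops(SAMPLE):
        print(label, value)
```

In this sketch the labels come from the page itself rather than from a user-defined template, and they are not restricted to a fixed label set. That is the contrast with the approach in [1, 2] described above.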