Annotations
As meetings can be structured in layers and we wish to label or annotate chunks of data in accordance with these layers, there is a need for an annotation language that supports these structures. An annotation format can be seen as an instantiation of a model. A model describes what the annotations should look like, which annotation structures are possible and what these structures mean. This implies, however, that if the model changes, the annotations are affected as well, and vice versa.
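As a minimal sketch of such a layered structure (the class and layer names here are hypothetical, chosen only for illustration, not taken from any particular annotation tool), annotations on different layers can share a common time-anchored form while stacking on top of each other:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A labelled span on one layer, anchored to the recording timeline."""
    start: float  # seconds into the recording
    end: float
    label: str

@dataclass
class Layer:
    """One annotation layer, e.g. words, dialogue acts, or gestures."""
    name: str
    annotations: list = field(default_factory=list)

# Layers stack: a dialogue act groups words, which are in turn
# anchored to intervals of the audio signal.
words = Layer("words", [Annotation(0.0, 0.4, "hello"),
                        Annotation(0.4, 0.9, "everyone")])
acts = Layer("dialogue-acts", [Annotation(0.0, 0.9, "greeting")])
```

Changing the model, e.g. requiring dialogue acts to point at word spans rather than at raw time intervals, would force all existing annotations in that layer to be revised, which is the dependency between model and annotations noted above.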
The choice of annotation schemas and structures for the separate boxes should in most applications be inspired by explanatory models of human interaction and by the application goals. Different models, or different uses of the same models, may lead to distinct annotation schemas for the information in the boxes.
5.1 Manual Annotations
The y pairs on meeting recordings [28]. Simple manual transcription of speech usually takes 10xRT. For more complicated speech transcription, such as prosody, factors of 100-200xRT have been reported by Syrdal et al. [29]. The cost of syntactic annotation of text (PoS tagging and annotating syntactic structure and labels for nodes and edges) may run to an average of 50 seconds per sentence, at an average sentence length of 17.5 tokens (cf. Brants et al. [30], which describes syntactic annotation of a German newspaper corpus). As a final example, Lin et al. [31] report an annotation efficiency of 6.8xRT for annotating MPEG-7 metadata on video using the VideoAnnEx tool. The annotation described there consists of correcting shot boundaries, selecting salient regions in shots and assigning semantic labels from a controlled lexicon. It may be obvious that more complex annotation of video will further increase the cost.
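To make these figures concrete, a back-of-the-envelope estimate of what a few of the quoted xRT factors imply for a corpus; the 10-hour corpus size and the 150xRT midpoint for prosody are assumptions chosen purely for illustration:

```python
# Hypothetical cost estimate: annotation time = xRT factor * real time.
corpus_hours = 10  # assumed corpus size, for illustration only

xrt_factors = {
    "speech transcription": 10.0,   # ~10xRT (simple transcription)
    "prosody": 150.0,               # midpoint of the 100-200xRT range [29]
    "MPEG-7 video metadata": 6.8,   # VideoAnnEx figure [31]
}

person_hours = {task: f * corpus_hours for task, f in xrt_factors.items()}
total = sum(person_hours.values())

print(person_hours)  # per-task person-hours
print(total)         # 1668.0 person-hours for these three layers alone
```

Even under these rough assumptions, a single richly annotated 10-hour corpus costs on the order of a person-year of annotation effort, which motivates the drive to reduce annotation time discussed next.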
The type of research for which the framework described in this paper is developed requires not one or two annotation types on the data but a rich set of different annotations. It is therefore an important task to cut down the time