IV. ENHANCEMENT OF MOBILE VIDEO USER EXPERIENCE

The enhancement of the mobile video UX in our setup consists of three parts. First, thumbnail cue frames are generated at scene transitions (i.e., events in the video). Figure 2 illustrates the video browsing concept in more detail. The thumbnail seek bar is placed at the top and contains one thumbnail per scene, ordered by occurrence, which makes browsing a video easy: users can orient themselves by the thumbnails and do not need to wait until the video has loaded at a given time point.

Fig. 1. MVCS workflow in a mobile video application

This works well even under low bandwidth. As described before, the thumbnails have such a small resolution that they load very quickly. Furthermore, a lazy list has been implemented, which reduces bandwidth usage further because only the currently visible images are fetched. Tapping a thumbnail takes the user directly to the corresponding scene in the video, so content can be found much faster than with a traditional video player, which again improves the user's orientation. In addition, the seek bar automatically scrolls to keep the current scene in focus.
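The lazy thumbnail list described above can be sketched as follows. This is a minimal, platform-agnostic illustration, not the paper's implementation: the class and field names (`Scene`, `ThumbnailBar`, `thumb_url`) are assumptions, and the actual HTTP fetch of a thumbnail is stubbed out.

```python
# Sketch of the lazy thumbnail seek bar: scenes come from the
# segmentation service ordered by occurrence; only thumbnails inside
# the visible window are fetched; tapping one yields a seek position.
from dataclasses import dataclass

@dataclass
class Scene:
    start_s: float          # start time of the scene in seconds
    thumb_url: str          # low-resolution cue frame for this scene
    loaded: bool = False    # True once the thumbnail has been fetched

class ThumbnailBar:
    def __init__(self, scenes, visible_count=5):
        # Thumbnails are ordered by occurrence in the video.
        self.scenes = sorted(scenes, key=lambda s: s.start_s)
        self.visible_count = visible_count
        self.first_visible = 0

    def visible_scenes(self):
        return self.scenes[self.first_visible:self.first_visible + self.visible_count]

    def scroll_to(self, index):
        """Scroll so `index` is the first visible thumbnail and lazily
        fetch only the thumbnails that are now in view."""
        self.first_visible = max(0, min(index, len(self.scenes) - self.visible_count))
        for scene in self.visible_scenes():
            if not scene.loaded:
                scene.loaded = True   # placeholder for the actual HTTP fetch

    def tap(self, index):
        """Tapping a thumbnail returns the seek position for the player."""
        return self.scenes[index].start_s

    def focus_current(self, playback_s):
        """Auto-scroll so the currently playing scene stays in view."""
        current = max(i for i, s in enumerate(self.scenes) if s.start_s <= playback_s)
        self.scroll_to(current)
        return current
```

The same index-to-timestamp mapping serves both interactions: `tap` drives the player's seek, while `focus_current` keeps the bar synchronized with playback.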
Fig. 2. Video stream browsing based on video segmentation and automatically generated metadata
Second, the tag list (right) contains tags that have been added manually by the user, contributed by other users, or generated automatically. Like the thumbnails, the tags are ordered by their occurrence in the video, and tapping a tag seeks the video stream directly to the corresponding position. Both components, i.e., the segment-based seek bar and the tag list, are implemented to overcome the mobile UX problem of browsing video on a mobile device.

Finally, the third part of the mobile UX improvement concerns the video player itself. Since device information including the screen size is sent to the cloud, the necessary zooming ratio can be calculated: the screen size of the device determines the zooming ratio, and the detected objects determine the center position of the zooming field. The red rectangle symbolizes the zoomed video. Two persons are recognized by the object recognition service, so the zooming field is placed at their position; the user sees only this part of the video and can concentrate better on the persons. Future implementations could use more advanced classifiers or feature descriptors (e.g., for football scenes) to enable personalized zooming.
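The zooming step can be sketched as a small geometric computation. The function below is an illustrative assumption, not the paper's implementation: it assumes object boxes arrive as `(x, y, w, h)` tuples in video pixel coordinates and derives a zoom rectangle that matches the screen's aspect ratio and is centered on the detected objects.

```python
# Sketch of the screen-size-aware zoom field: given the device's
# reported screen size and the bounding boxes from the object
# recognition service, compute the region of the video to display.
def zoom_rectangle(video_w, video_h, screen_w, screen_h, boxes):
    """Return (x, y, w, h) of the zoom field in video coordinates."""
    if not boxes:
        return (0, 0, video_w, video_h)  # nothing detected: show full frame

    # Union of all detected object boxes (e.g. the two persons).
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)

    # The zoom field keeps the screen's aspect ratio, covers the
    # union of the boxes, and never exceeds the video frame.
    aspect = screen_w / screen_h
    w = max(x1 - x0, (y1 - y0) * aspect)   # wide and tall enough
    w = min(w, video_w, video_h * aspect)  # clamped to the frame
    h = w / aspect

    # Center the field on the objects, clamped to the frame borders.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    x = min(max(cx - w / 2, 0), video_w - w)
    y = min(max(cy - h / 2, 0), video_h - h)
    return (x, y, w, h)
```

The zooming ratio itself then follows as `screen_w / w` (how much the cropped region is magnified to fill the screen); clamping at the frame borders keeps the field valid when objects sit near an edge.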