Smart TVs have realized the convergence of
TV, Internet, and PC technologies, but they still do not provide
seamless content interaction for TV-enabled shopping. To
purchase items of interest displayed in a TV show, consumers
must resort to a store or the Web, which is
inconvenient. The fundamental challenge in realizing such
a use case is understanding the multimedia content being
streamed. This challenge can be addressed by using object
detection to facilitate content understanding, although detection has to be
executed as a computationally bound process so that consumers
are provided with a responsive and exciting user interface. To this
end, we propose a computational- and temporal-aware multimedia
abstraction framework that facilitates the efficient execution of
object detection tasks. Given computational and temporal rate
constraints, the proposed framework selects the optimal video
frames that best represent the video content and allows the
execution of the object detection task as a computationally bound
process. In this sense, the framework is computationally scalable
as it can adapt to the given constraints and generate optimal
abstraction results accordingly. Additionally, the framework
uses “object views,” which depict salient information and are
represented as regions of interest (ROIs), as the basis for the
frame selection process. In general, an ROI can be a whole frame or
a region that discards background information. Experimental
results demonstrate the computational scalability of the proposed
framework and the benefits of using the regions of interest as the
basis of the abstraction process.
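
As a rough illustration of constraint-driven frame selection, the following minimal sketch picks at most `budget` frames, standing in for the computational constraint, that are at least `min_gap` frames apart, a simplified stand-in for the temporal rate constraint. It is not the proposed framework: the greedy farthest-point heuristic, the function name `select_frames`, its parameters, and the per-frame feature vectors (imagined here as descriptors of each frame's ROI) are all assumptions made for illustration.

```python
from math import dist


def select_frames(features, budget, min_gap=1):
    """Greedily pick up to `budget` frame indices that cover the feature space.

    features: one feature vector per frame (hypothetical ROI descriptors).
    budget:   maximum number of frames the detector can afford to process.
    min_gap:  minimum index distance between any two selected frames.
    """
    if not features or budget <= 0:
        return []

    selected = [0]  # seed with the first frame (an arbitrary choice)
    # Distance from every frame to its nearest already-selected frame.
    nearest = [dist(f, features[0]) for f in features]

    while len(selected) < budget:
        # Only frames far enough in time from every selected frame qualify.
        candidates = [
            i for i in range(len(features))
            if all(abs(i - s) >= min_gap for s in selected)
        ]
        if not candidates:
            break
        # Take the candidate least well represented by the current selection.
        best = max(candidates, key=lambda i: nearest[i])
        if nearest[best] == 0.0:
            break  # every remaining candidate duplicates a selected frame
        selected.append(best)
        nearest = [
            min(d, dist(f, features[best]))
            for d, f in zip(nearest, features)
        ]

    return sorted(selected)
```

Raising `budget` lets the selection grow toward a denser summary, while lowering it yields a coarser one. This loosely mirrors the computational scalability described above, although the optimality criterion used by the actual framework is not specified in this section.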