To extract segment-level keywords, we first assign each
ASR and OCR word to the appropriate video segment
according to its timestamp. We then extract nouns from
the transcripts using the Stanford part-of-speech tagger
[25], and subsequently apply a stemming algorithm to
group nouns that occur in variant forms. To remove the spelling
mistakes introduced by the OCR engine, we perform a
dictionary-based filtering process.
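
The steps above can be illustrated with a minimal sketch. It is not the implementation used in this work and rests on several assumptions: each word arrives as a (timestamp, token, tag) triple that has already been tagged with Penn Treebank-style labels, segments are given by their sorted start times, naive_stem is a crude placeholder for a real stemming algorithm, and the dictionary check is applied to the stemmed form. All function and variable names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_to_segments(words, segment_starts):
    """Group (timestamp, token, pos_tag) triples by the segment they fall in.

    segment_starts must be sorted; a word belongs to the last segment
    whose start time is not later than the word's timestamp.
    """
    by_segment = defaultdict(list)
    for timestamp, token, pos_tag in words:
        index = bisect_right(segment_starts, timestamp) - 1
        if index >= 0:  # ignore words that precede the first segment
            by_segment[index].append((token, pos_tag))
    return by_segment


def naive_stem(token):
    """Placeholder stemmer (assumption): lowercases and strips plural endings."""
    token = token.lower()
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def segment_keywords(words, segment_starts, dictionary):
    """Return {segment_index: [stemmed nouns that pass the dictionary filter]}."""
    keywords = {}
    for index, tagged in assign_to_segments(words, segment_starts).items():
        kept = []
        for token, pos_tag in tagged:
            if not pos_tag.startswith("NN"):  # keep nouns only (NN, NNS, ...)
                continue
            stem = naive_stem(token)
            if stem in dictionary:  # drops OCR misspellings such as "netw0rk"
                kept.append(stem)
        keywords[index] = kept
    return keywords


# Hypothetical input: two segments starting at 0 s and 10 s.
example_words = [
    (1.0, "Networks", "NNS"),
    (2.5, "netw0rk", "NN"),
    (3.0, "explains", "VBZ"),
    (12.0, "queries", "NNS"),
]
example_result = segment_keywords(example_words, [0.0, 10.0], {"network", "query"})
```

In this example the misspelled OCR token and the verb are discarded, and the two remaining nouns are reduced to their base forms within their respective segments.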
We calculate the weighting factor for each remaining
keyword by extending the standard TFIDF score [26]. In
general, the TFIDF algorithm weights keywords only
according to their statistical frequencies. It cannot represent
the location of a keyword, which may be important
for ranking keywords extracted from web pages or lecture
slides. Therefore, we define a new formula for
calculating the TFIDF score, as shown in Eq. (1):