The ASR service segments the audio, discards music and noise, and then performs the transcription using a multi-pass decoding strategy. The ASR service is described in more detail in section 3. The service runs on Google infrastructure, so it scales to large workloads simply by increasing the number of machines that run the transcription service. The result of the transcription process consists of a time-aligned transcript and a confidence score for each word. This information is stored in a system-local utterance database (block f) and serves as the basis for the information retrieval index (block g). The index allows search as well as navigation within videos in the user interface (block h). These systems are described in more detail in sections 4 and 5, respectively.

There were two important requirements when designing the workflow system: reliability of the data storage and availability. We want to minimize the risk of losing data and to remain robust against machine outages. To do so, we replicated the workflow in two geographically distant locations, meaning there are two identical workflow systems and two utterance databases that stay in sync through a replication mechanism. More precisely, every mutation made to one copy of the database is propagated to the other, and vice versa. The replication provides redundancy in storage, which offers some safeguard against data loss and robustness against machine outages. In addition, the replication provides a load-balancing mechanism between the two workflow systems when both workflows are “healthy”. All individual system components are built upon scalable Google infrastructure, and as a result the system's capacity to handle queries or process videos scales by increasing the number of machines.
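To make the replication scheme concrete, the following is a minimal sketch of two utterance databases that propagate every mutation to each other, plus a simple load balancer that only routes to healthy workflows. It is an illustration under stated assumptions, not the system's actual implementation: the names `Word`, `Mutation`, `UtteranceDB`, `link`, and `pick_healthy`, the per-video version counter, the outbox that buffers mutations while the peer is down, and the round-robin policy are all hypothetical. The text does not say how conflicting writes are resolved, so the sketch assumes a simple "highest version wins" rule.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One entry of a time-aligned transcript (field names are assumptions)."""
    text: str
    start_s: float      # word start time in seconds
    end_s: float        # word end time in seconds
    confidence: float   # recognizer confidence for this word


@dataclass(frozen=True)
class Mutation:
    """A single write to the utterance database."""
    video_id: str
    version: int        # hypothetical per-video version counter
    origin: str         # name of the replica that accepted the write
    words: Tuple[Word, ...]


class UtteranceDB:
    """Toy replica: applies writes locally and forwards them to its peer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True
        self.peer: Optional["UtteranceDB"] = None
        self.rows: Dict[str, Mutation] = {}
        self.outbox: List[Mutation] = []   # mutations not yet delivered

    def write(self, video_id: str, words: Sequence[Word]) -> Mutation:
        prev = self.rows.get(video_id)
        version = prev.version + 1 if prev else 1
        mutation = Mutation(video_id, version, self.name, tuple(words))
        self._apply(mutation)
        self.outbox.append(mutation)
        self.flush()
        return mutation

    def flush(self) -> None:
        """Deliver buffered mutations once the peer is reachable."""
        if self.peer is None or not self.peer.healthy:
            return
        for mutation in self.outbox:
            self.peer._apply(mutation)
        self.outbox.clear()

    def _apply(self, m: Mutation) -> None:
        # Assumed conflict rule: keep the mutation with the highest
        # (version, origin) pair, so both replicas converge.
        cur = self.rows.get(m.video_id)
        if cur is None or (m.version, m.origin) > (cur.version, cur.origin):
            self.rows[m.video_id] = m


def link(a: UtteranceDB, b: UtteranceDB) -> None:
    """Make replication bidirectional."""
    a.peer, b.peer = b, a


def pick_healthy(replicas: Sequence[UtteranceDB],
                 counter: Iterator[int]) -> UtteranceDB:
    """Round-robin over the replicas that are currently healthy."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy workflow available")
    return healthy[next(counter) % len(healthy)]
```

In this sketch, a write accepted by either replica reaches the other as soon as the peer is healthy, and requests alternate between the two sites while both are up. When one site is marked unhealthy, all traffic goes to the remaining site and its mutations are buffered until `flush` is called after recovery.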