Our goal in Speech Technology Research is twofold: to make speaking to your phones and computers ubiquitous and seamless, and to help make videos on the web accessible and searchable.
Our research focuses on what makes Google unique: computing scale and data. Using large-scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and to experiment with methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed, and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: how do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which classes of algorithms merely compensate for a lack of data, and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising and steer the whole field in directions that would never have been considered were it not for the availability of orders of magnitude more data.
We are also in a unique position to deliver highly user-centric research. Researchers benefit from millions of users talking to Voice Search or Android Voice Input every day, and can conduct live experiments to test and benchmark new algorithms directly in a realistic, controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we keep our users close to make sure we solve real problems and have real impact.
We are deeply committed to the diversity of our users, and have made it a priority to deliver the best performance in every language on the planet. We currently have systems operating in more than 25 languages, and we keep expanding our reach to more and more users. The challenge of internationalizing at scale is immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.
Indexing and transcribing the web's audio content is a challenge that is nothing short of gargantuan, both in scope and difficulty. The videos uploaded to YouTube every day range from lectures to newscasts, music videos, and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation, and language detection to new levels of difficulty. The payoff is immense: imagine making every lecture on the web accessible in every language. This is the kind of impact we are striving for.
