Thursday, January 23, 2020

Voice and speech recognition: Bill Gates' prediction was off by almost a decade

“In this 10-year time frame, I believe that we’ll not only be using the keyboard and the mouse to interact but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface.” — Bill Gates, 1 October 1997

I have been using voice and speech recognition often since 1998, starting with IBM ViaVoice and the original Dragon software.
Unfortunately, IBM stopped improving ViaVoice in 2005, thinking they had achieved perfection!
Dragon itself was acquired by one company after another, with Nuance being the latest. None of the original developers of this speech recognition engine are still working on the product, as far as I know. Nuance is killing speech recognition by pricing the product at exorbitant rates. I am sure Google's voice input is going to hit them hard, and Nuance is going to go the Kodak way.

One of my pet peeves about Dragon NaturallySpeaking is that it is not Unicode compatible. I had been trying to bootstrap the software to recognize Telugu, a regional Indian language. Since it does not support Unicode, I was forced to convert all my words into a transliteration scheme.
After about ten years of work, I gave up.
Recently, in 2019, I was greatly surprised, and also happy, to find that Google Docs in Chrome has speech input that works great.
Another problem I face is fully developed websites that do not accept speech input;
even the W3C website, which is designed to promote accessibility, is not coded properly to accept speech input.


  • Speech recognition is accessed via the SpeechRecognition interface, which provides the ability to recognize voice context from an audio input (normally via the device's default speech recognition service) and respond appropriately. Generally you'll use the interface's constructor to create a new SpeechRecognition object, which has a number of event handlers available for detecting when speech is input through the device's microphone. The SpeechGrammar interface represents a container for a particular set of grammar that your app should recognise. Grammar is defined using JSpeech Grammar Format (JSGF).
  • Speech synthesis is accessed via the SpeechSynthesis interface, a text-to-speech component that allows programs to read out their text content (normally via the device's default speech synthesiser). Different voice types are represented by SpeechSynthesisVoice objects, and different parts of text that you want to be spoken are represented by SpeechSynthesisUtterance objects. You can get these spoken by passing them to the SpeechSynthesis.speak() method.
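The recognition side described above can be sketched as follows. This is a minimal, hedged example assuming a browser that implements the Web Speech API (Chrome exposes it under a `webkit` prefix); the phrase list and the JSGF grammar are illustrative, not from the original post.

```javascript
// Resolve the prefixed or unprefixed constructors; both are undefined
// outside a supporting browser, so the block degrades gracefully.
const SpeechRecognition =
  globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
const SpeechGrammarList =
  globalThis.SpeechGrammarList || globalThis.webkitSpeechGrammarList;

// A small JSGF grammar listing the phrases we want recognised.
const phrases = ['hello', 'thank you', 'goodbye'];
const grammar =
  '#JSGF V1.0; grammar phrases; public <phrase> = ' +
  phrases.join(' | ') + ' ;';

if (SpeechRecognition) {
  const recognition = new SpeechRecognition();
  if (SpeechGrammarList) {
    const grammars = new SpeechGrammarList();
    grammars.addFromString(grammar, 1); // weight 1
    recognition.grammars = grammars;
  }
  recognition.lang = 'en-US';
  recognition.interimResults = false;

  // Fires when the recogniser produces a final transcript.
  recognition.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    console.log('Heard:', transcript);
  };
  recognition.onerror = (event) => console.error(event.error);

  recognition.start(); // begins listening via the default microphone
}
```

Note that `recognition.start()` will prompt the user for microphone permission; the grammar list only hints the recogniser toward the listed phrases and does not strictly constrain it.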
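The synthesis side can be sketched the same way. This assumes a browser exposing `speechSynthesis`; `pickVoice` is a hypothetical helper (not part of the API) for choosing a SpeechSynthesisVoice by language tag.

```javascript
// Hypothetical helper: prefer an exact lang match, then any voice
// sharing the primary language subtag, else null.
function pickVoice(voices, langTag) {
  return voices.find(v => v.lang === langTag) ||
         voices.find(v => v.lang.startsWith(langTag.split('-')[0])) ||
         null;
}

if ('speechSynthesis' in globalThis) {
  const synth = globalThis.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance('Hello, world.');
  const voice = pickVoice(synth.getVoices(), 'en-US');
  if (voice) utterance.voice = voice;
  utterance.rate = 1.0;   // normal speaking rate
  utterance.pitch = 1.0;  // default pitch
  synth.speak(utterance); // queue the utterance to be spoken aloud
}
```

One caveat: in some browsers `getVoices()` returns an empty array until the `voiceschanged` event fires, so production code would listen for that event before picking a voice.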
