Artificial Intelligence

From One Word to Full Human-like Speech: The Evolution of Voice Recognition

Mohannad Aljawamis
March 28, 2021

Nowadays, it is no surprise that you can command your phone, your watch, your speakers, or nearly any other tech device by speaking a few words and getting an accurate response. Likewise, we use chatbots, virtual assistants, and various “smart” tools that understand our voice and our language almost perfectly.


But how did this technology come to be? Is it an overnight invention or a long and gradual evolution?  



Milestone 1 - In 1939, the Voder was demonstrated at the World's Fair in New York City. This tool was developed to synthesize human speech by imitating the effects of the human vocal cords. It was operated by selecting one of two basic sounds via a wrist bar, with a foot pedal controlling pitch. Though extremely limited in functionality, it painted the initial strokes for voice technology.



Milestone 2 - 1952 saw the birth of Audrey by Bell Labs. Audrey took a big step beyond random sounds: it could recognize the distinct sound of each spoken digit, zero to nine, with more than 90% accuracy.



Milestone 3 - IBM demonstrated the Shoebox at the 1962 Seattle World's Fair. The Shoebox could understand up to 16 spoken words in English. It was operated by speaking into a microphone, which converted sounds into electrical impulses. By then, it was clear that voice recognition technology was on the path to understanding human language.



Milestone 4 - In 1976, after five years of DARPA-funded research, Carnegie Mellon developed Harpy. This technology was able to understand 1,011 words, marking another accomplishment in the journey of expanding voice recognition capabilities.




Milestone 5 - By the early 1980s, voice recognition began making great leaps towards greater viability. A statistical technique called the Hidden Markov Model (HMM) came into use, allowing voice recognition machines to identify speech more accurately. Around this time, IBM began work on Tangora, a technology able to identify 20,000 spoken words.


Milestone 6 - By 1990, speech recognition had reached the workplace with Dragon Dictate, via Windows PCs. The 1990s trend for speech recognition at work continued: Apple launched Speakable Items in 1993, built-in voice control software for its computers. 1993 also saw the introduction of Sphinx-II, the first large-vocabulary continuous speech recognition system.



Milestone 7 - In 2008, Google launched the Voice Search App for the iPhone, while Siri was introduced to the world in 2011, giving consumers their very own digital personal assistant. This marked a change for mobile tech companies, as voice recognition enabled users to control their devices more efficiently than ever before.

 

Ever since, voice recognition technology around the world has been exploding in capabilities as well as applications. With the introduction of Artificial Intelligence, however, things took an even more interesting turn. 


AI means that voice recognition is no longer limited to a fixed repository of words and sounds that can be understood. AI means that the machine or application itself can learn from data and continuously become smarter and more powerful.



This is where Xina comes to the Arab world: the world’s first Arabic interactive voice assistant and chatbot, capitalizing on years of voice-recognition AI development and filling the void for Arabic-first business and linguistic support technologies.


Find out more about Xina here