Artificial Intelligence

The Evolution of Voice Recognition

Mohannad Aljawamis
March 28, 2021
How did voice recognition technology start? How is it possible to use artificial intelligence to power our smartphones, watches, and virtual assistants?


It is no longer surprising to command your phone, your watch, your speakers, or nearly any other tech device by speaking a few words and getting an accurate response. Likewise, we use chatbots, virtual assistants, and various "smart" tools that understand our voices and our language almost perfectly.

But how did this technology come to be? Is it an overnight invention or a long and gradual evolution?  

Milestone 1 - In 1939, the Voder was demonstrated at the New York World's Fair

This device was developed to synthesize human speech by imitating the effects of the human vocal cords. Its operator selected one of two basic sounds with a wrist bar and controlled pitch with a foot pedal. Although it produced speech rather than recognizing it, and its functionality was extremely limited, it laid some of the groundwork for the voice technologies that followed.


Milestone 2 - In 1952, Bell Labs introduced Audrey

Audrey went a big step beyond producing isolated sounds: it could recognize the spoken digits zero through nine, and it did so with more than 90% accuracy.

Milestone 3 - IBM demonstrated the Shoebox at the 1962 Seattle World's Fair

The Shoebox could understand up to 16 spoken English words. The user spoke into a microphone, which converted the sounds into electrical impulses. By then, it was clear that voice recognition technology was on the path to understanding human language.

Milestone 4 - In 1976, after five years of DARPA-funded research, Carnegie Mellon unveiled Harpy

Developed at Carnegie Mellon University, Harpy could understand 1,011 words, another milestone in expanding the capabilities of voice recognition.

Milestone 5 - By the early 1980s, voice recognition was becoming far more practical

Researchers began using a statistical technique called the hidden Markov model (HMM), which estimates how likely a sequence of sounds is to correspond to particular words. This allowed voice recognition machines to identify speech more accurately. Around this time, IBM began work on Tangora, a system able to recognize a vocabulary of 20,000 spoken words.

Milestone 6 - By 1990, speech recognition had reached the workplace with Dragon Dictate on PCs

The trend of speech recognition at work continued through the 1990s. In 1993, Apple launched Speakable Items, built-in voice control software for its computers. The same year also saw the introduction of Carnegie Mellon's Sphinx-II, the first large-vocabulary continuous speech recognition system.

Milestone 7 - In 2008, Google launched the Voice Search App for the iPhone

Apple then introduced Siri in 2011, giving consumers their very own digital personal assistant. This marked a turning point for mobile tech companies, as voice recognition let users control their devices more efficiently than ever before.


Since then, voice recognition technology has expanded rapidly around the world, both in its capabilities and in its applications. The introduction of artificial intelligence, however, took things in an even more interesting direction.

With AI, voice recognition is no longer limited to a fixed repository of words and sounds it can understand. Instead, the machine or application can learn from data, train on new examples, and keep becoming smarter and more capable.

This is where Xina comes in for the Arab world: the world's first Arabic interactive voice assistant and chatbot, building on years of voice recognition AI development and filling the gap in Arabic-first business and linguistic support technologies.

Find out more about Xina here