It is not surprising nowadays to command your phone, your watch, your speakers, or nearly any other tech device by speaking a few words and waiting for an accurate response. Likewise, we use chatbots, virtual assistants, and various “smart” tools that understand our voice and our language almost perfectly.
But how did this technology come to be? Is it an overnight invention or a long and gradual evolution?
This tool was developed to synthesize human speech by imitating the effects of the human vocal cords. It was operated by selecting one of two basic sounds via a pedal bar. Though extremely limited in functionality, it painted the initial strokes for voice technology.
Capable of understanding a small selection of spoken words, Audrey could distinguish the digits from zero to nine with more than 90% accuracy. In essence, Audrey took a big step beyond simple tones and random sounds and could actually recognise the distinct sound of a spoken digit.
The Shoebox could understand up to 16 spoken words in English. This technology was operated by speaking into a microphone, which then converted sounds into electrical impulses. By then, it was clear that voice recognition technology was on the path to understanding human language.
A technique called the Hidden Markov Model (HMM) was adopted, allowing voice recognition machines to identify speech far more accurately. Around this time, IBM began work on Tangora, a technology able to identify 20,000 spoken words.
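To give a flavour of how an HMM identifies speech, the sketch below runs the Viterbi algorithm, the standard decoding step in HMM-based recognisers, on a toy model. All the states, observations, and probabilities here are invented for illustration; real systems model acoustic feature vectors and phoneme sequences, not two hand-picked sound labels.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations."""
    # best[t][s]: probability of the likeliest path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor state on that likeliest path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical two-phoneme model: "s" tends to emit a hiss, "t" a burst
states = ["s", "t"]
start_p = {"s": 0.6, "t": 0.4}
trans_p = {"s": {"s": 0.7, "t": 0.3}, "t": {"s": 0.4, "t": 0.6}}
emit_p = {"s": {"hiss": 0.8, "burst": 0.2}, "t": {"hiss": 0.1, "burst": 0.9}}

print(viterbi(["hiss", "burst", "burst"], states, start_p, trans_p, emit_p))
# → ['s', 't', 't']
```

The key idea, and the reason HMMs were such a leap for speech recognition, is that the machine never observes the phonemes directly; it hears noisy acoustics and infers the most probable hidden sequence behind them.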
The 1990s trend for speech recognition continued – Apple launched Speakable Items in 1993, built-in speech-control software for their computers. 1993 also saw the introduction of Sphinx-II, the first large-vocabulary continuous speech recognition system.
Siri was introduced to the world in 2011, giving consumers their very own digital personal assistant. This marked a change for mobile tech companies, as voice recognition enabled users to control their devices more efficiently than ever before.
Ever since, voice recognition technology around the world has been exploding in capabilities as well as applications. With the introduction of Artificial Intelligence, however, things took an even more interesting turn.
AI means that voice recognition is no longer limited to a fixed repository of words and sounds that can be understood. AI means that the machine or application itself can learn, train, and continuously become smarter and more powerful.
This is where Xina comes to the Arab world: the world’s first Arabic Interactive Voice Assistant and Chatbot, capitalizing on years of voice-recognition AI development and filling the void for Arabic-first business and linguistic support technologies.