Friday, September 7, 2007


History and Development of Speech Recognition

Abstract

Speech recognition has a long and distinguished history. This article discusses some key milestones of that history in both technical and commercial terms. It also covers notable research efforts, with their successes and failures, and highlights who contributed most to bringing the technology to its current position.



The original source of the image is hosted at [http://www.lumenvox.com/resources/tips/historyOfSpeechRecognition.aspx]


Speech recognition technology has a long history. It began in the mid-1870s, when Alexander Graham Bell attempted to create a speech recognition machine that would transform spoken words into text (an understandable picture) so that even a deaf person could interpret their meaning. Instead, he ended up with another machine, called the ‘telephone.’ [7]

Engineers, language experts, and various other scientists have since been attempting to make Bell’s dream a reality, with some extensions: creating machines capable of understanding the human voice. In 1920, “Radio Rex” was recorded as the earliest successful attempt in the history of speech recognition technology. It was a simple toy, a dog capable of moving when its name was called. From the 1930s, the U.S. Government (first and foremost the military and, later, DARPA, the Defense Advanced Research Projects Agency) and some universities funded speech recognition research. This funding gave the field a real kickoff and led to many advances.

With this inspiration, AT&T's Bell Labs began research in 1936 on the “Voder,” a manually controlled speech synthesizer, which was demonstrated at the World’s Fair in 1939.

Then, in 1952, the same company demonstrated a small-vocabulary discrete digit recognizer (i.e. for the numbers zero to nine) over the telephone. Within the same decade, evidence was found for statistical patterns in speech, and statistical models for speech recognition were introduced. [1, 7, 8, 9]

During the next decade the early digit recognizers improved. Researchers turned their attention to smaller projects and initially attempted to develop discrete speech recognition systems. In this decade recognition systems migrated to acoustic models rather than making use of statistical models. Although Martin deployed neural networks for phoneme recognition in 1964, that approach seemed to mature slowly and was later replaced by newer ones. Some improvements in vocabulary size and accuracy can be identified. These acoustic models consider the sound of speech rather than its meaning, and they supported the implementation of discrete speech recognition systems well. They are still used productively in modern speech recognition products because they work properly in noisy environments and suit smaller-vocabulary systems. [7]

The US military again funded speech projects in the 1970s. One of the funded projects aimed to develop a continuous speech recognition system that could operate at 90% accuracy. Such systems are still being developed and refined today. In this decade a new model was also introduced to speech recognition by Carnegie Mellon and Princeton universities: the "Hidden Markov Model" (HMM), which is widely used in current products as well. [7]

US military funds continued to flow into speech recognition research during the 1980s as well. With this helping hand, a large speech recognition research project was conducted at CMU (Carnegie Mellon University). Later, some excellent students who had worked at CMU became members of Microsoft's speech recognition group, one of the most successful speech recognition teams. Ending the drought of commercial products, “Covox” released its first speech recognition product in 1982, recorded as the first of all commercial speech recognition products. A new company named Dragon Systems, a dominant player in speech recognition, was also established; it followed “Covox” and produced an SR system that ran on personal computers. It was a simple command-and-control system. The “Bakers,” on the other hand, produced the first non-commercial version of speech recognition.

Many further studies were carried out in the next decade. In 1995, Dragon Systems introduced a general-purpose speech recognition product that allowed users to dictate to their PC. This system expected a pause between each spoken word, and it was the earliest product of its type (discrete speech recognition systems). Around the time Dragon released this product, IBM also released one. Two years later, in 1997, Dragon released the first general-purpose continuous speech dictation system, named “NaturallySpeaking”. It allows users to speak to their computer in a natural way, as an alternative to pointing and typing.

Microsoft also invested in speech recognition and started its own lab, recruiting some excellent students from the CMU project, which had twice won the US military’s praise. As the decade ended and Microsoft was entering the market, Dragon Systems was purchased by Lernout & Hauspie. Valued at $460 million at the end of the 1990s, it was considered the biggest speech recognition company in the world.

During this period a number of new products were introduced across various domains. Notable among them were call centers, stock market broker systems that let customers get quotes on stocks and options over the telephone, and many other telephony systems. Mobile devices also improved, largely with the help of SR technology. Mobile phones were manufactured so that users could attach voice tags to their contacts and later dial a contact simply by speaking the tagged word (voice dialing). Command and control was also introduced to mobile phones, enabling functions such as switching the phone to silent mode with a voice command.


References:

[1] Speech Recognition Software and Medical Transcription History, A Timeline of Speech Recognition, http://www.dragon-medical-transcription.com/historyspeechrecognitiontimeline.html, 11 July 2007

[4] History of Speech Recognition, http://www.lumenvox.com/resources/tips/historyOfSpeechRecognition.aspx, 11 July 2007

[7] Cell Phones: History of Speech Recognition, http://www.thecellphoneforum.com/content.php.article.62, 14 July 2007

[8] Milestones in Speech Technology – Past and Future!, http://www.speechtechmag.com/Articles/ReadArticle.aspx?ArticleID=29572, 14 July 2007

[9] Speech Recognition, Daniel Thalmann, http://209.85.129.104/search?q=cache:04GGydLYegYJ:vrlab.epfl.ch/~thalmann/VR/VRcourse_Speechrec.pdf+History+of+speech+recognition&hl=en&ct=clnk&cd=32, 14 July 2007

[10] Speech Recognition Research Paper.

Sunday, September 2, 2007

Speech Recognition

Speech recognition is not a new technology; it has a long and venerable history. The idea of making machines capable of responding to human commands has existed for a long time, because it seems a natural option for user input. It has the potential to reduce the distance between machines and human beings, and it has now achieved a high level of accuracy (over 95%) and performance. The speech recognition industry and research community are also very active, evolving and changing rapidly, which benefits not only regular computer users but also physically handicapped individuals.

Another ability of speech recognition is transforming spoken words into text. This is important because speech is easier, more natural, and faster to produce, whereas writing text is a slow and laborious process, as in the case of an interview. Yet most of the time people prefer, or are required, to have text versions of what was said, because speech is slower to listen to and harder to memorize, navigate, and revise, among other practical issues. Converting speech to text offers a better solution to these issues.

Even though the idea of speech recognition and some of its technologies have existed for years, progress was held back in the past by the lack of suitable hardware. As computer processing power has grown and become available at low prices, and with the help of many enthusiastic researchers, sponsors, and other stakeholders, speech recognition technology has developed and is still developing rapidly. Today there are diverse applications based on speech recognition, spread over a number of domains: dictation systems, products for general computer interaction (voice-enabled interfaces), IVR systems, voice-enabled web sites, language translators, and many more.

Although these applications are often regarded simply as speech recognition systems, most of them are actually hybrids of speech recognition and voice recognition. Speech recognition is the process of identifying the spoken words, while voice recognition is the process of identifying the person who spoke them.

Modern speech recognition systems are developed for one of two main categories of customer requirements/specifications. All of them have ended up as either a speaker-dependent continuous-speech PC-based system or a speaker-independent continuous-speech server-based system.

Speaker-independent systems are generally more expensive than individual speaker-dependent systems and do not require prior training for a particular user. Even though such systems are expensive, speaker independence and performance are required when dealing with a large and varied user base, rather than with one or two individual subscribers.

There is no doubt about these statements and facts. All of them hold for English-language speech recognition, but it is not certain that they hold for other languages. Although most of these systems have been developed for English, researchers have recognized the need for such systems in other languages as well. A number of projects and research efforts have been completed for non-English languages, but that number is very small compared with the projects conducted for English-language speech recognition.

References:

[1] White Paper on Speech Recognition in the SESA Call Center, by Ron Mains, Tim Meier, Scott Nainis, Henry M. James, http://www.itsc.state.md.us/PDF/O-2-2%20Technology%20Assessment%20Final%20Report.pdf, April 2001.

[4] History of Speech Recognition, http://www.lumenvox.com/resources/tips/historyOfSpeechRecognition.aspx, 11 July 2007