Design and Implementation of Text-to-Speech (Audio) System
Design and Implementation of a Text-to-speech/audio system is the generation of synthesized speech from text. The purpose of this paper is to make synthesized speech as intelligible, natural and pleasant to listen, as human speech. Speech is the primary means of communication between people.
During synthesis very small segments of recorded human speech are concatenated together to produce the synthesized speech.
The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. A text-to-speech synthesizer allows people with visual impairments and reading disabilities to listen to written works on a home computer.
Language is the ability to express one’s thoughts by means of a set of signs (text), gestures, and sounds. It is a distinctive feature of human beings, who are the only creatures to use such a system. Speech is the oldest means of communication between people and it is also the most widely used. ‘Speech synthesis’ also called ‘Text to speech synthesis’ is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer and can be implemented in software.
A text-to-speech (TTS) system simply converts text to speech. Many computer operating systems have included speech synthesizers since the early 1990s.
Recent progress in speech synthesis has produced synthesizers with very high intelligibility but the sound quality and naturalness still remain a major problem. However, the quality of present products has reached an adequate level for several applications, such as multimedia and telecommunications.
The following thesis presents a brief overview of the main text-to-speech synthesis problems, and the initial work done in building a TTS in English.
At first sight, this task does not look too hard to perform. After all we all have a deep knowledge of reading rules of our mother tongue. They were transmitted to us, in a simplified form, at primary school, and we improved them year after year. But in the context of TTS synthesis, it is impossible to record and store all the words of the language. Some other method has to be implemented for this purpose. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. A text-to-speech synthesizer allows people with visual impairments and reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s.
Astro- physician Stephen Hawkins, who is completely paralyzed, gives all his lectures using a TTS system.
Text-to-speech synthesis -TTS – is the automatic conversion of a text into speech that resembles, as closely as possible, a native speaker of the language reading that text. Text-to-speech/ Audio system is the technology which lets computer speak to you. The TTS system gets the text as the input and then a computer algorithm which called TTS engine analyses the text, pre-processes the text and synthesizes the speech with some mathematical models. The TTS engine usually generates sound data in an audio format as the output.
The text-to-speech (TTS) synthesis procedure consists of two main phases. The first is text analysis, where the input text is transcribed into a phonetic or some other linguistic representation, and the second one is the generation of speech waveforms, where the output is produced from this phonetic and prosodic information. These two phases are usually called high and low-level synthesis. The input text might be for example data from a word processor, standard ASCII from e-mail, a mobile text-message, or scanned text from a newspaper. The character string is then pre-processed and analyzed into phonetic representation which is usually a string of phonemes with some additional information for correct intonation, duration, and stress. Speech sound is finally generated with the low-level synthesizer by the information from high-level one. The artificial production of speech-like sounds has a long history, with documented mechanical attempts dating to the eighteenth century.
Speech synthesis can be described as artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diaphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output.
1.2 STATEMENT OF THE PROBLEM
The problem area in speech synthesis is very wide. There are several problems in text preprocessing, such as numerals, abbreviations, and acronyms. This system will help solve the problems of those with learning disabilities, some people have basic literary levels. They often get frustrated trying to browse the internet because so much of it is in text form. People with visual impairment – Text to speech can be a very useful tool for the mild or moderately visually impaired.
Even for people with the visual capability to read, the process can often cause too much strain to be of any use or enjoyment. With text to speech, people with visual impairment can take in all manner of content in comfort instead of strain.
1.3 OBJECTIVES OF STUDY
The main objective of the paper is to design and implement a Text-to-Speech/Audio System. The Speech/Audio system focuses precisely on the following objectives:
– To Design and Implement a Speech synthesizer that converts text to audio.
– To Design and Implement a System that can read out text in any frequency that user specifies.
– To design and implement a speech synthesizer that can read out text in both female and male voices.
1.4 SIGNIFICANCE OF THE STUDY
The significance of this study is:
The application will build a platform to aid people with disabilities especially on reading and also help get information easily without any stress.
The project could also help children learn how to pronounce words and how to read.
The study will serve as a foundation and guide to other research students interested in researching on Text-to-Speech systems.
1.5 LIMITATION OF STUDY
Text to Speech also has limitations. The most obvious drawback of text as a knowledge building and communication tool is that it lacks the inherent expressiveness of speech. When speech is transcribed into text, it loses many of its unique qualities – tone, rhythm, pace and repetition that helps to reduce memory demands and support comprehension. A transcript may accurately record the spoken words, but the strategic and emotive qualities and impact of speech are diminished on the page. Furthermore, the cognitive demands of organizing ideas into acceptable syntax, conventions, and presentational form can pose significant barriers to using text for expression among both novice and expert writers alike. We think in images or word fragments. Ideas float in and out of our heads—and rarely in a linear or conventional way. Writing attempts to shape that free-forming, dynamic process of thought into a single, sequential output of sentences and paragraphs. Some individuals may have all these creative ideas in their imaginative mind, but because their mind is so much quicker and richer than the pen, when they put ink to parchment, the outcome is a blank sheet of paper.
1.6 DEFINITION OF TERMS
Text-to-speech: text to speech application is used whenever there is a difficulty in reading or whenever a reading is not the priority as of the moment.
Synthesized: produce (sound) electronically.
Communication: the imparting or exchanging of information by speaking, writing, or using some other medium.
Transmitted: cause (something) to pass on from one person or place to another.
Reading disabilities: is a condition in which a sufferer displays difficulty reading.
System: a set of things working together as parts of a mechanism or an interconnecting network; a complex whole.
Rhythm: the measured flow of words and phrases in verse or prose as determined by the relation of long and short or stressed and unstressed syllables.