Speech
A PROJECT REPORT
Submitted by
ARIJIT SAMANTA (21204106005)
in partial fulfilment for the award of the degree
of
BACHELOR OF ENGINEERING
in
APRIL 2008
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
______________________                 ______________________
SIGNATURE                              SIGNATURE
Mr SUMAN MISHRA                        Mr SUMAN MISHRA
(HEAD OF THE DEPARTMENT)               (SUPERVISOR)
DEPARTMENT OF ELECTRONICS AND          DEPARTMENT OF ELECTRONICS AND
COMMUNICATION ENGINEERING,             COMMUNICATION ENGINEERING,
RAJIV GANDHI COLLEGE OF                RAJIV GANDHI COLLEGE OF
ENGINEERING                            ENGINEERING
SRIPERUMBUDUR 602 105                  SRIPERUMBUDUR 602 105
_________________________              _________________________
system which would save the time, labour and moreover the system would be
immune to impersonation.
individuals. The challenges faced were the detection of voice patterns and
deal with the limited memory and processing power, we had to remove the
the ambient noise and calculating for a threshold level based on the same.
The detection of voice takes place in two phases. In the first phase
a sample of speech from the individual is taken, and the Voiceprint is computed and
saved in the database. In the second phase the speech of the individual is
The implementation was successful for a sample space of three Voiceprints. The
The Block Diagram of the System
1. Introduction
The data acquisition module consists of a microphone. The signal from the
microphone is then fed to the amplifier to match the ADC input voltage
requirements in the microcontroller.
Microphone:
The template generation method involves two steps. The first step is the
calculation of the ambient noise threshold to eliminate noise to a large extent.
The second step is to compute the voiceprint of the speech (the word spoken by
the user for identification). The generated template is then transferred to the
computer. The template is then used to program the microcontroller.
Summary:
Hence we have developed a method to remove the redundancies and retain just
the optimum amount of data to effectively recognize voice.
Figure 1‐Flowchart for Template Generation (160 point data)
3. Bandpass filtering: Next we filter the signal using bandpass filters, e.g.
filters designed with cut-off frequencies of 100 Hz-200 Hz, 200 Hz-300 Hz,
and so on up to 1900 Hz-2000 Hz.
4. Accumulation of the output of bandpass filter: The filter output is then
accumulated and saved to a 160 point vector.
5. Transferring the template data: The 160 point vector data is the
template, which is very small in size but retains enough information to
distinguish voices of different persons. The template data is transferred to
the PC.
Hence we need to sample at a rate of at least 4000 Hz, so the sampling
period should be at most 1/4000 s = 250 µs.
The voice sampling is achieved by setting the ADC control registers and a timer
that interrupts (triggers) the microcontroller to generate ADC data every
232 µs, which corresponds to a sampling rate of about 4.3 kHz.
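The number of CPU cycles available between two samples follows directly from the sampling period. The following is a minimal sketch of that arithmetic in C. The function name `cycles_per_sample` and the 16 MHz clock used in the usage note are assumptions made only for illustration; the report does not state the clock frequency at this point.

```c
#include <stdint.h>

/* CPU cycles available between two consecutive ADC samples.
 * f_cpu_hz  : system clock in Hz (an assumed value, not given in the report)
 * period_us : sampling period in microseconds */
static uint32_t cycles_per_sample(uint32_t f_cpu_hz, uint32_t period_us)
{
    /* The 64-bit intermediate keeps f_cpu_hz * period_us from overflowing. */
    return (uint32_t)(((uint64_t)f_cpu_hz * period_us) / 1000000u);
}
```

With an assumed 16 MHz clock, `cycles_per_sample(16000000u, 250u)` gives 4000 cycles, and a 232 µs period gives 3712 cycles. All filtering for one sample has to fit inside this budget.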
Algorithm:
We first sample the ambient sound, i.e. with nothing spoken into the microphone.
The microphone records the noise, and the noise characteristics (amplitude
information and phase information) are determined. These are then used to
remove the noise adaptively when the human voice is sampled. This is done by
registering samples only if they are above the noise threshold value.
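The thresholding step can be sketched in C as follows. This is an illustration under stated assumptions, not the report's firmware: the report does not give the exact formula for the threshold, so the mean absolute amplitude of the ambient samples is used here, and the names `noise_threshold` and `gate_samples` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Estimate the noise threshold from ambient (silent) samples.
 * Assumption: mean absolute amplitude; the report does not state its formula. */
static uint16_t noise_threshold(const int16_t *ambient, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = ambient[i];
        sum += (uint32_t)(s < 0 ? -s : s);
    }
    return n ? (uint16_t)(sum / n) : 0;
}

/* Register only the samples whose magnitude exceeds the threshold.
 * Returns the number of samples copied to out (out must hold n entries). */
static size_t gate_samples(const int16_t *in, size_t n, uint16_t threshold,
                           int16_t *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = in[i];
        uint32_t mag = (uint32_t)(s < 0 ? -s : s);
        if (mag > threshold)
            out[kept++] = in[i];
    }
    return kept;
}
```

A typical use would call `noise_threshold` once at start-up on the ambient recording and then pass every block of speech samples through `gate_samples` before filtering.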
Figure 2‐ Adaptive Noise Cancellation
IIR filters may be implemented as either analog or digital filters. In digital IIR
filters, the output feedback is immediately apparent in the equations defining the
output. Note that, unlike with FIR filters, in designing IIR filters it is necessary
to carefully consider the "time zero" case, in which the past outputs of the filter
have not yet been defined.
Example IIR filters include the Chebyshev filter, Butterworth filter, and the
Bessel filter.
This is an important step in generating the voice template. It
removes the redundancies in the voice and stores the signature in a 160 point
data vector.
The bandpass filter is a second-order Chebyshev IIR filter. The coefficients for
the filter are calculated using MATLAB.
Figure 3‐Transposed‐Direct‐Form‐II implementation of a second‐order IIR digital filter (input on the right, output on the left)
The assembly language code implementing the filter is written so that the
filter completes its calculation within 2100 system cycles, that is,
before the next sample arrives. To meet this budget we changed the data
format from floating point to fixed point in 2's complement form (see
APPENDIX 1 for more details), which improves the performance of the
program and lets it compute within the required number of system clock
cycles.
The gain in the passband is 20 dB and the roll-off is quite steep, as two
second-order Chebyshev bandpass filters are cascaded in series.
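A second-order IIR section in Transposed Direct Form II, and the cascade of two such sections, can be sketched in C with 8.8 fixed-point arithmetic as shown here. This is a sketch, not the report's assembly code: the type and function names (`fix_t`, `biquad_t`, `biquad_step`, `cascade_step`) are hypothetical, and the coefficient values must come from the filter design tool.

```c
#include <stdint.h>

typedef int16_t fix_t;   /* 8.8 fixed point, 2's complement */

/* Multiply two 8.8 numbers through a 32-bit intermediate product.
 * Note: >> on a negative value is an arithmetic shift on common compilers
 * such as gcc, but the C standard leaves it implementation-defined. */
static fix_t fix_mul(fix_t a, fix_t b)
{
    return (fix_t)(((int32_t)a * (int32_t)b) >> 8);
}

typedef struct {
    fix_t b0, b1, b2;   /* feed-forward coefficients */
    fix_t a1, a2;       /* feedback coefficients (a0 normalised to 1) */
    fix_t s1, s2;       /* the two state registers, initially zero */
} biquad_t;

/* One sample through a Transposed Direct Form II section:
 *   y  = b0*x + s1
 *   s1 = b1*x - a1*y + s2
 *   s2 = b2*x - a2*y                                         */
static fix_t biquad_step(biquad_t *f, fix_t x)
{
    fix_t y = (fix_t)(fix_mul(f->b0, x) + f->s1);
    f->s1 = (fix_t)(fix_mul(f->b1, x) - fix_mul(f->a1, y) + f->s2);
    f->s2 = (fix_t)(fix_mul(f->b2, x) - fix_mul(f->a2, y));
    return y;
}

/* Two second-order sections in series: the output of the first
 * section is the input of the second. */
static fix_t cascade_step(biquad_t *first, biquad_t *second, fix_t x)
{
    return biquad_step(second, biquad_step(first, x));
}
```

Starting with both state registers at zero settles the "time zero" case, since the filter then behaves as if all earlier inputs and outputs were zero. With only 8 fraction bits the coefficients are coarsely quantised, so the response of the quantised filter should be checked against the designed one.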
Figure 4 ‐ Signal Processing Block Parameters for designing the Filter.
Figure 5 – Window showing the Designed Filter Coefficients.
#define int2fix(a) (((int)(a))<<8) //Convert char to fix. a is a char
#define fix2int(a) ((signed char)((a)>>8)) //Convert fix to char. a is an int
#define fix2uint(a) ((unsigned char)((a)>>8)) //Convert fix to char. a is an int
#define float2fix(a) ((int)((a)*256.0))
#define fix2float(a) ((float)(a)/256.0)
*Details in Appendix 1
The filter algorithm is optimized so that all the filter activities are completed
before the next sample arrives from the ADC, i.e. within roughly 2000 system cycles.
The Algorithm:
10. The fixed point output of the bandpass filter is accumulated (summed up)
according to the process given in Chapter 2.4.
Figure 6‐ Frequency Response of the digital filter for 400‐600 Hz BANDPASS
The outputs of the bandpass filters are summed to get the 160 point data vector,
which is in fixed point format. It is then converted to 16-bit integer format.
1. The first output from a bandpass filter is obtained and stored in a register.
2. The subsequent outputs from the same bandpass filter are added to the
register where the first value of that bandpass filter is stored.
3. The above step is repeated for the rest of the bandpass filters.
4. The fixed point data is then converted to 16-bit integer format.
5. The sampled time, i.e. the time required to sample a spoken word, is
divided into 20 parts. Hence we obtain a vector of 160 (20 × 8 = 160) 16-bit
values, which is the template.
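The accumulation steps above can be sketched in C as follows. This is an illustration under assumptions: the factor 8 in 20 × 8 is taken to be the number of bandpass filters, the filter outputs are assumed to be buffered in a flat array (real firmware would more likely accumulate on the fly), and the name `build_template` is hypothetical. The report does not say whether signed values or magnitudes are summed; signed values are summed here, as the steps are written.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_BANDS     8                             /* assumed from 20 x 8 = 160 */
#define NUM_SEGMENTS  20                            /* time slices per word */
#define TEMPLATE_LEN  (NUM_BANDS * NUM_SEGMENTS)    /* 160 */

/* band_out holds the 8.8 fixed-point filter outputs, laid out as
 * band_out[n * NUM_BANDS + b] for sample n and band b. */
static void build_template(const int16_t *band_out, size_t n_samples,
                           int16_t tmpl[TEMPLATE_LEN])
{
    size_t seg_len = n_samples / NUM_SEGMENTS;      /* samples per time slice */

    for (size_t seg = 0; seg < NUM_SEGMENTS; seg++) {
        for (size_t b = 0; b < NUM_BANDS; b++) {
            int32_t acc = 0;                        /* wide accumulator */
            for (size_t k = 0; k < seg_len; k++) {
                size_t n = seg * seg_len + k;
                acc += band_out[n * NUM_BANDS + b];
            }
            acc /= 256;                             /* drop the 8 fraction bits */
            if (acc > INT16_MAX) acc = INT16_MAX;   /* saturate to 16 bits */
            if (acc < INT16_MIN) acc = INT16_MIN;
            tmpl[seg * NUM_BANDS + b] = (int16_t)acc;
        }
    }
}
```

Each template entry therefore describes how much output one band produced during one twentieth of the spoken word.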
Figure 7‐ The Fourier Spectrum of the word “Hello”
Figure 8‐ The Fourier Spectrum of the word “Hello”
The Voiceprint is the 160 point data obtained from the sampled speech. The
method used is exactly the same as the generation of 160 point template.
Once the Voiceprint is obtained, it is compared against the stored
templates.
The Euclidean distance between the Voiceprint and each template is computed using
the Euclidean distance formula.
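Euclidean Distance:
$$ d = \sqrt{\sum_{i=1}^{160} (V_i - T_i)^2} $$
where $V_i$ and $T_i$ are the i-th elements of the Voiceprint and of the template respectively, and i = 1 to 160.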
Suppose the Euclidean distance has been calculated for 5 template data
vectors. We then determine which data vector has the minimum Euclidean distance.
The template with the minimum Euclidean distance is chosen as the detected word,
and the appropriate action is taken.
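The comparison can be sketched in C as follows. This sketch compares squared distances instead of taking the square root; the square root is monotonic, so the template with the smallest squared distance is also the one with the smallest Euclidean distance, and the microcontroller avoids a costly operation. The names `squared_distance` and `best_match` and the flat storage layout of the templates are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define TEMPLATE_LEN 160

/* Sum of squared differences between a voiceprint and one template. */
static uint64_t squared_distance(const int16_t *v, const int16_t *t)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < TEMPLATE_LEN; i++) {
        int32_t d = (int32_t)v[i] - (int32_t)t[i];
        sum += (uint64_t)((int64_t)d * d);
    }
    return sum;
}

/* templates holds n vectors of TEMPLATE_LEN entries stored back to back.
 * Returns the index of the closest template, or -1 if n is 0. */
static int best_match(const int16_t *voiceprint, const int16_t *templates,
                      size_t n)
{
    int best = -1;
    uint64_t best_dist = UINT64_MAX;
    for (size_t k = 0; k < n; k++) {
        uint64_t d = squared_distance(voiceprint,
                                      templates + k * TEMPLATE_LEN);
        if (d < best_dist) {
            best_dist = d;
            best = (int)k;
        }
    }
    return best;
}
```

A practical system might also reject the match when even the smallest distance exceeds some limit, so that an unknown word is not forced onto the nearest template; the report does not describe such a check.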
The match is obtained from the speech/voice comparison block described in the
previous chapter. The required action is then taken by the control
block.
The control block is a simple switch case in which the action is determined
by the template with the minimum Euclidean distance.
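A minimal sketch of such a control block in C is given here. The mapping of template indices to names is hypothetical (the report does not list the order in which templates are stored), and the only action shown is reporting the recognized speaker, in the style of the program output reproduced in the results.

```c
#include <stdio.h>

/* Hypothetical template indices; the real order is not given in the report. */
enum { WORD_ARIJIT = 0, WORD_TURBO = 1, WORD_MONY = 2 };

/* Map the index of the closest template to the name to report.
 * Returns NULL when the index does not correspond to a stored template. */
static const char *control_action(int match_index)
{
    switch (match_index) {
    case WORD_ARIJIT: return "Arijit";
    case WORD_TURBO:  return "Turbo";
    case WORD_MONY:   return "Mony";
    default:          return NULL;
    }
}

/* Print the result of one recognition attempt. */
static void report_match(int match_index)
{
    const char *name = control_action(match_index);
    if (name != NULL)
        printf("Recognized Voice is of '%s'\n", name);
    else
        printf("No match\n");
}
```

In firmware, each case would trigger the action associated with that word instead of only returning a name.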
Figure 9‐ The Hardware Schematic.
Figure 10 Pinouts ATmega32 microcontroller (PDIP)
General features
• High-performance, Low-power AVR® 8-bit Microcontroller
• Advanced RISC Architecture
• Nonvolatile Program and Data Memories
• In-System Programming by On-chip Boot Program
• True Read-While-Write Operation
• 1024 Bytes EEPROM, Endurance: 100,000 Write/Erase Cycles
• 2K Byte Internal SRAM
• Programming Lock for Software Security
Peripheral Features
• Two 8-bit Timer/Counters with Separate Prescalers and Compare Modes
• One 16-bit Timer/Counter with Separate Prescaler, Compare Mode, and Capture Mode
Pin description
Pins Description
GND Ground
Features:
• 10-bit Resolution
• Up to 15 kSPS at Maximum Resolution
• 8 Multiplexed Single Ended Input Channels
• 7 Differential Input Channels
• 2 Differential Input Channels with Optional Gain of 10x and 200x
• Optional Left Adjustment for ADC Result Readout
• 0 - VCC ADC Input Voltage Range
• Selectable 2.56V ADC Reference Voltage
• ADC Start Conversion by Auto Triggering on Interrupt Sources
Figure 11 ADC Timing Diagram, First Conversion (Single Conversion Mode)
Figure 12 ADC Multiplexer Selection Register (ADMUX)
Figure 13 ADC Control and Status Register A (ADCSRA)
Figure 14 Pin Outs of MAX 232
Applications
• Battery-Powered Systems
• Terminals
• Modems
• Computers
Figure 15 Pin outs of LM 358 chip
Features of LM 358
Description :
VI = ±10 V
Table 2 Testing condition of LM 358
3 Results
Since we had to pass the ADC output through all of the filters in less than one
sample period, the time taken to do all the filter calculations was very important.
We were able to run through 9 filters in under 4000 cycles, which is the number
of cycles available when sampling from the ADC at 4 kHz. The Voiceprint
comparison function did not have a speed requirement, so its cycle time
was unimportant. The program was able to recognize five words, but it would
sometimes become confused and match the incorrect word if the spoken word
varied too much from the word stored in the dictionary. As a
rough estimate, the program recognized the correct word about 70% of the time
a valid word was spoken. The program achieved success using Arijit’s voice,
and with sufficient practice a person could say the same word with small
enough variation for the program to recognize it most of the
time. For the general person, though, the recognition program would have a
much lower success rate. Also, the words in the dictionary were
spoken by only one person. If someone else said the same words, it is unlikely
that the program would recognize the correct word most of the time, if at all.
Word      Percentage accuracy
Arijit    95%
Turbo     92%
Mony      97%
Table 1‐ Accuracy of recognition
To increase the accuracy, we took the template sample 20 times and
computed the geometric mean of the resulting vectors.
Screen Shots:
On starting the Template Generation process the program initializes the required
variables and performs the measurement of Noise Threshold.
Starting ....
Noise Measurement Done ......!
Starting ....
Noise Measurement Done ......!
Sampling Started...!
Noise1 = 22345
Noise2 = 21367
Noise3 = 20456
Threshold = 21498
0
234
123
6783
The output screen after the noise threshold calculation and the sampling are
completed, when the program starts to generate the 160 point template
vector.
Starting......
GetSample.....
Recognized Voice is of ‘Turbo’
GetSample
Recognized Voice is Of ‘Mony’
GetSample
Recognized Voice is of ‘Arijit’
The voice detection screen, which shows when the sampling starts and
then shows the match.
APPENDIX 1
Floating point arithmetic is too slow for small, 8-bit processors to handle,
except when human interaction is involved. Scaling a human input in floating
point is generally fast enough (compared to the human). However in fast loops,
such as IIR filters or animation, you are going to need to use fixed point
arithmetic. Numbers are stored in 2's complement form.
This section will concentrate on numbers stored as 16-bit signed ints, with the
binary-point between bit 7 and bit 8. There will be 8 bits of integer and 8 bits of
fraction, so we will refer to this as fixed point. This representation allows a
dynamic range of +/- 127, with a resolution of 1/256 ≈ 0.0039. Sign
representation will be standard 2's complement. For instance to get the fixed
point representation for -1.5, we can take the representation for +1.5, which is
0x0180, invert the bits and add 1. So inverting, we get 0xfe7f and adding one
(to the least significant bit) we get 0xfe80.
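The -1.5 example can be checked with a short C sketch. It uses `int16_t` explicitly so that it behaves the same on a PC as on the 8-bit target, where `int` is 16 bits wide. The function names are hypothetical counterparts of the conversion macros used in the report.

```c
#include <stdint.h>

typedef int16_t fix_t;   /* 8 integer bits, 8 fraction bits, 2's complement */

/* Float to fixed point: scale by 2^8 and truncate. */
static fix_t float_to_fix(float a)
{
    return (fix_t)(a * 256.0f);
}

/* Fixed point to float: divide by 2^8. */
static float fix_to_float(fix_t a)
{
    return (float)a / 256.0f;
}

/* Raw 16-bit pattern of a fixed-point number, for inspection. */
static uint16_t fix_bits(fix_t a)
{
    return (uint16_t)a;
}

/* Negate by the textbook rule: invert all bits, then add one. */
static fix_t twos_negate(fix_t a)
{
    return (fix_t)(uint16_t)(~(uint16_t)a + 1u);
}
```

Here `fix_bits(float_to_fix(1.5f))` is 0x0180, and both `float_to_fix(-1.5f)` and `twos_negate(float_to_fix(1.5f))` have the bit pattern 0xfe80, matching the worked example.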
Table 4 Example showing the fixed to float conversion example values
BASIC ALGORITHM:
REFERENCES:
2. http://instruct1.cit.cornell.edu/courses/ee476/FinalProjects/s2006/avh8_css34/avh8_css34/index.html
http://instruct1.cit.cornell.edu/courses/ee476/Math/index.html