
PROJECT REPORT

VOICE CONTROLLED SWITCHING
Project Objective
1. To recognise a predefined voice command with the help of a standalone system based
on the ATmega328P (without any processor or computer).
2. To use the recognised command for switching home appliances at 240 volts.

Project Overview
1. Create a dictionary of 2 to 5 words, as well as letters, which can be uniquely recognised by
the system.
2. Compare the word spoken by the user with the words stored in the dictionary and recognise the
word if a match is found. Then perform the switching associated with that command.

Basic Block Diagram

Structure

The audio signal is received by the microphone, and the frequencies outside the range of
50 Hz to 2 kHz are filtered out.
The filtered signal is then passed through an amplifier so that the output
is at line level, i.e. within 0 to 5 V.
This filtered and amplified signal is given to the Arduino as analog input. The Arduino samples
at 4 kHz and records 128 samples of the word (starting from the first sample that
deviates enough from the silence level). The signal is now in the time domain.
The Arduino then processes this signal and, if a match is found, drives the
corresponding output pin high.
This output is given to a comparator whose output can be either 0 volts or 12 volts.
The comparator output is given to an electromagnetic relay, which performs the switching at
240 volts.
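
The capture rule described above (keep 128 samples, starting from the first one that deviates enough from the silence level) can be sketched as follows. This is an illustration in Python, not the Arduino code used in the project. The silence level of 512 (mid-scale of a 10-bit ADC reading) and the trigger margin of 40 counts are assumed values that do not come from the report.

```python
N_SAMPLES = 128        # samples kept per word (from the report)
SILENCE_LEVEL = 512    # assumed: mid-scale of a 10-bit ADC reading
TRIGGER_MARGIN = 40    # assumed: deviation that counts as "speech"


def capture_word(stream, n=N_SAMPLES, silence=SILENCE_LEVEL, margin=TRIGGER_MARGIN):
    """Return n samples starting at the first sample that deviates from
    the silence level by more than margin, or None if the stream ends
    before a full frame has been collected."""
    frame = []
    for sample in stream:
        if not frame and abs(sample - silence) <= margin:
            continue            # still silence, keep waiting
        frame.append(sample)    # triggered: keep every following sample
        if len(frame) == n:
            return frame
    return None


# Usage: 50 quiet readings followed by a louder burst.
readings = [512] * 50 + [600 + (i % 7) for i in range(200)]
frame = capture_word(readings)
```

Once triggered, the sketch keeps every following sample, including quiet ones, so the frame is a contiguous slice of the signal.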

Algorithm and Techniques

To know at what rate sampling should be performed we need the Nyquist
theorem, which says that the sampling frequency must be greater than twice the
maximum frequency present in the signal. If this condition is not met, a phenomenon
called aliasing occurs, and wrong information about the signal is conveyed rather than merely less
information.
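
A small numerical illustration of aliasing at the 4 kHz sampling rate used in this project is given below. It is a sketch in Python, not project code, and the tone frequencies are chosen only for the example: a 3 kHz tone, which lies above half the sampling rate, produces exactly the same samples as a 1 kHz tone.

```python
import math

FS = 4000  # sampling frequency in Hz (from the report)


def sample_tone(freq_hz, n_samples, fs=FS):
    """Sample a unit-amplitude cosine of freq_hz at fs samples per second."""
    return [math.cos(2 * math.pi * freq_hz * k / fs) for k in range(n_samples)]


# A 3 kHz tone is above FS/2 = 2 kHz, so it violates the Nyquist condition.
above_nyquist = sample_tone(3000, 16)
alias = sample_tone(1000, 16)   # FS - 3000 = 1000 Hz

# The two sample sequences agree to within floating-point error,
# so after sampling the 3 kHz tone cannot be told apart from 1 kHz.
max_diff = max(abs(a - b) for a, b in zip(above_nyquist, alias))
```

This is why frequencies above 2 kHz have to be removed before the signal reaches the ADC.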
We learnt how the Fourier transform is calculated and how to convert a signal from the
time domain to the frequency domain.
We learnt about windowing of the samples and its purpose.
We learnt how the power spectrum is calculated, and how to apply a Mel frequency
filter bank to the power spectrum and take the log to get the information
stored in the different frequency bands.
We learnt the Discrete Cosine Transform and the distribution of the Mel filters over frequency
to calculate the 13 MFCC coefficients that characterise the word.
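
For reference, the mel scale and the Discrete Cosine Transform mentioned above are commonly written as below. The report does not say which mel formula or DCT normalisation was used, so the constants here are the widely used textbook ones and should be read as an assumption. E_j denotes the energy collected by the j-th mel filter and M the number of filters.

```latex
m(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right), \qquad
f(m) = 700\left(10^{m/2595} - 1\right)

c_k = \sum_{j=0}^{M-1} \log(E_j)\,
      \cos\!\left[\frac{\pi k}{M}\left(j + \tfrac{1}{2}\right)\right],
\qquad k = 0, 1, \dots, 12
```

Keeping k = 0 to 12 gives the 13 coefficients used to characterise a word.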

Speech Recognition Algorithm
The recognition system is based on Mel Frequency Cepstral Coefficients (MFCC).
Step By Step Procedure
1. Sample the amplified speech signal from the analog input pin of the Arduino. The
sampling frequency currently used on the ATmega328P is 4 kHz. As the SRAM of the
microcontroller is just 2048 bytes, we record just 128 samples at a time.
2. After the sampling is done, a Hamming window is applied to eliminate the discontinuity
in the signal in the time domain.
3. Now the speech signal is processed to remove the discontinuity at the boundary points.
4. The Fast Fourier Transform of this speech signal is calculated. This generates two
arrays of 128 elements each, containing the real coefficients in one array and the imaginary
coefficients in the other.
5. Now the square of the absolute value of each of these complex numbers is calculated. This is how
we get the power spectrum of the signal. The absolute values of the Fourier transform are
symmetric, hence at this point we have only 64 distinct bins. These bins cover the
frequency spectrum from 0 to Fs/2, where Fs is the sampling frequency. Let Fs/n be Df, where n = 128
is the number of samples. The i-th bin then contains information in the
frequency range Df*i ± Df/2.
6. In order to capture the information in fewer terms, we designed a bank of 32 triangular
mel filters. The standard number is 40, but due to the very limited RAM (2048 bytes) we
were forced to use 32 filters.
7. Now we have the information stored in the outputs of these 32 filters. This information is processed through
the Discrete Cosine Transform. Alternatively, the inverse Fourier transform can be used, but this
is less efficient because it gives its output as complex numbers, whereas the
Discrete Cosine Transform gives its output as real coefficients.
8. Now we get a 13-element array representing the word. After repeated sampling and
taking the mean of the arrays obtained, we get the fingerprint of the word, which is used
for comparison.
9. Now when a word is spoken, the array formed is compared with the stored fingerprints of
the different words. As soon as a match is found, the word is recognised.
10. The switching is done using 12 V DC electromagnetic relays. The output of the
Arduino is fed to an op-amp (LM324), which is used as a power driver. The output of the
Arduino is given to the non-inverting terminal while the inverting terminal is held at 2.5 V, so
when the output from the Arduino goes high the op-amp provides 12 volts to the relay.
When it goes low, the op-amp output goes low, switching the relay off.
11. The device is programmed in the following way.
First, some recognisers (less noise-sensitive) are added to choose the device. They are
letters, because letters are seldom used alone in a conversation.
After a recogniser is recognised, a 5-second time window is activated within which the
on/off command has to be given. As the device has already been chosen (by the
recogniser), the on/off command given by the user performs the switching action on that
device. An indicator LED shows whether the time window is active. This
arrangement is made to avoid triggering of the circuit by normal day-to-day
conversation.
12. The commands are carefully chosen to avoid inter-matches between words.
13. The device works very well in a silent environment but has low accuracy in a
noisy environment. The way a particular word is spoken also matters.
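
Steps 1 to 8 above can be sketched end to end as follows. This is a minimal Python illustration, not the code that ran on the ATmega328P. It uses a direct DFT instead of the FFT library for brevity (the resulting values are the same). The mel formula, the 50 Hz to 2 kHz filter-bank range (taken from the analog pass band), the unnormalised DCT-II and the small floor applied before the log are all assumptions, since the report does not specify them.

```python
import cmath
import math

FS = 4000                      # sampling frequency in Hz (from the report)
N = 128                        # samples per frame (from the report)
N_FILTERS = 32                 # mel filters (from the report)
N_COEFFS = 13                  # coefficients kept (from the report)
F_LOW, F_HIGH = 50.0, 2000.0   # assumed filter-bank range


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Squared magnitude of the DFT, first len(frame) // 2 bins only."""
    n = len(frame)
    spec = []
    for k in range(n // 2):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(x) ** 2)
    return spec


def mel_filterbank_energies(spec, fs=FS, n_fft=N, n_filters=N_FILTERS):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(F_LOW), hz_to_mel(F_HIGH)
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    df = fs / n_fft                       # width of one DFT bin in Hz
    energies = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * df
            if left < f <= centre:
                e += p * (f - left) / (centre - left)     # rising slope
            elif centre < f < right:
                e += p * (right - f) / (right - centre)   # falling slope
        energies.append(e)
    return energies


def dct_ii(values, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    m = len(values)
    return [sum(v * math.cos(math.pi * k * (j + 0.5) / m)
                for j, v in enumerate(values))
            for k in range(n_out)]


def mfcc(frame):
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    energies = mel_filterbank_energies(power_spectrum(windowed))
    log_e = [math.log(max(e, 1e-10)) for e in energies]   # floor avoids log(0)
    return dct_ii(log_e, N_COEFFS)


# Usage: a synthetic 500 Hz tone standing in for a recorded word.
tone = [math.sin(2 * math.pi * 500 * t / FS) for t in range(N)]
coeffs = mfcc(tone)
```

The result is a 13-element list of real numbers, which plays the role of the array described in step 8.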

Mathematics Involved
Euclidean distance: Summation of the squares of the differences of the corresponding elements of the
fingerprint arrays. An accurate threshold has been decided to minimize false negatives.
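
A minimal sketch of this comparison in Python is given below. The function names, the three-element fingerprints and the threshold value are hypothetical and chosen only to keep the example short. The sketch returns the closest fingerprint rather than the first one under the threshold, which is a design choice of this example and not necessarily what the project code does.

```python
def squared_distance(a, b):
    """Sum of squared differences between corresponding elements."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognise(sample, fingerprints, threshold):
    """Return the name of the closest fingerprint, or None if even the
    closest one is farther away than threshold."""
    best_name, best_dist = None, None
    for name, fp in fingerprints.items():
        d = squared_distance(sample, fp)
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    if best_dist is not None and best_dist <= threshold:
        return best_name
    return None


# Usage with hypothetical three-element fingerprints.
fingerprints = {"O": [1.0, 2.0, 3.0], "Q": [4.0, 0.0, -1.0]}
match = recognise([1.5, 2.0, 2.5], fingerprints, threshold=1.0)
no_match = recognise([10.0, 10.0, 10.0], fingerprints, threshold=1.0)
```

Raising the threshold reduces false negatives at the cost of more false positives, so its value has to be tuned on real recordings.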
Fast Fourier Transform: Every periodic signal can be expressed as the sum of
sines and cosines of different frequencies. The Fourier transform is used to calculate the amplitude
of the sine and cosine waves present at each frequency.
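
The relation between bin index and frequency can be checked numerically. The Python sketch below is illustrative only: it uses a direct DFT rather than an FFT (the values are the same) and an assumed 500 Hz test tone with the report's 4 kHz sampling rate and 128-sample frame.

```python
import cmath
import math

FS = 4000   # sampling frequency in Hz (from the report)
N = 128     # samples per frame (from the report)


def dft(signal):
    """Direct discrete Fourier transform, O(n^2)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


tone = [math.sin(2 * math.pi * 500 * t / FS) for t in range(N)]
magnitudes = [abs(x) for x in dft(tone)]

bin_width = FS / N                                       # 31.25 Hz per bin
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
peak_freq = peak_bin * bin_width                         # frequency of the peak
```

The peak falls in bin 16, i.e. at 16 × 31.25 Hz = 500 Hz, and the magnitudes of bins k and N − k are equal, which is why only the first 64 bins need to be kept.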

Analog Circuit:
The analog circuit consists of a band-pass filter and an amplifier with an electret condenser
mic.
The output of the mic is of the order of a few millivolts, which has to be amplified to get a
useful signal.
So we used an amplifier, and as per the Nyquist theorem the maximum frequency which we can
allow in the signal should be less than or equal to half of the sampling frequency, so we have
to use a band-pass filter.
At the input of the op-amp there is a capacitor of 4.7 uF, which is used to reject the frequencies
lower than 50 Hz, and in the feedback there is a capacitor of 12 pF, which is used to
attenuate the frequencies above 2 kHz. This is how we made the band-pass filter.
After this, in order to amplify the signal further, we used an inverting amplifier, so
that we get a voltage signal of comparable magnitude which can be distinguished by the
ADC of the ATmega328P.
The schematic of the circuit is as follows.

We tested many preamp models in the Electronics Laboratory, and after lots
of trials it started working well. But in order to have more precision we wanted to use the
OPA344 instead of the LM741, but it was very costly. So we finally bought a ready-made
mic preamplifier with an OPA344, which was cheaper than the cost of the OPA344 itself!

Practical Testing

First we started with the aim of recognising vowels, because they have a distinct sound and
also only one type of sound. We found the fingerprint by repeatedly saying the word more
than 100 times, but we found that 30 samples were also enough to get a good estimate
of the mean of the coefficients. We then decided to use the mean of every coefficient
as the fingerprint.
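
Building a fingerprint as the mean of every coefficient can be sketched as below. The function name and the sample values are hypothetical. The example uses three 4-element arrays only to stay short, whereas the project used 13 coefficients and about 30 recordings per word.

```python
def mean_fingerprint(recordings):
    """Element-wise mean of several equal-length coefficient arrays."""
    if not recordings:
        raise ValueError("need at least one recording")
    n = len(recordings[0])
    if any(len(r) != n for r in recordings):
        raise ValueError("all recordings must have the same length")
    return [sum(r[i] for r in recordings) / len(recordings) for i in range(n)]


# Usage with three hypothetical recordings of the same word.
recordings = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
fingerprint = mean_fingerprint(recordings)
```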
Once we went beyond recognising one vowel, we also kept an eye on the inter-word distance (the Euclidean
distance between the test sample of one word and the fingerprint of another word)
so that we could eliminate the vowels which have similar fingerprints. To avoid the
possibility of wrong detection among the words in the dictionary, we carefully chose the
words and vowels which have the maximum inter-word distance.
The Euclidean distance which we decided on as the threshold is 40 units.
For the final demonstration we used O and Q as recognisers and Activate and
OFF as the commands to control the switching.
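
The recogniser-then-command scheme from the procedure, with the demonstration words above, can be sketched as a small state machine. This is an assumed reconstruction in Python, not the project code: the class name, the mapping of O and Q to device 0 and device 1, and the rule that a window closes after one command are all assumptions.

```python
WINDOW_SECONDS = 5.0                       # command window (from the report)
RECOGNISERS = {"O": 0, "Q": 1}             # assumed letter-to-device mapping
COMMANDS = {"Activate": True, "OFF": False}


class VoiceSwitch:
    def __init__(self, n_devices=2):
        self.states = [False] * n_devices  # on/off state of each relay
        self.selected = None               # device chosen by the last recogniser
        self.window_end = 0.0              # time at which the window closes

    def window_active(self, now):
        """True while the indicator LED would be lit."""
        return self.selected is not None and now < self.window_end

    def hear(self, word, now):
        """Feed one recognised word with its timestamp in seconds."""
        if word in RECOGNISERS:
            self.selected = RECOGNISERS[word]
            self.window_end = now + WINDOW_SECONDS
        elif word in COMMANDS and self.window_active(now):
            self.states[self.selected] = COMMANDS[word]
            self.selected = None           # assumed: one command per window


# Usage
sw = VoiceSwitch()
sw.hear("Activate", 0.0)   # ignored: no recogniser heard yet
sw.hear("O", 1.0)          # selects device 0, window open until t = 6 s
sw.hear("Activate", 3.0)   # device 0 switched on
sw.hear("Q", 10.0)         # selects device 1, window open until t = 15 s
sw.hear("OFF", 16.0)       # ignored: window has expired
```

Because a command word is ignored unless a recogniser letter was heard within the previous 5 seconds, ordinary conversation containing "off" or "activate" does not switch anything.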

Accomplishments

Able to switch two devices independently, even in a routine conversation environment, with
good accuracy and low false positives.
Able to operate quite successfully even with a dictionary size of as much as 6 words.

Future Prospects

Planning to implement a Markov model for voice recognition on a
small system like the Arduino.
Trying to eliminate the inability to sample more MFCCs for a single word, so that we can
have better recognition for longer words as well as at different amplitudes.
Trying to implement the Dynamic Time Warping algorithm to recognise a word even if it
is said differently, or for a longer or shorter time, than the prerecorded sample fingerprint.

Acknowledgment

Amogh Garg (Explained various methods and helped with the code)
Mayur Nawal

References

PlainFFT library by Didier

Team Members (1RE35)

Sanket Barhate
Utkarsh Phirke
Kartik Tidke
