
DIALECTS ANALYSIS

SPEECH RECOGNITION

By
Vishal &
Fundamentals of Speech Analysis

Factors influencing speech fall into two categories:

1. Extrinsic
2. Intrinsic
Extrinsic

Extrinsic factors are based on:

• Cultural influence
• Emotional influence
• Level of education
• Speaker's state of mind

Intrinsic
Intrinsic factors are based on:
• Anatomy of the speaker.
• Consequences of variation in the size and shape of the vocal tract.


Why Is Speech Analysis a Challenging Task?


• The spectral shape varies even for the same phoneme, which makes each phoneme class more spread out.
• Different phonemes uttered by different speakers can have similar formants, so phoneme classes overlap.

Representation of Speech
• The speech signal must be converted from analogue to digital.
• The analogue signal must first be band-limited and then sampled at fixed time intervals.
• Sampling frequency ranges from 8 kHz to 48 kHz.
  (Our project used only 8 kHz.)
• Sampling resolution varies from 8 to 16 bits.
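The sampling and quantization steps above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `sample_and_quantize` and the 440 Hz test tone are assumptions chosen for the example, and a pure sine stands in for the band-limited analogue signal. At 8 kHz the signal must stay below 4 kHz (half the sampling rate) to avoid aliasing.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate=8000, bits=16):
    """Sample a sine tone at fixed intervals and quantize it to signed integers.

    Hypothetical helper for illustration only.
    """
    n_samples = int(round(duration_s * sample_rate))
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16 bits, 127 for 8 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                   # fixed sampling interval of 1/sample_rate
        x = math.sin(2 * math.pi * freq_hz * t)   # stand-in for the analogue signal
        samples.append(int(round(x * max_level)))  # quantize to the chosen resolution
    return samples

# 20 ms of a 440 Hz tone at the 8 kHz rate used in the project
pcm16 = sample_and_quantize(440.0, 0.02, bits=16)
pcm8 = sample_and_quantize(440.0, 0.02, bits=8)
```

A 20 ms segment at 8 kHz contains 160 samples. Dropping from 16 to 8 bits keeps the same number of samples but represents each one with far fewer amplitude levels.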



Dialects

Definition
• A dialect is any distinguishable variety of a language spoken by a group of people.
• Dialects of a given language are differences in the speaking style of that language arising from geographical and ethnic differences among speakers.
Examples of dialects
• Chattisgarhi (Chattisgarhi-accented Hindi spoken in central India).
• Bengali (Bengali-accented Hindi spoken in the eastern region).
• Marathi (Marathi-accented Hindi spoken in the western region).
• General (Hindi spoken in the northern region).
• Telugu (Telugu-accented Hindi spoken in the southern region).
System Design

[Block diagram. Training path: Speech signal → Feature Extraction (MFCC, LPC, LPCC) → Model construction → Models. Test path: Test sample → Feature Extraction → Model Matching → O/P.]
Feature Extraction

Speech Waveform → Pre-Emphasis → Time Windowing → Fast Fourier Transform (FFT) → Mel Scaling → Log → Discrete Cosine Transform (DCT) → MFCC
Feature Extraction
• The input speech signal is digitised at a sampling rate of 8 kHz.
• A pre-emphasis filter boosts the high frequencies so that the analysis focuses on the spectral properties of the vocal tract.
• A Hamming window of 20 ms duration is used to extract frames from the speech waveform.
• A fast Fourier transform is used to compute the power spectrum of each windowed frame.
• Mel-scaled band-pass filters are applied to approximate the frequency resolution of the human ear.
• A logarithm is applied to compress the dynamic range of the filter outputs.
• A discrete cosine transform of the log filter outputs is used to compute the MFCCs.
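The steps listed above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the project's implementation. Only the 8 kHz sampling rate and the 20 ms Hamming window come from the slides. The 10 ms hop, 256-point FFT, 20 filters, 13 coefficients, the 0.97 pre-emphasis coefficient, and all function names are assumptions picked as common defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def mfcc(signal, sample_rate=8000, frame_ms=20, hop_ms=10, n_fft=256,
         n_filters=20, n_ceps=13, pre_emph=0.97):
    """Return an (n_frames, n_ceps) array of MFCCs. Parameter defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasised = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(emphasised) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasised[idx] * np.hamming(frame_len)
    # 3. Power spectrum via the FFT (frames are zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel-scaled filter-bank energies
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # 5. Log compression (small constant avoids log(0))
    log_energies = np.log(energies + 1e-10)
    # 6. DCT-II of the log energies, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

# Usage: one second of a synthetic 440 Hz tone sampled at 8 kHz
t = np.arange(8000) / 8000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `features` is the feature vector for one 20 ms frame. The DCT is written out explicitly so that the sketch depends only on NumPy.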
Model Construction

[Block diagram: Observation sequence → MFCC feature → Probability Computation for each model (HMM 0, ..., HMM 2, ..., HMM 9) → Select Maximum]

Model Construction Example
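The matching stage in the diagram (compute the probability of the observation sequence under every HMM, then select the maximum) can be sketched as below. This is an illustrative sketch, not the project's model. It assumes discrete observation symbols (for instance, MFCC frames mapped to codebook indices), whereas the slides do not say how the MFCC vectors are modelled. The function names, the two toy models `hmm_a` and `hmm_b`, and all probabilities are made up for the example.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-output HMM."""
    log_prob = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = start * emit[:, symbol]            # initialisation
        else:
            alpha = (alpha @ trans) * emit[:, symbol]  # induction step
        scale = alpha.sum()
        log_prob += np.log(scale)                      # accumulate log of the scale factors
        alpha = alpha / scale                          # rescale to avoid underflow
    return log_prob

def classify(obs, models):
    """Score obs under every model and return the best-scoring model name."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Two hypothetical 2-state models over a 2-symbol alphabet
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
models = {
    "hmm_a": (start, trans, np.array([[0.9, 0.1], [0.8, 0.2]])),  # favours symbol 0
    "hmm_b": (start, trans, np.array([[0.1, 0.9], [0.2, 0.8]])),  # favours symbol 1
}

best, scores = classify([0, 0, 1, 0, 0], models)
```

Working in the log domain with per-step rescaling keeps long observation sequences from underflowing. In a dialect identifier there would be one trained model per dialect in place of the two toy models.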
Conclusion

• Spectral features extracted from speech were explored for the identification of dialects.
