Article history:
Received 6 March 2015
Received in revised form 7 October 2015
Accepted 17 November 2015
Available online 17 December 2015

Keywords: Obstructive sleep apnea; Non-rapid eye movement; Rapid eye movement; Snoring; Sleep disorder

Abstract

Obstructive Sleep Apnea (OSA) is a serious sleep disorder in which the patient experiences frequent upper airway collapses leading to breathing obstructions and arousals. The severity of OSA is assessed by averaging the number of incidences throughout the sleep. In a routine OSA diagnostic test, overnight sleep is broadly categorized into rapid eye movement (REM) and non-REM (NREM) stages, and the number of events in each is used to calculate the severity. A typical respiratory event is mostly accompanied by sounds such as loud breathing or snoring interrupted by choking and gasps for air. However, respiratory control and ventilation are known to differ between sleep states. In this study, we assumed that the effect of sleep state on respiration alters the characteristics of respiratory sounds, including snoring, in OSA patients. Our objective was to investigate whether these characteristics are sufficient to label snores as belonging to REM or NREM sleep. We collected overnight audio recordings from 12 patients undergoing a routine OSA diagnostic test. We derived features from snoring sounds and their surrounding audio signal, and computed time-series statistics such as the mean, variance and inter-quartile range to capture the distinctive patterns of REM and NREM snores. We designed a Naïve Bayes classifier to explore the usability of these patterns for predicting the corresponding sleep state. Our method achieved a sensitivity of 92% (±9%) and a specificity of 81% (±9%) in labeling snores into the REM/NREM groups, which indicates the potential of snoring sounds to differentiate sleep states. This may be valuable for developing non-contact, snore-based technology for OSA diagnosis.

© 2015 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.bspc.2015.11.007
S. Akhter et al. / Biomedical Signal Processing and Control 25 (2016) 130–142 131
position [9]. Therefore a separate sleep-specific severity index provides further details about sleep quality, which is unavailable via the overall AHI and AI parameters.

OSA is prevalent in the general population, and at least 80% of middle-aged adults with moderate to severe OSA remain undiagnosed [10]. Considering the spectrum of the undiagnosed population, associated co-morbidities and timely access to facilities, there is growing interest among researchers in developing alternative techniques for mass screening. In general, devices targeted at population screening do not consider MSS scoring due to its time-consuming, costly and labor-intensive nature, the complexity of instrumentation, and the contact sensors required for multiple physiological signals.

In order to develop alternatives, snoring in OSA patients has gained the attention of researchers because of its non-contact instrumentation and its cheap, easy-to-access nature. Snoring is the earliest symptom of OSA. Snoring originates from vibration of soft tissues (e.g. tongue, soft palate, pharyngeal wall) in the UA [11], while OSA results from UA collapse. Loud breathing or snoring followed by a period of silence and then sudden gasps for air are common concomitant events in OSA. Hence, sounds during respiration such as snoring and loud breathing in OSA patients should carry vital information about UA patency, which may be valuable for developing OSA screening techniques.

In the context of sounds, variations in snore sound properties (formants [12], periodicities [13], intervals [14] and Gaussianity [15]) have been studied to characterize OSA patients. Indeed, these efforts indicate the potential of snoring sounds for population screening. However, sleep-related variation in UA muscles should cause alterations in the acoustical properties of the UA (and hence the sounds of respiration). Very few studies [16,17] have attempted to explore sounds for MSS-specific information, which may provide further details about OSA.

Snore sounds in NREM sleep were described as more intense and longer than those in REM sleep for OSA patients [17]. However, no definitive framework was proposed in [17] for using such variations to extract REM/NREM states. The pioneering work in [17] was limited to a presentation of descriptive statistics of snores from known sleep states, and the findings were not validated on a new dataset. Later, REM/NREM differentiation was found to be only 64% achievable using patient-specific models [16]. These efforts indicate the possibility of MSS-related information in sleep sounds.

[…] to acquire simultaneous PSG and non-contact SBS recordings and diagnostic information from the hospital. PSG-based sleep staging is our reference for characterizing REM and NREM snores. We trained a model with REM/NREM snores from our database and cross-validated the model iteratively by the leave-one-out technique, so that it generalizes to the entire patient population. We computed standard metrics such as sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV) to compare our method's performance with previous studies. As our method depends solely on sounds, it can easily be integrated into snore-based automated techniques developed for OSA screening of both contact and non-contact nature.

2. Methodology

Fig. 1 represents the overall methodology proposed in this paper. The details of our method are described in the following Sections 2.1–2.5.

Fig. 1. Overall methodology: the snore sound recording passes through snore episode detection and REM/NREM labelling, with PSG data providing reference sleep staging (wake/sleep and REM/NREM epochs); features are derived from the group of snores and from the snoring/breathing pattern (including non-Gaussianity).

2.1. Data acquisition protocol

The data acquisition environment for the work of this paper was the Sleep Diagnostic Laboratory of The Princess Alexandra Hospital, Brisbane, Australia. Both oral and written consent was collected from the patients according to the approvals of the human ethics committees of Princess Alexandra Hospital and The University of Queensland. Our subject population includes patients referred to the hospital for a routine Polysomnography (PSG) test. Routine PSG recordings were made using clinical PSG equipment (Siesta, Compumedics®, Sydney, Australia). A typical PSG recording followed […]
Table 1
Demographic details of the patient population.

Patient ID | Age | BMI (kg/m²) | RDI | REM AHI | NREM AHI | Wake/sleep epochs | NR/R epochs | No. of SS

BMI = body mass index; AHI = apnea–hypopnoea index; NREM AHI = AHI from NREM sleep; REM AHI = AHI from REM sleep; wake/sleep epochs = No. of awake epochs/No. of sleep epochs; NR/R epochs = No. of NREM epochs/No. of REM epochs; No. of SS = No. of snore samples.
2.2. Snore episode detection and labeling

We followed the automatic snore detection algorithm previously developed by our group in [13] to locate snore segments in the SBS recording. The algorithm uses an objective definition of a snore episode based on the evidence of sound pitch. The overall accuracy of the automatic snore detection algorithm was reported to be 95.47% in [13]. To further validate the performance of the snore detection algorithm on our dataset, we manually screened the first 100 snore samples detected by the algorithm from each patient, carefully listening to the events while simultaneously viewing the time scale and spectrogram on a computer screen. An event was labeled 'True Positive' if it satisfied the definition of a snore given in [13], else 'False Positive'. We then computed the accuracy of the algorithm as the ratio of the total number of 'True Positives' over the total number of events, i.e. 100. The accuracy of the algorithm was computed individually for each patient. The mean accuracy of the snore detection algorithm was found to be 89 ± 14%; for 9 subjects the accuracy was >90%.

Snoring episodes were then aligned with the PSG to locate the corresponding epochs of the PSG recording. If misaligned, we trimmed the recording segment which started earlier, maintaining the time of the PSG epochs as the reference for alignment. We anticipated that an epoch may contain several snoring episodes; the same epoch label is then shared among them. For instance, if NREM epoch E contains n episodes, the kth episode SkE within E was labeled as NREM (k = 1, 2, 3, ..., n).

2.3. Window design

1. Episode SkE must reside in the Eth epoch of W. Eq. (1) indicates the location of E within W:

   E = (N − 1)/2 + 1 (N odd);  E = N/2 (N even)   (1)

2. N contained only REM and NREM epochs. Any window with awake epochs was omitted from further investigation to avoid sleep/wake transition effects.

For instance, in Fig. 2, we designed a window W = 6 min around an unknown episode S4109 (the 4th snore within epoch E = 109, marked as s4). W consisted of N = 12 consecutive epochs around S4109, with E located 6th. Our target is to investigate the 6-min audio recording in Fig. 2(b) and derive a feature set to decide whether S4109 is a REM or NREM snore.

2.4. Characteristic feature set design and extraction

We derived two groups of features for SkE. Group 1 comes from the snoring sound properties themselves, and Group 2 from the surrounding signal pattern (e.g. effort, intensity, periodicity and intervals). In total, 43 different features were extracted from Group 1 and Group 2. Table 2 summarizes the number of features derived from each group.
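As an illustrative sketch only (not the authors' code; the function names are ours), the epoch-index rule of Eq. (1) and the wake-epoch exclusion rule can be written as:

```python
def center_epoch_index(n_epochs: int) -> int:
    """Eq. (1): 1-based position E of the epoch under test within a
    window W of N consecutive 30 s epochs."""
    if n_epochs % 2:                      # N odd
        return (n_epochs - 1) // 2 + 1
    return n_epochs // 2                  # N even

def window_is_usable(stage_labels) -> bool:
    """Rule 2: discard any window containing awake epochs, keeping
    only pure REM/NREM windows."""
    return all(s in ("REM", "NREM") for s in stage_labels)

# W = 6 min -> N = 12 epochs, so the snore epoch sits 6th, as in Fig. 2.
assert center_epoch_index(12) == 6
assert not window_is_usable(["NREM"] * 11 + ["Wake"])
```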
Fig. 2. A 6 min segment of audio recording from a 50-year-old male patient (BMI = 36.1 and RDI = 59.1) in our database. The segment was selected to demonstrate a time window W = 6 min of N = 12 consecutive epochs around the snore episode S4109 (4th snore from the epoch E = 109), indicated as s4. Panels depict (a) epochs of sleep stages from the PSG recording, (b) the audio signal (mV) from the microphones and (c) a 30 s segment of the audio signal (mV) enlarged from the 6-min window, containing the episode s4 and its nearby episodes (s1–s8). Symbols W, N1, N2, N3, N4 and R in (a) represent, respectively, the Awake, NREM stage 1, NREM stage 2, NREM stage 3, NREM stage 4 and REM sleep stages.
2.4.1. Group 1: Features from the group of snores

REM sleep in humans is described as irregular breathing with increased eye-movement activity [18], in association with reduced tidal volume and minute ventilation [19]. A reduction in tidal volume and minute ventilation represents a reduction in the volume of air reaching the lungs during respiration. We anticipated that the reduced air volume to the lungs in REM sleep may be linked to reduced activation of the UA muscles [4,5,7,22], which may affect respiratory sounds (and hence snoring). From this assumption, we examined the properties of consecutive snore episodes. We formed a group of episodes by considering a 1 min window around SkE and computed the following features from the group.

2.4.1.1. Descriptive features. The total number of snores within the window (TotSnE) was our first feature. Snoring habits may vary across patients; therefore we normalized TotSnE for each patient using its median. Next we computed the snore-to-snore time interval (TI) in seconds within the window; the (i) mean, (ii) inter-quartile range (IQR) and (iii) coefficient of variation (CV) of TI were our next features. We computed the log energy of snores (log E) using Eq. (2) and derived the energy per unit length (EPL) for each episode by Eq. (3). The (i) mean, (ii) IQR and (iii) CV of EPL were our last set of descriptive features.

log E = 10 log10 [ ε + (1/K) Σ_{n=1}^{K} SkE(n)² ]   (2)

EPL = log E / T   (3)

In Eq. (2), K indicates the number of time-domain audio signal samples within SkE, and ε is an arbitrary constant to avoid a log 0 computation. The symbol T in Eq. (3) indicates the length of a snore episode in seconds.

2.4.1.2. Formant features. Formants are the resonance frequencies of the vocal tract. The size and shape of the acoustic spaces in the vocal tract and their coupling determine the corresponding formants. Formant 1 (F1) is known to represent pharyngeal constriction [23]. In order to study REM-related changes in the UA pharyngeal region [4,5,22], we computed features from the F1 of snores.

Snore episodes were segmented into 100 ms lengths and F1 was computed from each segment. We used a linear predictive coding (LPC) scheme based on Yule–Walker autoregressive parameter estimation [24]. The mean of F1 (mF1) and the standard deviation of F1 (stdF1) were computed for each episode. Then the following formant features were computed from all episodes within the window: (i) mean of mF1, (ii) standard deviation of mF1, (iii) mean of stdF1, (iv) standard deviation of stdF1, (v) IQR of mF1 and (vi) CV of mF1.

2.4.2. Group 2: Features from the breathing/snoring pattern

We investigated the nature of the airflow pattern from the continuous audio signal x(n) within the window W around SkE by converting it into a corresponding energy signal. We carried out the following computations on x(n) prior to investigation.

1. […] within 8000 Hz. Hence, we down-sampled x(n) from 44,100 Hz to 16,000 Hz and band-pass filtered it (300–7500 Hz).
2. The filtered x(n) was segmented into 50 ms blocks and the log energy of each block was computed using Eq. (4). The result is the energy signal y(i) with a sampling rate (Fe) of 20 samples/s.

   y(i) = 10 log10 [ ε + (1/M) Σ_{n=1}^{M} x(n)² ]   (4)

   where y(i) is the log energy of the ith block of the signal x(n) within W containing M samples, and ε is an arbitrary constant against any log 0 computation.
3. Next we normalized y(i) using its RMS value.
4. The normalized y(i) was segmented into 30 s blocks with 20 s overlap. A segment yj(i), the jth segment of y(i), contains 600 samples (Fe × 30 s = 600 samples). This was set to observe breath-by-breath variation within y(i); we assumed that two consecutive breaths cannot be more than 10 s apart unless there is an apnea event, so segments at 10 s intervals should provide breath-by-breath details of y(i). We then gathered all yj(i) within W to study the pattern of the energy signal and derive potential features.

2.4.2.1. Pseudo-periodicity. We obtained an estimate of the breathing/snoring rate from the segments yj(i) using the 'auto-correlation with centre clipping' technique [25]. It is a classical method, widely used in snore and speech processing [13], to detect periodicity within a signal. The auto-correlation sequence R[i] for a segment yj(i) is presented in Fig. 3. The first peak, at lag L in Fig. 3, indicates the basic rhythm period within yj(i). The lag L (in samples) corresponds to L/Fe seconds; Eq. (5) converts it into a per-minute rate, which we called the breathing pseudo-periodicity (BPP).

BPP = (Fe × 60) / L   (5)

where Fe indicates the sampling rate of the time series yj(i) and 60 is used for the seconds-to-minutes conversion. We included the statistical properties of BPP ((i) mean, (ii) IQR, (iii) CV, (iv) kurtosis, (v) skewness and (vi) variance) in our feature set.

2.4.2.2. Breathing effort and intensity. We anticipated that the shape and location of the first peak in the autocorrelation sequence R[i] hold information about the signal power at an interval of the rhythm period. Hence the peak at L (Fig. 3) holds information about the intensity and effort of neighboring breathing/snoring events: if these events are intense and less variable, R[L] tends to be relatively high. We named the peak height R[L] the Breathing Intensity (BI). The statistical properties of BI ((i) mean, (ii) IQR, (iii) CV, (iv) kurtosis, (v) skewness and (vi) variance) were computed from all the segments yj(i) within W and included in our feature set.

The area between the straight line connecting the center of R[i] to the first peak at L in Fig. 3(c) and the curve represents the magnitude and nature of the rhythm at a distance L relative to the zero-lag auto-correlation. Variation in the energy of neighboring breathing/snoring events in yj(i) within L affects the areas enclosed by the peaks at 0 and L: breath-by-breath change alters the height and location of the peak at L, while periodic appearance causes the opposite. Fig. 3(c) shows the enclosed areas shaded as A1 and A2. A1 indicates the area above the curve of R[i] between the peaks at 0 and L connected by a straight line; A2 indicates the area under the curve of R[i] between 0 and L. Eqs. (6) and (7) provide the mathematical details used for the A1 and A2 calculations. The symbol m in Eq. (6) indicates the slope of the straight line drawn between R[0] and R[L].

A1 = Σ_{i=0}^{L} (m·i + 1 − R[i])   (6)

A2 = Σ_{i=0}^{L} R[i]   (7)

We named the area A1 the breathing effort (BE), and the combination of A1, A2 and the peak height R[L] the breathing effort and intensity (BEI). BEI was calculated by the following equation:

BEI = R[L] × (A1 / A2)   (8)

The following 6 statistical properties of BE and BEI were included in our feature set: (i) mean, (ii) IQR, (iii) CV, (iv) kurtosis, (v) skewness and (vi) variance within W. A graphical representation of BPP, BI, BE and BEI is provided in Fig. 4; it shows the distribution of these features within a window W = 6 min around the snore episode s4 in Fig. 2.

2.4.2.3. Breath-to-breath variability. We anticipated that if the breathing/snoring sounds within the SBS signal appear periodically, then the signal distribution will not be Gaussian. Breath-to-breath variation causes changes in the periodic pattern, and hence the Gaussianity deviates accordingly. We used an index called the Non-Gaussianity Score (NGS) [15] to measure the amount of deviation of yj(i) from a Gaussian distribution, using a normal probability plot to compare the deviation from the theoretical linear shape of the plot for the data within yj(i).

Prior to the NGS computation on the segment yj(i), the data were squared and smoothed as described in Fig. 5(a) and (b). Next, a horizontal straight line was drawn at the highest magnitude of yj(i); it was marked as MaxLine and is shown in Fig. 5(b). In this figure, the distance from yj(i) to MaxLine is shaded as A1 and the magnitude of yj(i) is shaded as A2.
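As a minimal sketch of Eqs. (2)–(4) (our own illustration, not the authors' implementation; the value of ε and the block handling are assumptions, and the band-pass filtering step is omitted):

```python
import numpy as np

EPS = 1e-10  # stands in for the arbitrary constant epsilon in Eqs. (2) and (4)

def log_energy(episode: np.ndarray) -> float:
    """Eq. (2): 10*log10(eps + mean squared amplitude) of one snore episode."""
    return 10.0 * np.log10(EPS + np.mean(episode ** 2))

def energy_per_length(episode: np.ndarray, fs: float) -> float:
    """Eq. (3): EPL = log E divided by the episode length T in seconds."""
    return log_energy(episode) / (len(episode) / fs)

def energy_signal(x: np.ndarray, fs: float, block_s: float = 0.05) -> np.ndarray:
    """Eq. (4) plus step 3: block-wise log energy y(i), 20 samples/s at
    block_s = 50 ms, normalized by its RMS value."""
    block = int(fs * block_s)
    n_blocks = len(x) // block
    blocks = x[: n_blocks * block].reshape(n_blocks, block)
    y = 10.0 * np.log10(EPS + np.mean(blocks ** 2, axis=1))
    return y / np.sqrt(np.mean(y ** 2))
```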
Fig. 3. This figure shows a 30 s segment of (a) the audio recording (mV) from the microphone, x(n), (b) the corresponding normalized log energy yj(i) and (c) the autocorrelation sequence R[i]. s1 to s8 in panel (a) indicate 8 consecutive snoring episodes. The shaded area A1 in panel (c) is the area enclosed by the straight line connecting the start of R[i] to the first autocorrelation peak R[L] at L, minus the area under the curve of R[i]. Area A2 is the area under the curve of R[i] from i = 0 to L.
Fig. 4. This figure demonstrates the distribution of (a) BPP, (b) BI, (c) BE and (d) BEI derived from auto-correlation of energy signal yj (i) within a window W = 6 min. The
window was designed around the snore episode S4109 from epoch E = 109. Dotted line in the figure indicates the location of the episode and corresponding epoch (6th epoch)
within the window W = 6 min (t = 2769 to 2799 s of PSG recording).
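The quantities BPP, BI, BE and BEI of Eqs. (5)–(8), whose distributions are plotted in Fig. 4, can be sketched as follows (our own simplified illustration: a plain autocorrelation with a naive first-peak search stands in for the centre-clipping method of [25]):

```python
import numpy as np

def breathing_features(yj: np.ndarray, fe: float = 20.0):
    """Return (BPP, BI, BE, BEI) for one 30 s energy segment yj(i)."""
    yj = yj - np.mean(yj)
    r = np.correlate(yj, yj, mode="full")[len(yj) - 1:]
    r = r / r[0]                              # normalize so R[0] = 1
    neg = np.where(r < 0)[0]                  # naive search: first peak after
    start = neg[0] if len(neg) else 1         # R[i] first dips below zero
    L = start + int(np.argmax(r[start:]))
    bpp = fe * 60.0 / L                       # Eq. (5): cycles per minute
    bi = r[L]                                 # BI: peak height R[L]
    m = (r[L] - 1.0) / L                      # slope of line from R[0] to R[L]
    i = np.arange(L + 1)
    be = np.sum(m * i + 1.0 - r[: L + 1])     # Eq. (6): A1, area above R[i]
    a2 = np.sum(r[: L + 1])                   # Eq. (7): A2, area under R[i]
    bei = bi * be / a2                        # Eq. (8)
    return bpp, bi, be, bei
```

For a clean 15 cycles/min rhythm, this sketch recovers BPP near 15 with a tall, repetitive first peak (high BI).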
Fig. 5. An example 30 s segment of the energy signal y(i) for the NGS calculation: (a) normalized log energy yj(i) for the jth segment of y(i), (b) yj(i) after squaring and smoothing and (c) the ratio A1/A2 for every data point in yj(i). MaxLine in (b) is the line drawn from the location of the highest magnitude of the squared and smoothed yj(i); A1 in (b) is the distance between MaxLine and the curve, and A2 is the distance between the x-axis and the curve.

The ratio A1/A2 for every data point in yj(i) (presented in Fig. 5(c)) was used to measure its NGSj:

NGSj = 1 − [ Σ_{i=1}^{P} (δ[i] − δ̄)² ] / [ Σ_{i=1}^{P} (q[i] − q̄)² ]   (9)

In Eq. (9), P indicates the length of the data within yj(i), δ is the reference normal data from a true Gaussian probability plot of linear shape, and q is the probability plot of the empirical dataset yj(i) in Fig. 5(c). We computed the (i) mean, (ii) IQR, (iii) CV, (iv) kurtosis, (v) skewness and (vi) variance of NGSj for each yj(i) within the window W.

2.5. REM/NREM classification model and cross-validation

We computed a total of 43 features in Section 2.4. These were derived from the audio signal within W, which was windowed around the episode SkE in Section 2.3. We used them to label the unknown episode SkE as an NREM or REM snore. We designed two different types of classifier to assess the performance of our NREM and REM snore detection algorithm. In this paper we investigate the use of a linear Naïve Bayes (NB) classifier for REM/NREM classification; to test whether a non-linear classifier can provide better classification performance, we also use an Artificial Neural Network (ANN).

To train and test the classifiers we applied the leave-one-patient-out cross-validation (LOOCV) technique. LOOCV uses the data from all the patients except one to train the model and the left-out patient to test it. It is systematically repeated 12 times so that each patient in the database (Table 1) is used to test the model exactly once.

We considered PSG-based sleep staging as the reference for labeling snores into the REM or NREM group. Outcomes of the classifiers were initially categorized as positive when Y is close to 'one' (REM class) and negative when Y is close to 'zero' (NREM class). These were then compared with the corresponding reference classes to identify the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). We used these numbers to calculate performance metrics such as the sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV) of the classifiers.

2.5.1. Naïve Bayes classifier

[…] In Eq. (10), the summation indicates all possible values yj of the dependent variable Y. The conditional-independence criterion over the features f1, f2, ..., fF of Naive Bayes is now incorporated into Eq. (10) as follows:

Pr(Y = yk | f1, f2, ..., fF) = Pr(Y = yk) Π_i Pr(fi | Y = yk) / Σ_j [ Pr(Y = yj) Π_i Pr(fi | Y = yj) ]   (11)

Eq. (11) indicates that the model calculates the posterior probability of an unknown episode SkE being 'one' (Y = 1) for the REM class and 'zero' (Y = 0) for the NREM class, based on the features f1, f2, ..., fF derived in Section 2.4. The decision for SkE is the class with the highest posterior probability.

2.5.2. Artificial Neural Network classifier

We designed a 3-layer feed-forward ANN classifier model, consisting of an input layer, a hidden layer and an output layer, to estimate the probability of an unknown episode SkE being a REM or NREM snore. The numbers of neurons were: input layer = Hip, hidden layer = H1 and output layer = Hop. A hyperbolic tangent sigmoid transfer function was used in the hidden layer, as it is well known for its nonlinear, smooth and saturating nature, especially in probability estimation applications, with a linear transfer function at the output layer. The network was trained using the scaled conjugate gradient backpropagation training algorithm, and the mean square error was used as the performance estimation function.

For ANN training using LOOCV, 70% of randomly selected snores from Z − 1 patients were used for ANN training, 30% of randomly selected snores from the Z − 1 patients were used for ANN validation, and all the snores from the 1 left-out patient were used for ANN testing. Training of the ANN was stopped if one of the following stopping criteria was met: (i) the minimum gradient magnitude in the training set fell below 10⁻⁶, (ii) the maximum number of validation failures (increases in the MSE on the validation dataset) reached 6, or (iii) the maximum number of training iterations reached 1000. In order to ensure fast training, uniform learning and an even distribution of the active regions of the layer neurons in the input space, we used Nguyen–Widrow initialization to calculate the initial weight and bias values. The weight and bias terms during training were updated by scaled conjugate gradient optimization.

The ANN model was trained to output '1' if the snores were from the REM class and '0' if from the NREM class; hence the ANN output varied between 0 and 1. We used the Receiver Operating Characteristic (ROC) curve to find the optimal decision threshold Ths for classifying snores into REM or NREM. During the LOOCV process the optimum Ths was calculated from the training set and applied to the testing set.
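The posterior of Eq. (11) can be sketched as follows (illustration only: we plug in hypothetical Gaussian per-feature likelihoods, whereas the paper fits kernel density estimates, and the priors and feature values below are made up):

```python
import numpy as np

def nb_posterior(prior, likelihood, feats):
    """Eq. (11): class posterior under the Naive Bayes
    conditional-independence assumption.

    prior[c]      -> Pr(Y = c)
    likelihood[c] -> callable giving per-feature densities Pr(f_i | Y = c)
    """
    joint = {c: prior[c] * float(np.prod(likelihood[c](feats))) for c in prior}
    z = sum(joint.values())                   # denominator of Eq. (11)
    return {c: joint[c] / z for c in joint}

def gauss(mu, sd):
    return lambda f: (np.exp(-0.5 * ((np.asarray(f) - mu) / sd) ** 2)
                      / (sd * np.sqrt(2 * np.pi)))

post = nb_posterior(
    prior={"REM": 0.12, "NREM": 0.88},        # class imbalance, roughly as in the data
    likelihood={"REM": gauss(0.5, 0.3), "NREM": gauss(1.5, 0.3)},
    feats=np.array([0.4, 0.6]),
)
assert abs(sum(post.values()) - 1.0) < 1e-9
assert post["REM"] > post["NREM"]             # features near the REM mean win
```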
Fig. 6. This figure demonstrates the distribution of (a) mean, (b) IQR, (c) CV, (d) kurtosis, (e) skewness and (f) variance features of BI from all the REM (dashed) and NREM
(solid) snore episodes against their density of occurrences. These statistical features were derived for a window size W = 6 min.
In this paper, we investigated a total of 34,148 snore episode samples (4108 REM samples and 30,040 NREM samples) from the Z = 12 patients in Table 1. We derived 43 different features for each sample. For feature extraction, we followed the window design method described in Section 2.3. The window was varied from W = 2–7 min by considering N = 4–14 continuous epochs of sleep to study the corresponding audio recording around each sample. The overall audio recording length studied around the samples corresponds to a total of 5382 REM/NREM epochs.

3.1. Analysis of features

We present Figs. 6–8 in this section to demonstrate the differences in REM and NREM snore features from Breathing Intensity (BI), Breathing Effort (BE) and Breathing Effort and Intensity (BEI) for W = 6 min. It is noted from Fig. 6 that the mean of BI in NREM snores has a higher magnitude, with a lower IQR and CV, compared to REM snores. This indicates a taller first peak in the autocorrelation with less variation in the NREM group, which reveals intense snoring/breathing in the audio data.

The histograms in Fig. 7(a) depict that the difference in BE (mean) between the NREM and REM groups is significant (ranging from 20–40 vs. 10–30). The higher magnitude of BE (mean), with comparatively low IQR (Fig. 7(b)) and CV (Fig. 7(c)), dictates a large area spanned by A1 between the two peaks (the peak at zero-lag auto-correlation and the first peak in Fig. 3(c)). The area above the curve, A1, may appear large due to sharp peaks in the autocorrelation R[i], which indicates a strong and repetitive signal pattern around NREM snores.

Fig. 8 shows that the BEI of the NREM and REM snore groups differs in mean, CV and skewness. The NREM group has its mean concentrated between 1 and 2, while for REM it lies between 0 and 1. A higher magnitude of BEI indicates a wider area above the curve (A1) compared to the area below the curve (A2) between the auto-correlation peaks, which reveals a cleanly oscillating pattern in the autocorrelation. This pattern can arise when the underlying audio signal contains strong and repetitive sound events. Variation between snoring/breathing events affects the pattern by reducing A1/A2, which is prominent in the REM group.

To this point, we obtained a total of 43 features for the REM or NREM snore samples from each time window W designed around them, where W varied from 2 to 7 min. Fig. 9 demonstrates the impact of the variation of W = 2–7 min on the empirical distribution of features (mean, IQR, CV, kurtosis, skewness and variance of BI) and the corresponding difference between the REM and NREM groups. To investigate the discriminative power and significance of individual features, and to reduce the feature dimensionality, we carried out a statistical test on the features. We used the non-parametric two-sample Kolmogorov–Smirnov test (KS-test) to check whether the differences in feature distributions between NREM and REM snores are significant at p < 0.05. A non-parametric test was chosen because the histograms in Figs. 6–9 showed that our feature set follows skewed distributions, for which a non-parametric test is known to be more applicable. Table 3 lists the number of features selected by the KS-test for each time window W.

According to the feature selection in Table 3, variation of the time window W affects the selection from Group 2 (which varied between 15 and 22 features), while the Group 1 selection remained unchanged with W. Next, in Section 3.3, we explore the significance of the time window W in the selection of the feature set by measuring the classification performance on REM and NREM snores. We ran two separate studies to obtain the best-performing feature set for the optimum time window W.

1. Fixed feature set with variable time window W: we picked the feature set selected for W = 2 min in Table 3 as the fixed feature set Fs1, then varied only W from 2 to 7 min.
2. Variable feature set with variable time window W: we started with the selected feature set Fs2 from Table 3 for W = 2 min; Fs2 was then updated for each W from 2 to 7 min with the lists presented in Table 3.

3.3. Classification and cross-validation

3.3.1. Results from the Naïve Bayes classifier

In order to train the NB model, we labeled features from REM snores with a '1' or 'Positive' outcome and features from NREM snores with a '0' or 'Negative' outcome. The histograms in Figs. 6–9 dictate that our feature set contains skewed distributions. We used a 'kernel' density estimate for each NB model, as it is known to be suitable for skewed feature distributions.
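The KS-test based selection above can be sketched with `scipy.stats.ks_2samp` (our own illustration on synthetic feature matrices; everything except the paper's p < 0.05 threshold is an assumption):

```python
import numpy as np
from scipy.stats import ks_2samp

def select_features(rem_feats, nrem_feats, alpha=0.05):
    """Keep indices of feature columns whose REM and NREM empirical
    distributions differ significantly under the two-sample KS-test."""
    keep = []
    for j in range(rem_feats.shape[1]):
        _, p = ks_2samp(rem_feats[:, j], nrem_feats[:, j])
        if p < alpha:
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
rem = np.column_stack([rng.normal(0.0, 1.0, 300), rng.normal(0.0, 1.0, 300)])
nrem = np.column_stack([rng.normal(2.0, 1.0, 300), rng.normal(0.0, 1.0, 300)])
# column 0 differs in location and should be selected
assert 0 in select_features(rem, nrem)
```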
Fig. 7. Histogram of features derived from (a) mean, (b) IQR, (c) CV, (d) kurtosis, (e) skewness and (f) variance of BE for the entire NREM and REM snore group. We used a
window size W = 6 min to generate this figure.
Fig. 8. A graphical representation of the density of occurrences of (a) mean, (b) IQR, (c) CV, (d) kurtosis, (e) skewness and (f) variance features from BEI. These were derived
from NREM (solid line) and REM (dashed line) snores of all the 12 patients in our database. We used a window size W = 6 min.
Fig. 10 demonstrates the mean classification performances (sensitivity, specificity and accuracy) following LOOCV, with variation in window length (W = 2–7 min), using feature set Fs1 and feature set Fs2. The mean statistics were computed over the 12 patients in the dataset. According to Fig. 10(a), both feature sets Fs1 and Fs2 achieved similar mean performance. However, the standard deviation of sensitivity in Fig. 10(b) improved by 2% for Fs2 over Fs1 at W = 5 min. Overall, variation in W affected only the standard deviation (std) of sensitivity, specificity and accuracy in Fig. 10(b), not the means in Fig. 10(a); in particular, the std of sensitivity is reduced by 10% in Fig. 10(b) as W increases from 2 to 7 min.

Based on the performances in Fig. 10, we selected W = 5 min with feature set Fs2 as the optimum point for the NB model. This achieves, on average, 92 ± 9% sensitivity, 81 ± 9% specificity and 82 ± 7% accuracy, while with feature set Fs1 at W = 5 min the results were 90 ± 11% sensitivity, 82 ± 9% specificity and 83 ± 7% accuracy.
Table 3
List of significant features selected from snore episodes and their surrounding sound pattern by the Kolmogorov–Smirnov test.

Time window W (min)     | 2  | 2.5 | 3  | 3.5 | 4  | 4.5 | 5  | 5.5 | 6  | 6.5 | 7
Group 1                 | 5  | 5   | 5  | 5   | 5  | 5   | 5  | 5   | 5  | 5   | 5
Group 2                 | 15 | 18  | 18 | 18  | 18 | 20  | 20 | 20  | 20 | 21  | 22
Total selected features | 20 | 23  | 23 | 23  | 23 | 25  | 25 | 25  | 25 | 26  | 27

Group 1 = features derived from the snore sound; Group 2 = features derived from the breathing pattern around the snore sound; time window W = window designed around the snore segment under investigation to observe the surrounding sound pattern and to derive the feature set.
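The performance indices reported throughout this section follow the usual confusion-count definitions, with REM as the positive class; a sketch with made-up counts:

```python
def performance_indices(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, PPV and NPV from a 2x2
    confusion table, treating REM as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts reflecting the ~9:1 NREM-to-REM imbalance noted in the text:
m = performance_indices(tp=90, fp=45, tn=855, fn=10)
assert m["sensitivity"] == 0.90     # 90 / (90 + 10)
assert m["npv"] > 0.98              # NPV stays high under imbalance
```

Note how the imbalance inflates NPV while PPV (90/135 = 0.67 here) stays modest, mirroring the discussion around Tables 4 and 5.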
Fig. 9. This figure demonstrates the empirical cumulative distribution function (cdf) for (a) mean, (b) IQR, (c) CV, (d) kurtosis, (e) skewness and (f) variance features of BEI
from all the REM (black) and NREM (blue) snore episodes against their proportion of occurrences in the dataset. Distributions are overlapped for window size variation from
W = 2 to 7 min to represent the effect of W on feature distribution. (For interpretation of the references to color in this figure legend, the reader is referred to the web version
of this article.)
Table 4
Classification results for REM/NREM snores from the Naïve Bayes model after leave-one-patient-out cross-validation at time window W = 5 min for feature set Fs2.

Patient | Specificity (%) | Sensitivity (%) | Accuracy (%) | PPV | NPV | Agreement (k)

Specificity = specific to detecting NREM snores; sensitivity = sensitive to detecting REM snores; PPV = actual positive predictions of REM snores over total positives; NPV = actual negative predictions of NREM snores over total negatives; k indicates Cohen's kappa coefficient of agreement. Agreement between snore labels from technician-scored sleep staging files and from model-generated predictions was measured for a model designed for window W = 5 min.
Table 4 presents details of LOOCV across the 12 patients at W = 5 min. Sensitivity in Table 4 varied from 76% to 100%, while specificity varied from 70% to 95%. However, it is to be noted in Table 4 that the NPV of our REM/NREM snore detection algorithm is close to 100%, while the PPV varies from 20% to a maximum of 81%. Considering the corresponding confusion matrix in Table 5, it can be observed that a significant difference exists in the number of NREM and REM snore samples (NREM samples are about 9 times more numerous than REM samples). This issue is inherent in the research question we address in this paper: even in normal sleep, the NREM–REM sleep duration ratio is about 3:1. Thus, the NPV and PPV are affected by this imbalance, which is only natural in this problem. To address this situation, we computed other indices, such as sensitivity and specificity, which are immune to the imbalance. However, to obtain further detail about the reliability of the classifier model and the significance of the PPV and NPV differences, we measured the score of agreement k for the optimum model. We achieved an average k = 0.46 (±0.19), indicating moderate agreement between the two sources ((i) the model-generated outcome of REM/NREM detection and (ii) the REM/NREM labels of snores from PSG-based sleep staging) as per the guidelines for k in [26].

Table 5
Total classified and non-classified REM/NREM snores from the Naïve Bayes model after leave-one-patient-out cross-validation.

                                Predicted class of snores
                                NREM      REM
Actual class of snores   NREM   21,763    5542
                         REM       320    3472

3.3.2. Results from ANN classifier
To test whether a non-linear classifier would improve the classification performance, we trained the Artificial Neural Network of Section 2.5. The ANN architecture consisted of a 3-layer feed-forward network. The number of neurons in the input layer (Hip) was set equal to the size of the input feature vector derived in Section 2.4; hence Hip varied from 20 to 27, depending on the number of selected features for time windows W = 2–7 min, as presented in Table 3. By an exhaustive search within the training samples, the number of neurons in the hidden layer, H1, was set to 25. The number of neurons in the output layer was Hop = 1. Snores from the REM class were labeled ‘1’ or ‘Positive’, and snores from the NREM class ‘0’ or ‘Negative’.
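A minimal sketch of this architecture (illustrative only: the tanh/sigmoid activations and the random placeholder weights are assumptions, since the paper specifies the layer sizes and a decision threshold but not the trained parameters):

```python
import numpy as np

rng = np.random.default_rng(42)

H_ip, H_1, H_op = 25, 25, 1           # input, hidden, output sizes (W = 5 min -> 25 features)
W1 = rng.normal(0, 0.1, (H_1, H_ip))  # input-to-hidden weights (placeholder, untrained)
b1 = np.zeros(H_1)
W2 = rng.normal(0, 0.1, (H_op, H_1))  # hidden-to-output weights (placeholder, untrained)
b2 = np.zeros(H_op)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, threshold=0.5):
    """Forward pass of the 3-layer feed-forward network.
    Output near 1 -> 'Positive' (REM snore), near 0 -> 'Negative' (NREM snore)."""
    h = np.tanh(W1 @ x + b1)          # hidden layer activation (assumed tanh)
    y = sigmoid(W2 @ h + b2)          # single sigmoid output in [0, 1]
    return ("REM" if y[0] >= threshold else "NREM"), float(y[0])

label, score = predict(rng.normal(size=H_ip))
print(label, round(score, 3))
```

The decision threshold plays the role of Ths in Table 7; in the study it was tuned rather than fixed at 0.5.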
Table 6
Overall classification result of 3-layer ANN model (Hip = 25, H1 = 25, Hop = 1) after leave-one patient-out cross-validation within the entire patient population for different
time window W.
W (min)   Specificity (%)   Sensitivity (%)   Accuracy (%)   PPV   NPV   Agreement (k)
Mean (Std) Mean (Std) Mean (Std) Mean (Std) Mean (Std) Mean (Std)
2.0 77 (±11) 80 (±21) 77 (±8) 0.34 (±0.16) 0.96 (±0.04) 0.34 (±0.13)
2.5 78 (±9) 88 (±11) 79 (±7) 0.37 (±0.14) 0.97 (±0.02) 0.40 (±0.13)
3.0 80 (±8) 87 (±15) 80 (±6) 0.38 (±0.15) 0.97 (±0.03) 0.41 (±0.11)
3.5 81 (±8) 86 (±10) 81 (±7) 0.39 (±0.14) 0.97 (±0.03) 0.43 (±0.11)
4.0 79 (±8) 91 (±12) 80 (±6) 0.39 (±0.17) 0.98 (±0.03) 0.42 (±0.13)
4.5 79 (±7) 87 (±13) 80 (±5) 0.37 (±0.16) 0.98 (±0.03) 0.40 (±0.13)
5.0 80 (±6) 92 (±9) 81 (±5) 0.40 (±0.17) 0.98 (±0.03) 0.44 (±0.14)
5.5 79 (±9) 90 (±9) 80 (±7) 0.39 (±0.16) 0.98 (±0.02) 0.43 (±0.15)
6.0 79 (±8) 91 (±9) 81 (±6) 0.39 (±0.14) 0.98 (±0.02) 0.44 (±0.14)
6.5 80 (±8) 93 (±9) 82 (±7) 0.41 (±0.17) 0.99 (±0.02) 0.47 (±0.16)
7.0 82 (±8) 93 (±7) 84 (±7) 0.43 (±0.19) 0.98 (±0.02) 0.48 (±0.17)
W = time window designed to derive snore features; specificity = specific to detect NREM snores; sensitivity = sensitive to detect REM snores; PPV = actual positive prediction of REM snores over total positives; NPV = actual negative prediction of NREM snores over total negatives; agreement = agreement between model-generated predictions of NREM and REM snores compared to the PSG study.
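These indices follow the standard confusion-matrix definitions with REM as the positive class. As a worked check against the pooled Naïve Bayes counts of Table 5 (pooled values differ slightly from the per-patient averages reported in the text):

```python
# Pooled confusion-matrix counts from Table 5 (Naive Bayes, W = 5 min).
TP, FN = 3472, 320    # actual REM predicted as REM / as NREM
TN, FP = 21763, 5542  # actual NREM predicted as NREM / as REM
N = TP + FN + TN + FP

sensitivity = TP / (TP + FN)  # proportion of REM snores detected
specificity = TN / (TN + FP)  # proportion of NREM snores detected
ppv = TP / (TP + FP)          # positive predictive value
npv = TN / (TN + FN)          # negative predictive value

# Cohen's kappa: observed agreement corrected for chance agreement.
p_o = (TP + TN) / N
p_e = ((TP + FN) / N) * ((TP + FP) / N) + ((TN + FP) / N) * ((TN + FN) / N)
kappa = (p_o - p_e) / (1 - p_e)

print(f"sens={sensitivity:.3f} spec={specificity:.3f} "
      f"ppv={ppv:.3f} npv={npv:.3f} kappa={kappa:.3f}")
# prints: sens=0.916 spec=0.797 ppv=0.385 npv=0.986 kappa=0.447
```

The pooled kappa of about 0.45 is consistent with the per-patient average k = 0.46 (±0.19) reported above.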
Table 7
Classification result of the optimum ANN model (Hip = 25, H1 = 25, Hop = 1) after leave-one-patient-out cross-validation within the entire patient group at the optimum decision threshold Ths for time window W = 5 min and feature set Fs2.

Patient   Specificity (%)   Sensitivity (%)   Accuracy (%)   PPV   NPV   Agreement (k)

Specificity = specific to detect NREM snores; sensitivity = sensitive to detect REM snores; PPV = actual positive prediction of REM snores over total positives; NPV = actual negative prediction of NREM snores over total negatives; k indicates Cohen's kappa coefficient of agreement. Agreement between snore labels from technician-scored sleep staging files and from model-generated predictions was measured for a model designed for window W = 5 min.
Table 8
Total classified and non-classified REM/NREM snores from the optimum ANN model (Hip = 25, H1 = 25, Hop = 1) on the patient population after leave-one-patient-out cross-validation for time window W = 5 min.

                                Predicted class of snores
                                NREM      REM
Actual class of snores   NREM   21,780    5525
                         REM       434    3358

Fig. 10. (a) Mean and (b) standard deviation of classification performance (specificity, sensitivity and accuracy, in %) for feature sets Fs1 and Fs2.
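The leave-one-patient-out protocol behind Tables 4–8 can be sketched as follows. The Gaussian Naïve Bayes model and the toy data here are generic stand-ins, not the study's actual features or implementation:

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class feature means/variances plus class priors (Gaussian Naive Bayes)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(y))
    return params

def predict_nb(params, X):
    """Pick the class maximizing log prior + summed Gaussian log-likelihoods."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, var, prior = params[c]
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(prior) + ll.sum(axis=1))
    return np.array(classes)[np.argmax(scores, axis=0)]

# Illustrative dataset: 12 "patients" with 50 snore feature vectors each.
rng = np.random.default_rng(1)
patients = np.repeat(np.arange(12), 50)
y = rng.integers(0, 2, size=len(patients))            # 1 = REM snore, 0 = NREM snore
X = rng.normal(size=(len(patients), 5)) + y[:, None]  # class-shifted toy features

# Leave one patient out: train on 11 patients, test on the held-out one.
accs = []
for p in np.unique(patients):
    test = patients == p
    model = fit_gaussian_nb(X[~test], y[~test])
    accs.append(np.mean(predict_nb(model, X[test]) == y[test]))
print(f"mean LOPO accuracy = {np.mean(accs):.2f}")
```

Holding out whole patients, rather than random snore samples, prevents a patient's own snores from appearing in both training and test folds and thus gives an honest estimate of performance on unseen patients.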
NREM group. We extracted this information by measuring several time- and frequency-domain features of snoring episodes, as well as the self-similarity of signal energy around these episodes. We also computed indirect measurements, such as the variation of sounds within a group of snores, changes in pseudo-periodicity, breath-to-breath variability, signal strength and intensities, to capture sleep-stage-specific patterns. We used these features to train a classifier model to separate snores into REM and NREM classes. Classifier models were trained and validated using data from Z = 12 patients (34,148 samples of snore episodes; 4108 REM samples and 30,040 NREM samples) following a leave-one-out cross-validation technique. We achieved a mean classification sensitivity of 92% and specificity of 81% with a Naïve Bayes classifier.

Several studies in the past have reported variation in breathing with sleep stages [18,19]. In healthy adults [19], REM causes a reduction in tidal volume and inspiratory flow, which mimics the variable nature of ventilation. However, studies attempting to exploit REM/NREM differences to characterize snores are very limited in their extent of exploration. Researchers have mainly considered snoring simply as a sound during sleep and focused only on its acoustic properties [16,17]. The study in [17] was limited to a presentation of descriptive statistics, while the study in [16] was concerned with intra-subject variability from known sleep states. However, snoring itself is a form of breathing embedded with detectable pitch. As REM is well known for breathing variability, both the breathing pattern and the acoustic properties of snores should be incorporated to characterize REM/NREM snores.

Moreover, our research differs from the studies in [16,17] in both approach and methodology. We focused on generalization of the effect of REM/NREM sleep on snores in OSA patients. We studied variability both in sound properties and in the continuous audio recordings around snores, which may include sounds from snoring and breathing. In addition, we explored the usability of REM/NREM-specific information to identify the class of snores from the entire patient population.

To investigate whether a non-linear pattern classifier would improve the performance of the REM and NREM separation, we trained a multi-layer Artificial Neural Network. Our network structure consisted of 3 layers: an input layer, a hidden layer and an output layer. The ANN structure was optimized for the number of neurons in the hidden layer. No improvement in mean classification sensitivity and specificity was seen with the ANN. However, the overall variation in classification performance was smaller with the ANN when compared with Naïve Bayes. These results indicate the possibility that the classification problem at hand is a linearly separable one that does not gain much from the introduction of non-linear decision boundaries. This problem, however, will be studied in further detail in the future, when we systematically compare different classes of classifiers.

A limitation of the present study is that it requires snore samples from OSA patients to capture sleep-specific patterns. We used an automatic algorithm to collect samples from overnight audio recordings, which showed an average 89 ± 14% validation accuracy on our dataset. Our method achieved a mean accuracy of 82% in separating REM snores from NREM snores on a dataset of 34,148 snore samples from 12 patients. These results indicate that the impact of snore-detection misclassification, if any, will be insignificant. In the future we plan to investigate the effect of snore detection accuracy on REM/NREM classification performance with a larger dataset.

5. Conclusion

REM and NREM sleep are two different neurophysiologic states with distinct breathing patterns and ventilatory control. Clinical sleep studies apply visual scoring rules on EEG, EOG and EMG signals to distinguish between these two stages.

In this paper, we presented evidence that properties of the snoring sound itself and of its surrounding breath-to-breath variability differ with MSS states. We proposed a model to extract this information from a non-contact sound signal, which can be utilized to separate snores into REM and NREM classes. This approach shows potential for the development of both contact and non-contact snore-based technology for population screening of OSA. It requires further improvement and validation of the method on a larger clinical dataset.

Acknowledgements

This work was partially supported by the Australian Research Council under grant DP120100141 to Dr Abeyratne. The authors would like to acknowledge Mr Brett Duce, Sleep Disorder Unit, Princess Alexandra Hospital, Brisbane, Australia, for the valuable assistance with clinical data acquisition.

References

[1] W. Christine, R. Dominique, Morbidity and mortality, in: Obstructive Sleep Apnea: Pathophysiology, Comorbidities and Consequences, CRC Press, 2007, pp. 259–274.
[2] J. Ronald, K. Delaive, L. Roos, J. Manfreda, M.H. Kryger, Obstructive sleep apnea patients use more health care resources ten years prior to diagnosis, Sleep Res. Online 1 (1998) 71–74.
[3] A. Rechtschaffen, A. Kales, A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, Brain Information Services/Brain Research Institute, University of California, Los Angeles, CA, 1968.
[4] A.S. Jordan, A. Malhotra, D.P. White, Y.L. Lo, A. Wellman, D. Eckert, S. Yim-Yeh, M. Eikermann, S. Smith, K. Stevenson, Airway dilator muscle activity and lung volume during stable breathing in obstructive sleep apnea, Sleep 32 (2009) 361–368.
[5] J.A. Rowley, B.R. Zahn, M.A. Babcock, M.S. Badr, The effect of rapid eye movement (REM) sleep on upper airway mechanics in normal human subjects, J. Physiol. 510 (1998) 963–976.
[6] E.K. Sauerland, R.M. Harper, The human tongue during sleep: electromyographic activity of the genioglossus muscle, Exp. Neurol. 51 (1976) 160–170.
[7] S. Okabe, W. Hida, Y. Kikuchi, O. Taguchi, T. Takishima, K. Shirato, Upper airway muscle activity during REM and non-REM sleep of patients with obstructive apnea, Chest 106 (1994) 767–773.
[8] D.W. Hudgel, R.J. Martin, B. Johnson, P. Hill, Mechanics of the respiratory system and breathing pattern during sleep in normal humans, J. Appl. Physiol. Respir. Environ. Exerc. Physiol. 56 (1984) 133–137.
[9] N.A. Eiseman, M.B. Westover, J.M. Ellenbogen, M.T. Bianchi, The impact of body posture and sleep stages on sleep apnea severity in adults, J. Clin. Sleep Med. 8 (2012) 655–666.
[10] T. Young, L. Evans, L. Finn, M. Palta, Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women, Sleep 20 (1997) 705–706.
[11] F. Dalmasso, R. Prota, Snoring: analysis, measurement, clinical implications and applications, Eur. Respir. J. 9 (1996) 146–159.
[12] U.R. Abeyratne, S.d. Silva, C. Hukins, B. Duce, Obstructive sleep apnea screening by integrating snore feature classes, Physiol. Meas. 34 (2013) 99.
[13] U.R. Abeyratne, A.S. Wakwella, C. Hukins, Pitch jump probability measures for the analysis of snoring sounds in apnea, Physiol. Meas. 26 (2005) 779–798.
[14] J. Mesquita, J. Solà-Soler, J.A. Fiz, J. Morera, R. Jané, All night analysis of time interval between snores in subjects with sleep apnea hypopnea syndrome, Med. Biol. Eng. Comput. 50 (2012) 373–381.
[15] H. Ghaemmaghami, U.R. Abeyratne, C. Hukins, Normal probability testing of snore signals for diagnosis of obstructive sleep apnea, in: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, pp. 5551–5554.
[16] A. Azarbarzin, Z. Moussavi, Intra-subject variability of snoring sounds in relation to body position, sleep stage, and blood oxygen level, Med. Biol. Eng. Comput. 51 (2012) 429–439.
[17] H. Nakano, T. Ikeda, M. Hayashi, E. Ohshima, A. Onizuka, Effects of body position on snoring in apneic and nonapneic snorers, Sleep 26 (2003) 169–172.
[18] E. Aserinsky, N. Kleitman, Regularly occurring periods of eye motility, and concomitant phenomena, during sleep, Science 118 (1953) 273–274.
[19] N.J. Douglas, D.P. White, C.K. Pickett, J.V. Weil, C.W. Zwillich, Respiration during sleep in normal man, Thorax 37 (1982) 840–844.
[20] G.G. Haddad, H.J. Jeng, T.L. Lai, R.B. Mellins, Determination of sleep state in infants using respiratory variability, Pediatr. Res. 21 (1987) 556–562.
[21] P.I. Terrill, S.J. Wilson, S. Suresh, D. Cooper, C. Dakin, Application of recurrence quantification analysis to automatically estimate infant sleep states using a single channel of respiratory data, Med. Biol. Eng. Comput. 50 (2012) 851–865.
[22] R.L. Horner, Impact of brainstem sleep mechanisms on pharyngeal motor control, Respir. Physiol. 119 (2000) 113–121.
[23] G. Bertino, E. Matti, S. Migliazzi, F. Pagella, C. Tinelli, M. Benazzo, Acoustic changes in voice after surgery for snoring: preliminary results, Acta Otorhinolaryngol. Ital. 26 (2006) 110–114.
[24] J. Markel, Digital inverse filtering—a new tool for formant trajectory estimation, IEEE Trans. Audio Electroacoust. 20 (1972) 129–137.
[25] M. Sondhi, New methods of pitch extraction, IEEE Trans. Audio Electroacoust. 16 (1968) 262–266.
[26] J.R. Landis, G.G. Koch, The measurement of observer agreement for categorical data, Biometrics 33 (1977) 159–174.