
Multiresolution Representations and Data Mining

of Neural Spikes for Brain-Machine Interfaces


Sung-Phil Kim1, Jose M. Carmena2,3, Miguel A. Nicolelis2,3,4, Jose C. Principe1
Computational NeuroEngineering Lab1, University of Florida, Gainesville, FL 32611
Department of Neurobiology2, Center for Neuroengineering3, Department of Biomedical Engineering4
Duke University, Durham, NC 27710

Abstract—In brain-machine interface (BMI) applications, neural firing activities have been represented by spike counts computed with a fixed-width time bin. Adaptive models have been designed to map these bin counts onto the associated behavior, which is typically 2D or 3D hand movement. However, the representation of the firing activities can be enriched by binning neural spikes at multiple time scales based on multiresolution analysis. This multiresolution representation of neural activities can provide more accurate prediction of the hand movement parameters. Data mining techniques must be applied to models using the multiresolution representation in order to avoid overfitting. In this paper, we demonstrate that the multiresolution representation improves the performance of the linear model for BMIs compared to a model with a fixed-width time bin.

I. INTRODUCTION

The main function of the signal processing module in brain-machine interfaces (BMIs) is to learn mappings between neural activity patterns and motor parameters. To represent the neural activity, most experimental designs utilize an estimate of the local mean firing rate of each neuron. The local mean firing rate has been estimated by binning neural spikes with a non-overlapping sliding time window of length ranging from 50ms up to 100ms [1-5]. These representations of the firing rate have been used to model the relationship with the corresponding motor parameters. Adaptive models (including the simple Wiener filter [6]) based on this estimate have predicted motor parameters with correlation coefficients between 0.6 and 0.8. It has also been shown that all the proposed models reach the same basic performance level for a target reaching task, which may not be sufficient for more involved real applications [7]. This led us to explore different representations of neural activity to help improve the accuracy of the models. One possible way is to estimate the firing rate with multi-scale binning, which can be referred to as the multiresolution analysis of neural spike trains [8]. In this analysis, the short-time neural activity as well as the large-scale activity can be represented. This multi-scale binning may disclose information that is not available with fixed-width binning.

With the multiresolution representation of neural spikes, however, the dimensionality of the inputs to the mapping model increases considerably. Also, the input variables are likely to be collinear, which increases the condition number of the input covariance matrix. Therefore, regularization methods are appropriate to avoid poor generalization. Among the regularization techniques used in data mining, the method based on the L1-norm penalty is the more suitable since it is able to select input variables. It also enables us to understand the structure of the neural activity by identifying the neurons most correlated with the associated behavior.

In this paper, we present the multiresolution analysis of neural spikes and the application of modeling on those representations using the L1-norm based regularization method.

II. MULTIRESOLUTION ANALYSIS OF NEURAL SPIKES

The multiresolution representation of a neural spike train can be computed via the wavelet transform. To facilitate the computation, we apply the discrete wavelet transform with dyadic Haar wavelets. The dyadic Haar wavelet is the basis of the à trous wavelet transform, which can be implemented very efficiently in hardware [9]. The smallest scale should be larger than 1ms because of the refractory period of a neuron. The largest scale should not exceed 1sec, since it has been reported that the neural activity up to 1sec in the past is correlated with the associated behavior [1]. In our model, we select eight scales starting at 5ms¹ up to 640ms with dyadic scaling: 5, 10, 20, 40, 80, 160, 320, and 640ms.

With this set of scales, the wavelet transform is computed on each neuronal spike train. The Haar wavelet transform can be regarded as multi-scale binning: at any given time, neural spikes are binned with several time window widths. The multi-scale binning process is repeated at each time instance at a sampling rate of 200Hz; therefore the time windows slide with overlap at every scale except the 5ms bin. A larger scale, with wider overlaps, represents a smoother temporal firing pattern. This procedure is illustrated in Fig. 1.

Fig. 2 shows an example of the wavelet transform coefficients (or multi-scale binned data) for one particular neuron. The coefficients (for 5sec of data) are presented along with the associated hand movement trajectories in the bottom panel. To present the association between the wavelet coefficients and the movement, the temporal pattern of the wavelet coefficients at each scale is plotted on top of the hand position trajectory (x-coordinate) in Fig. 3. The figure demonstrates that temporal patterns at larger scales appear to be more correlated with the hand trajectories than those at smaller scales.

¹ The minimum scale (5ms) was empirically determined as the smallest scale for which binning yielded time series significantly different from the raw spike trains.
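The multi-scale binning described above can be sketched in code. The following Python fragment (function and parameter names are our own, not from the paper) counts the spikes falling in sliding windows of the eight dyadic widths, with outputs sampled at the 200Hz rate used in the text:

```python
import numpy as np

def multiscale_bin(spike_times, t_end,
                   scales_ms=(5, 10, 20, 40, 80, 160, 320, 640), fs=200.0):
    """Bin a spike train at several dyadic window widths.

    At each output instant (sampled at `fs` Hz) and for each scale, count
    the spikes in the window of that width ending at the instant, so the
    windows wider than the 5ms step overlap, as described in the paper.
    """
    step = 1.0 / fs                                  # 5 ms between outputs
    t = np.arange(step, t_end + 1e-12, step)         # output time instants
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    out = np.empty((len(t), len(scales_ms)))
    for j, w_ms in enumerate(scales_ms):
        w = w_ms / 1000.0
        # count spikes in the half-open window (t - w, t] via binary search
        hi = np.searchsorted(spikes, t, side="right")
        lo = np.searchsorted(spikes, t - w, side="right")
        out[:, j] = hi - lo
    return t, out
```

Because the bins are nested, each row of the output is non-decreasing across scales, matching the counts shown in Fig. 1.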
[Fig. 1: at the instant t0, bins of width 5, 10, 20, 40, 80, 160, 320, and 640ms contain 0, 0, 1, 2, 4, 6, 12, and 21 spikes, respectively.]

Fig. 1. An illustration of the multiresolution representation of a spike train. At a given time instance t0, the spikes are binned with different widths; the number in each bin denotes the spike count.

Fig. 3. The temporal trajectory of the neural spike counts with multi-scale binning, along with the x-coordinate of the hand position. The solid line depicts the wavelet coefficients and the dotted line the hand trajectory. Note that each trajectory is normalized to fit within the dynamic range of the hand trajectory for visualization purposes.
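The qualitative observation in Fig. 3, that larger scales track the hand trajectory more closely, can be quantified by correlating each scale's coefficient sequence with the trajectory. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def scale_correlations(coeffs, hand_x):
    """Pearson correlation between each scale's binned counts and the hand
    trajectory.

    `coeffs` is a (time, n_scales) array of multi-scale binned data for one
    neuron; `hand_x` is the x-coordinate sampled at the same 200Hz instants.
    Returns one correlation coefficient per scale.
    """
    c = coeffs - coeffs.mean(axis=0)                 # demean each scale
    h = hand_x - hand_x.mean()                       # demean the trajectory
    denom = np.sqrt((c ** 2).sum(axis=0) * (h ** 2).sum())
    return (c * h[:, None]).sum(axis=0) / denom
```

Scales whose coefficient trajectories are smoother tend to yield larger magnitudes here, which is the pattern the figure illustrates.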

Fig. 2. The wavelet transforms of a neuronal spike train. The top plot presents the wavelet coefficients at 8 scales (5ms ~ 640ms); darker pixels represent larger coefficient magnitudes. The bottom plots show the corresponding x- (solid) and y- (dotted) coordinates of the hand position (top) and velocity (bottom).

III. DATA MINING OF MULTIRESOLUTION REPRESENTATION

The multiresolution representation of the firing activity from a large number of neurons leads to difficulties in model training due to: 1) a very large model (many degrees of freedom), and 2) collinearity between inputs. For instance, with over one hundred neurons and eight scales, the input dimension of a model exceeds 1000. This leads us to employ data mining to select both the neurons and the resolutions for model fitting. One popular data mining technique is to add an Lp-norm penalty to the mean squared error (MSE) cost function:

J(w) = E[||e||^2] + λ Σ_i |w_i|^p        (1)

where e denotes the error vector, λ is the regularization parameter, and w_i is a model parameter. p is typically set to 1 or 2: p=2 leads to ridge regression [10] in statistics or weight decay [11] in neural networks, while p=1 leads to the LASSO (Least Absolute Shrinkage and Selection Operator) [12] in statistics. In a Bayesian framework, p=2 corresponds to a Gaussian prior in parameter space and p=1 to a Laplacian prior [13]. It has been shown that the L1-norm penalty generates sparse models and thus provides better generalization. The L1-norm penalty also enables the selection of variables, which is useful for the analysis of the data. Hence, we choose to regularize models with the L1-norm penalty in the multiresolution representation.

The LASSO can be implemented by the least angle regression (LAR) algorithm recently proposed by Efron et al. [14]. The LAR procedure provides the solution of the minimization of cost function (1) with much lower computational complexity than the original LASSO algorithm.

The LAR procedure starts with all coefficients at zero. The input variable most correlated with the desired response is selected. We proceed in the direction of the selected variable with a step size determined such that some other input variable achieves as much correlation with the current residual as the first input. Then we move in the equiangular direction between these two inputs until a third input has the same correlation. This procedure is repeated until either all input variables join the selection or the sum of the parameters meets a preset threshold. In this way, the maximum correlation between the inputs and the residual decreases at each selection step.

TABLE I
DISTRIBUTION OF NEURONS IN CORTICAL AREAS

            PMd     M1       S1       SMA      M1_ipsilateral
Indices     1~66    67~123   124~161  162~180  181~185
Count(a)    (66)    (57)     (38)     (19)     (5)

a The number of neurons in each area.
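For illustration, the penalized cost (1) with p=1 can also be minimized directly by cyclic coordinate descent. This is not the LAR procedure the paper uses, but it exhibits the same variable-selection behavior, driving weak weights exactly to zero. A sketch, treating the expectation in (1) as a sample mean:

```python
import numpy as np

def lasso_cd(X, d, lam, n_iter=200):
    """Minimize (1/N)*||d - X w||^2 + lam * sum_i |w_i| by cyclic
    coordinate descent (a simple stand-in for LAR/LASSO)."""
    N, D = X.shape
    w = np.zeros(D)
    col_sq = (X ** 2).sum(axis=0) / N          # (1/N) * ||X_j||^2 per column
    r = d - X @ w                              # current residual
    for _ in range(n_iter):
        for j in range(D):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with its partial residual
            rho = (X[:, j] @ r) / N + col_sq[j] * w[j]
            # soft-threshold at lam/2 (the 1/2 comes from differentiating
            # the squared-error term), then rescale
            w_new = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)      # keep the residual in sync
            w[j] = w_new
    return w
```

Variables whose correlation with the residual never exceeds the threshold keep a weight of exactly zero, which is the sparsity property exploited for neuron and scale selection.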
IV. BRAIN-MACHINE INTERFACE DESIGN

In this section, we present the BMI design based on the multiresolution representation of neural spikes and a linear model learned by the LAR algorithm. We first demonstrate the analysis of neural activity in BMIs based on variable selection. Then, the generalization performance of prediction models using the multiresolution representation is compared with that of a model using fixed-width binning.

Data Descriptions

The neural recordings were collected in Dr. Nicolelis' primate laboratory at Duke University. Microwire electrode arrays [5] were chronically implanted in the dorsal premotor cortex (PMd), supplementary motor area (SMA), primary motor cortex (M1, both hemispheres), and primary somatosensory cortex (S1) of an adult female monkey (Macaca mulatta) performing a target hitting task. The task involved the presentation of a randomly placed target on a computer monitor in front of the monkey. The monkey used a hand-held manipulandum (joystick) to move the computer cursor so that it intersected the target. While the monkey performed the motor task, the hand position was recorded at 50Hz in real time along with the corresponding neural activity from multiple channels [5], and then upsampled to 200Hz using cubic spline interpolation. After spike detection and sorting [5], a total of 185 units/multiunits were extracted in the particular recording session. The distribution of these neurons over the cortical areas is summarized in Table I (numerical indices were assigned for neuron identification). A 320-sec dataset was used for training and the consecutive 240-sec dataset for testing.

Multiresolution Based Modeling and Analyses

Each neural spike train was binned at multiple scales from 5ms to 640ms. The binned data at each scale were normalized to zero mean and a maximum magnitude of 1, so that the model does not become biased toward the larger-scale inputs. 185 neurons with 8 scales yielded an input dimension of 1480. The multiresolution binning procedure for the 320-sec training dataset generated an input data matrix X (64000x1480), where each row represents the input vector at a given time instance. Then, a linear model was built to predict the desired response (x- or y-coordinate of hand position or velocity) vector d (64000x1) as a linear combination of X:

d = d̂ + e = Xw + e        (2)

where w is the model weight vector and e is the error vector. The desired responses were normalized to have zero mean.

The weight vector w was learned by the LAR algorithm. To determine the L1-norm constraint, we utilized hold-out cross-validation [15]: the last 10% of the training data were held out as the validation set, and the threshold was chosen to minimize the MSE on that set. The LAR algorithm stopped learning when the L1-norm of the weights reached this threshold.

The LAR algorithm selected a different subset of input variables for each desired response (there were four responses: the x- and y-coordinates of hand position and velocity). The cortical distributions of the numbers of selected neurons are summarized in Table II. Fig. 4 shows the selection results for each desired response; the black pixels denote the selected variables aligned in neuronal space (x-axis) with the scales on the y-axis. These graphs show that LAR tends to select the larger scales, since the temporal trajectories at larger scales exhibited more correlation with the movement, as shown in Fig. 3.

Model Comparison with Fixed-Width Binning

We designed two linear models with different input data. In the first linear model, a fixed-width time window of length 80ms was used for binning. The binned data for each neuron were then embedded with a tapped delay line of five delays (six taps). These binning and embedding procedures yielded 185x6 = 1110 input variables. The same delay line (5 delays) was applied to each scale of the multiresolution data in the second model, yielding eight times as many input variables (185x8x6 = 8880) as the first model. The LAR algorithm was used in both models to estimate the weights for 2D hand position and velocity. The number of nonzero weights after training is listed in Table III. It is noteworthy that the weights for the multiresolution data were more heavily pruned than those for the single resolution.

TABLE II
THE NUMBER OF SELECTED NEURONS FROM EACH AREA(a)

            PMd       M1        S1        SMA       M1_ips.
Position-X  18 (27%)  27 (47%)  9 (24%)   7 (37%)   2 (40%)
Position-Y  20 (30%)  26 (39%)  15 (39%)  7 (37%)   0 (0%)
Velocity-X  50 (76%)  46 (81%)  30 (79%)  15 (79%)  5 (100%)
Velocity-Y  40 (61%)  42 (74%)  30 (79%)  13 (79%)  3 (60%)

a The percentages give the ratio of selected neurons to the total in each area.
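The hold-out scheme described above, training on the first 90% of the data and picking the penalty whose model minimizes the MSE on the held-out tail, can be sketched generically. The names are hypothetical, and a closed-form ridge solver stands in for the LAR fit in the usage example; any training routine with the same signature could be plugged in:

```python
import numpy as np

def holdout_select(X, d, lams, fit, holdout_frac=0.1):
    """Choose a penalty by hold-out validation on the last `holdout_frac`
    of the rows, mirroring the scheme in the paper.

    `fit(X, d, lam)` is any training routine returning a weight vector.
    Returns the best penalty and the weights trained with it.
    """
    n_val = max(1, int(len(d) * holdout_frac))
    Xtr, dtr = X[:-n_val], d[:-n_val]          # first 90%: training
    Xva, dva = X[-n_val:], d[-n_val:]          # last 10%: validation
    best = None
    for lam in lams:
        w = fit(Xtr, dtr, lam)
        mse = np.mean((dva - Xva @ w) ** 2)    # validation MSE
        if best is None or mse < best[0]:
            best = (mse, lam, w)
    return best[1], best[2]
```

Because the held-out block is the temporal tail rather than a random subset, the validation error reflects prediction on future samples, which suits the time-series nature of the BMI data.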
TABLE III
THE NUMBER OF NONZERO PARAMETERS(a)

              Position-X    Position-Y    Velocity-X    Velocity-Y
Single-scale  568 (51.1%)   675 (60.8%)   359 (32.3%)   379 (34.1%)
Multi-scale   344 (3.9%)    449 (5.1%)    379 (4.3%)    580 (6.5%)

a The ratio of the number of nonzero parameters to the total number of parameters.

Fig. 4. The distribution of the selected input variables for the (a) x- and (b) y-coordinates of position, and the (c) x- and (d) y-coordinates of velocity. Black pixels denote the selected inputs. Neuron indices are aligned on the x-axis and scales on the y-axis.

We used two performance measures: the correlation coefficient (CC) and the normalized MSE. The CC lies within [-1, 1], where 0 indicates no correlation and 1 indicates perfect correlation between the desired response and the model output. The normalized MSE was computed by dividing the MSE by the power of the desired signal. These measures were evaluated on the test dataset to assess generalization performance. This analysis showed superior performance of the linear model with multiresolution data compared to the one with fixed-width binned data, as summarized in Table IV. To assess the statistical significance of the difference in performance between the two models, we performed a one-tailed t-test based on the MSE, as proposed in [7]. The null hypothesis of no difference between the model performances was rejected at the 0.05 and 0.01 significance levels (p<0.001) for both position and velocity, statistically confirming the superior performance of the multiresolution model.

TABLE IV
PERFORMANCE COMPARISON

             Single Resolution     Multiresolution
Measures     CC       MSE          CC       MSE
Position-X   0.712    0.474        0.755    0.409
Position-Y   0.699    0.636        0.734    0.585
Velocity-X   0.697    0.529        0.715    0.499
Velocity-Y   0.763    0.419        0.779    0.396

V. CONCLUSIONS

In this paper, we have introduced a multiresolution representation of neural firing activities in BMIs based on the discrete wavelet transform. This representation builds a richer input feature space for the linear prediction model than the traditional fixed-width time window binning. The empirical results have shown that the multiresolution-based linear model slightly improved the generalization performance compared to the single-resolution-based model.

We have also proposed the use of a data mining technique to reduce the dimension of the multiresolution input space, which increases linearly with the number of scales. The LAR algorithm, which can select variables via the L1-norm penalty, effectively regularized the linear model. Furthermore, this variable selection technique has enabled interesting analyses of the neural activity related to behavior. Further studies will pursue this topic.

ACKNOWLEDGEMENT

This work was supported by a DARPA sponsored grant #ONR-450595112 to MALN and by the CRPF to JMC.

REFERENCES

[1] J. Wessberg et al., "Real-time prediction of hand trajectory by ensembles of cortical neurons in primates," Nature, vol. 408, pp. 361-365, 2000.
[2] Y. Gao, M.J. Black, E. Bienenstock, W. Wu, and J.P. Donoghue, "A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions," IEEE EMBS CNE, Capri, Italy, 2003.
[3] A.B. Schwartz, D.M. Taylor, and S.I.H. Tillery, "Extraction algorithms for cortical control of arm prosthetics," Current Opinion in Neurobiology, vol. 11, pp. 701-708, 2001.
[4] M.D. Serruya, N.G. Hatsopoulos, L. Paninski, M.R. Fellows, and J.P. Donoghue, "Brain-machine interface: Instant neural control of a movement signal," Nature, vol. 416, pp. 141-142, 2002.
[5] J.M. Carmena et al., "Learning to control a brain-machine interface for reaching and grasping by primates," PLoS Biology, vol. 1, pp. 193-208, 2003.
[6] S. Haykin, Adaptive Filter Theory. Upper Saddle River, NJ: Prentice Hall, 1996.
[7] S.P. Kim et al., "A comparison of optimal MIMO linear and nonlinear models for brain-machine interfaces," unpublished.
[8] F. Murtagh, J.L. Starck, and O. Renaud, "On neuro-wavelet modeling," Decision Support Systems, vol. 37, pp. 475-484, 2004.
[9] M.J. Shensa, "The discrete wavelet transform: Wedding the à trous and Mallat algorithms," IEEE Trans. on Signal Processing, vol. 40, pp. 2464-2482, 1992.
[10] A.E. Hoerl and R.W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, vol. 12, pp. 55-67, 1970.
[11] A. Krogh and J.A. Hertz, "A simple weight decay can improve generalization," in Advances in NIPS 4, San Mateo, CA, pp. 950-957, 1995.
[12] R. Tibshirani, "Regression shrinkage and selection via the Lasso," J. Royal Statistical Society B, vol. 58, pp. 267-288, 1996.
[13] M.A.T. Figueiredo, "Adaptive sparseness for supervised learning," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, 2003.
[14] B. Efron, I. Johnstone, T. Hastie, and R. Tibshirani, "Least angle regression," Annals of Statistics, in press.
[15] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press, 1995.
