
Hypertools: the toolbox for spectral image analysis

version 2.2, November 2005


Pavel Paclik, Serguei Verzakov, and Robert P.W. Duin

Contents
1 Hypertools toolbox

2 Installing hypertools

3 Data handling
3.1 Data handling in hypertools
3.2 Spectral images in PRTools dataset
3.3 Conversion between DIP image and dataset representation
3.4 Importing binary BioRad FTIR data

4 Visualization techniques for spectral images
4.1 Plotting spectral data
4.2 Interactive visualization tool for spectral images
4.3 Area under spectra

5 Preprocessing
5.1 Baseline subtraction
5.2 Smoothing of spectral data
5.3 Unmixing

6 Dissimilarity measures
6.1 Dissimilarity measures implemented in hypertools
6.2 Visualization using dissimilarity measures
6.3 Building dissimilarity representation for pattern recognition

7 Feature extraction methods
7.1 Generalized Local Discriminant Bases (GLDB)
7.2 Multi-class GLDB feature extraction
7.3 Genetic algorithm for feature selection
7.4 Maximum Autocorrelation Transformation (MAF)
7.5 Principal Component Analysis
7.6 PCA shaving
7.7 Canonical Correlation Analysis (CCA)
7.8 Partial Least Squares (PLS) regression mapping

8 Image segmentation
8.1 Segmentation combining spatial and spectral domain
8.2 ECHO segmentation

9 Classifiers

1 Hypertools toolbox
Hypertools is a Matlab toolbox for the analysis of hyperspectral images. It contains algorithms for visualization,
preprocessing, representation and classification of spectral data. The toolbox is being developed at
TU Delft in The Netherlands within the Hyperspectral Image Analysis project, sponsored by the Dutch
technology foundation STW. Hypertools is available under an academic or a commercial license. Although we
especially target spectral images, a number of routines may also be used for generic spectral datasets
without spatial structure. Hypertools is based on the PRTools toolbox version 4. This document briefly describes how to
use hypertools to analyze spectral images.

2 Installing hypertools
Hypertools requires Matlab version 6.1 or higher and PRTools version 4.x. Many routines in hypertools
also require DIPimage version 1.4 or higher. To install hypertools, extract the archive into a
directory and add that directory to the Matlab path.

3 Data handling
3.1 Data handling in hypertools
Hyperspectral images consist of spectral measurements organized in a spatial setup:

Figure: Hyperspectral image cube

In hypertools, image cubes may be stored either in a PRTools dataset or in a dip_image. Which
representation is better depends on the type of processing we want to perform on the data cube.
For extensive filtering or 2D-connected processing, dip_image is a good choice.
For pattern recognition tasks, such as clustering, image segmentation, feature extraction or classification,
the dataset representation offers far more flexibility. Because it can carry additional meta information,
hypertools uses the PRTools dataset as the primary data representation.
In this tutorial, we illustrate various data analysis approaches on a spectral image from a plastic sorting
application (NIR spectra). The image depicts four types of plastics and a background class. Throughout
the text, the image a1 is used for training and a2 as a test set.

3.2 Spectral images in PRTools dataset


Spectral images stored in a PRTools dataset carry the following meta information in structure fields:
spectral image dimensions in objsize and number of spectral wavelengths in featsize:

>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> a1.objsize

ans =

33 40

% the dataset represents an image 33 x 40 pixels ...

>> a1.featsize

ans =

240

% ... with 240 spectral wavelengths

optional labels per spectrum in nlab. The labelling of a spectral image may be visualized as an image
using the getli (get label image) function:

>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]
>> getli(a1)
Displayed in figure 3

Figure: Label image

unique identifier per spectrum (pixel) in ident. Identifiers are useful for tracing back which pixels are
present in a subset of the original spectral image.

>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% select a data subset taking one per cent of the pixels per class
>> proto=gendat(a1,0.01)
15 by 240 dataset with 5 classes: [4 4 3 2 2]

% overlay the image of 100th wavelength highlighting the prototypes


>> drawident(a1(:,100),proto)
Displayed in figure 3

Figure: Highlighting the prototypes

version info capturing the PRTools version used for dataset generation and the date of dataset creation.
The dataset name may identify the dataset content or the project name:

>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]

>> a1.version1

ans =

Name: Pattern Recognition Tools


Version: 3.2.5
Release:
Date: 04-Apr-2003

>> a1.version2

ans =

22-Sep-2003 15:33:40

>> a1.name='projectA'
projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

additional info in the user structure. Additional information, such as spectral range, units, or
spectral imaging technique description, may be stored in the user field.
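The pixel-by-wavelength layout described in this section can be illustrated without PRTools. The following is a minimal sketch in plain Matlab; the variable names are illustrative, and the column-major pixel ordering is an assumption about how the image is flattened, not a statement about the PRTools internals.

```matlab
% Plain Matlab sketch, no PRTools required. All variable names are illustrative.
rows = 33; cols = 40; bands = 240;          % sizes taken from the a1 example
cube = zeros(rows, cols, bands);            % placeholder image cube
cube(5, 7, :) = 1:bands;                    % mark one pixel with a known spectrum

% Flatten: every pixel becomes one row (object), every band one column (feature).
% ASSUMPTION: pixels are enumerated in Matlab's column-major order.
X = reshape(cube, rows*cols, bands);        % 1320 x 240 matrix
objsize  = [rows cols];                     % spatial size, as in a1.objsize
featsize = bands;                           % number of wavelengths, as in a1.featsize

% Row index of pixel (5,7) under column-major ordering.
idx = sub2ind(objsize, 5, 7);
spectrum = X(idx, :);                       % the spectrum stored at that pixel

% Restore the cube from the matrix and the stored spatial size.
cube2 = reshape(X, [objsize featsize]);
```

Keeping the spatial size next to the flattened matrix is what allows a pixel subset or a classification result to be mapped back onto the image grid.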

3.3 Conversion between DIP image and dataset representation


In hypertools, spectral image data may be transformed back and forth between PRTools datasets and
dip_image objects using the data2dip and dip2data functions. Both transformations preserve data
connectivity, but the meta information stored in a dataset is lost when the data are transformed into a dip_image.
todo:examples of data2dip, dip2data and fig2dip

3.4 Importing binary BioRad FTIR data


Binary BioRad files may be imported as spectral datasets using the ftir_load routine. This routine
is experimental and limited to transmission BioRad datasets. The starting wavelength and the step may be
defined. This meta information is stored in the dataset user field and used, for example, when rendering
spectral domain plots via plots.

>> fim=ftir_load('a1.dat',910.399010, 15.43049)
4096 by 512 dataset with 0 classes: []
>> fim.user

ans =

type: spectra
format: FTIR converted from bio-rad
mode: transmission
units: cm^-1
start_wavelength: 910.3990
step: 15.4305

>> plots(gendat(fim,10))

Figure: Plotting spectra using plots

Note the reversed x-axis and units used.

4 Visualization techniques for spectral images


4.1 Plotting spectral data
Spectral data stored in a PRTools dataset may be plotted using the plots function. Dataset features
are assumed to represent densely sampled wavelengths. Each data sample is rendered as a 1D function
of the wavelength. See also the plots example in Section 3.4.

4.2 Interactive visualization tool for spectral images


Hypertools contains a simple interactive visualization tool for spectral imagery. It allows the user to
inspect the spectral and the spatial data domain simultaneously. The showsi command renders the spectral
image in two windows: the spatial view (using dip_image) and the spectral plot (using plots). It is useful
to store the handle returned by showsi, as it gives access to the visualized data and provides
additional functionality.

>> a1
projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]
>> sih=showsi(a1);

Figure: Visualizing the spectral image using showsi

When the mouse pointer moves over the spatial image, the spectrum at the current point is shown in the spectral
plot. A spectral wavelength may be selected by pressing the left mouse button in the spectral plot and
dragging the mouse over the plot. The spatial plot is updated accordingly.

Figure: Visualizing different spectral bands

Points may be selected by clicking in the spatial image. The corresponding spectra are also plotted in
the spectral pane. Three buttons in the spectral figure allow the user to choose between three colors, or classes.
A right mouse click in the spatial window cancels the point selection.

Figure: Highlighting the points of interest
Selected points may be retrieved from the spectral image view using the spectral image handle sih
and used e.g. as prototypes. A dissimilarity may be computed from all the data points to the selected
prototypes:

>> proto=si_get_spectra(sih)
5 by 240 dataset with 3 classes: [2 2 1]

% we have retrieved five spectra that are assigned to three classes

>> D=dasam(a,proto)
5280 by 5 dataset with 5 classes: [1461 1426 1025 664 704]

% Spectral Angle Mapper dissimilarity to the selected five prototypes was


% computed and returned as a dataset

% The distance dataset may be visualized as a feature space using a PRTools


% scatterdui command:

>> fig=scatterdui(D);

% fig is a figure handle of the scatter plot which we will use later.

Figure: Scatter plot of a dissimilarity space representation

Buttons along each scatter axis allow us to step easily between different feature space dimensions.
Clicking on a data point shows its sample index nearby. This enables us to trace a sample or a
pixel back from the feature space.
In order to see the correspondence between different representations of spectral data, scatter plots may
be attached to the spectral image. We need the handles of both spectral image (sih) and of the scatter
plot (fig):

>> si_attach_display(sih,fig)

Now, we may move the mouse over the spatial image and observe the corresponding data sample in
the feature space, denoted by the yellow circle.

Figure: Spectral image with an attached scatter plot window

4.3 Area under spectra


todo:example of generating an area under the spectrum image

5 Preprocessing
5.1 Baseline subtraction
A baseline may be subtracted from a dataset using the basesubm mapping. First, a single spectrum to be
used for baseline correction must be identified, and then the baseline region. Both may be done using
the interactive showsi tool, included in hypertools:

>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% display the spectral image:


>> sih=showsi(a1)

% interactively selected point is retrieved by:


>> b=si_get_spectra(sih)
1 by 240 dataset with 1 class: [1]

Figure: Interactively selecting a spectrum for baseline subtraction

% we manually identified the wavelengths in a baseline region 1:40 220:240


>> w=basesubm(b,[1:40 220:240])
Baseline subtraction mapping, 240 to 179 trained mapping --> basesubm

% apply the baseline subtraction to the dataset...


>> c=a1*w
1320 by 179 dataset with 5 classes: [372 360 258 161 169]

% ...and show the result


>> sih2=showsi(c)

Figure: Spectral image with subtracted baseline

The baseline subtraction mapping reduced the dimensionality of our dataset from original 240 to 179
wavelengths. Please note the zero values in the tails of corrected spectra resulting from the default
clipping.
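To make the idea of baseline correction concrete, here is a minimal sketch in plain Matlab on a synthetic spectrum. It fits a straight line through the baseline region, subtracts it, clips negative values to zero and drops the baseline wavelengths. The linear baseline model and all variable names are our assumptions for illustration; this is not necessarily the algorithm used inside basesubm.

```matlab
% Generic linear baseline correction in plain Matlab.
% ASSUMPTION: illustration only, not the algorithm used by basesubm.
n = 240;
x = 1:n;                                         % wavelength indices
region = [1:40 220:240];                         % baseline region from the example
peak = 5 * exp(-((x - 120).^2) / (2 * 15^2));    % synthetic absorption peak
drift = 0.01 * x + 2;                            % synthetic linear baseline
s = peak + drift;                                % synthetic measured spectrum

p = polyfit(x(region), s(region), 1);            % straight line through the baseline region
baseline = polyval(p, x);
corrected = max(s - baseline, 0);                % subtract and clip negative values to zero

keep = setdiff(x, region);                       % drop the baseline wavelengths
reduced = corrected(keep);                       % 240 - 61 = 179 values remain
```

The 61 wavelengths of the baseline region are removed, which matches the reduction from 240 to 179 features reported by the mapping above.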

5.2 Smoothing of spectral data


Two smoothing algorithms are implemented in Hypertools via the smoothm mapping: Gaussian and Savitzky-Golay.
The smoothing parameter sigma may be set for Gaussian smoothing. For Savitzky-Golay, three
parameters may be set: the window size, the polynomial degree and the derivative order.

% smoothm without parameters is Gaussian smoothing with sigma=1.0 by default


>> w=smoothm
Spectral smoothing mapping (Gaussian sigma=1.0), fixed mapping --> smoothm

% choosing different sigma:


>> w=smoothm('gauss',3)
Spectral smoothing mapping (Gaussian sigma=3.0), fixed mapping --> smoothm

% default Savitsky-Golay smoothing:


>> w=smoothm('sg')
Spectral smoothing mapping (Savitsky-Golay ws=3,p=1,d=0), fixed mapping --> smoothm

>> w=smoothm('sg',7,2)
Spectral smoothing mapping (Savitsky-Golay ws=7,p=2,d=0), fixed mapping --> smoothm

>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]
>> b=gendat(a1,1)
1 by 240 dataset with 5 classes: [0 1 0 0 0]

>> plots(b)

% smoothing
>> w=smoothm('sg',7,2)
Spectral smoothing mapping (Savitsky-Golay ws=7,p=2,d=0), fixed mapping --> smoothm

% plot the smoothed spectrum:


>> plots(b*w,'r')

Figure: Smoothing spectra. Original spectrum (blue) and the smoothed spectrum (red)
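The two smoothers can be sketched in plain Matlab as convolutions. The sketch below builds a normalized Gaussian kernel and derives the Savitzky-Golay weights for window size 7 and degree 2 from a least-squares polynomial fit. It is written from the textbook definitions; the synthetic spectrum, the variable names and the zero-padded boundary handling of conv are our assumptions, and smoothm may treat details differently.

```matlab
% Plain Matlab sketch of both smoothers; not the smoothm implementation.
x = linspace(0, 1, 200);
clean = exp(-((x - 0.5).^2) / (2 * 0.05^2));   % synthetic spectrum with one peak
noisy = clean + 0.05 * sin(150 * x);           % deterministic high-frequency disturbance

% Gaussian smoothing: convolve with a normalized Gaussian kernel.
sigma = 3;                                     % kernel width in samples
t = -ceil(3*sigma):ceil(3*sigma);
g = exp(-t.^2 / (2 * sigma^2));
g = g / sum(g);                                % weights sum to one
smooth_gauss = conv(noisy, g, 'same');

% Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window.
ws = 7; p = 2;                                 % window size and polynomial degree
h = (ws - 1) / 2;
k = (-h:h)';
A = zeros(ws, p + 1);
for j = 0:p
    A(:, j + 1) = k.^j;                        % polynomial basis on the window
end
P = pinv(A);
c = P(1, :);                                   % weights of the fitted value at the window centre
smooth_sg = conv(noisy, fliplr(c), 'same');    % kernel is symmetric, flip kept for clarity
```

For ws=7 and p=2 the weights c equal the classical Savitzky-Golay coefficients [-2 3 6 7 6 3 -2]/21.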

5.3 Unmixing
Other names for unmixing are blind source separation (in signal processing), multivariate curve resolution (in
chemometrics) or factor analysis. The goal of unmixing is to represent a dataset as a product of two
matrices: concentrations (scores) and spectra (loadings) of pure components. For spectral data we can
make use of the non-negativity of both matrices (scores and loadings). The following unmixing routines are
implemented in hypertools:
varimax : VARIMAX, given the loadings found by PCA, tries to find the rotation after which they
look as sparse as possible, i.e. it assumes that pure spectra consist of a number of compact peaks.
Varimax is provided in two versions:
  varimaxfm performs feature selection,
  varimaxom implements object selection.
opa : OPA (orthogonal projection approach) looks for the set of the most dissimilar (orthogonal)
spectra:
  opam implements selection of the most orthogonal features,
  opaps implements selection of the most orthogonal prototypes (examples).
simplisma : similar to OPA, but also takes into account the purity of the candidate spectra
(a pure spectrum is supposed to have large variance):
  simplismam implements selection of the most pure features,
  simplismaps implements selection of the most pure prototypes (examples).
als : ALS (alternating least squares) is the last step in the unmixing procedure. Taking as input the data
and the candidate loadings (pure spectra) found by the previous routines, it decomposes the data into positive
concentration and spectra matrices.
todo:unmixing examples
>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% find pure spectra:


>> b=simplismaps(a1)
5 by 240 dataset with 5 classes: [1 1 2 0 1]

% compute concentrations using the pure spectra


>> [c,b2]=als(a1,b)
1320 by 5 dataset with 5 classes: [372 360 258 161 169]

>> OPAOptions.maxsn = inf;
>> OPAOptions.eps = 0.05;
>> OPAOptions.include_mean = 0;
>> OPAOptions.verbose = 0;
>> [Y, Y_ind, dis_max, dis] = opa(X,OPAOptions);
>> ALSOptions.mode = 'samples'; % 'samples' | 'features'
>> ALSOptions.maxiter = 100;
>> ALSOptions.crit = 'rec'; % 'conv' | 'rec'
>> ALSOptions.eps = 1e-2;
>> ALSOptions.verbose = 0;
>> [Zp,Yp] = als(X,Y,ALSOptions);
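The alternating least squares step can be sketched in a few lines of plain Matlab. The sketch below alternates two least-squares updates and clips negative entries to zero after each of them, on synthetic two-component data. The clipping strategy, the fixed iteration count and all names are our assumptions; the Hypertools als routine and its options (mode, crit, eps) may work differently.

```matlab
% Plain Matlab sketch of alternating least squares with non-negativity clipping.
% ASSUMPTION: illustrative only; the Hypertools als routine may differ.
w = 1:60;                                         % wavelength indices
S_true = [exp(-((w - 20).^2) / 50);               % two synthetic pure spectra (2 x 60)
          exp(-((w - 40).^2) / 50)];
f = linspace(0, 1, 30)';
C_true = [f, 1 - f];                              % synthetic concentrations (30 x 2)
X = C_true * S_true;                              % mixed data: samples x wavelengths

S = S_true + 0.2;                                 % deliberately imperfect initial spectra
for iter = 1:100
    C = max(X * pinv(S), 0);                      % update concentrations, clip negatives
    S = max(pinv(C) * X, 0);                      % update spectra, clip negatives
end
rel_err = norm(X - C * S, 'fro') / norm(X, 'fro'); % relative reconstruction error
```

In practice the initial spectra would come from a routine such as simplismaps or opa, and the loop would stop on a convergence or reconstruction criterion rather than after a fixed number of iterations.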

6 Dissimilarity measures
A dissimilarity measure assigns a scalar value to the dissimilarity between two spectra. The
dissimilarity values may be used for visualization or for building a dissimilarity representation for pattern
recognition.

6.1 Dissimilarity measures implemented in hypertools


dasam : Spectral Angle Mapper (SAM), (arc cosine)
dsam : Spectral Angle Mapper (as normalized inner product)
dkolmogorov : Kolmogorov dissimilarity between cumulative distributions, computed from unit-normalized spectra
dmatch : matching dissimilarity (sum of differences between cumulative distributions, computed from unit-normalized spectra)
dspec_shape : L1 norm between derivatives of spectra (using a smoothed Gaussian derivative filter)
dquadform : quadratic form dissimilarity
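Two of the measures listed above can be written down directly in plain Matlab. The sketch below computes the spectral angle and the Kolmogorov dissimilarity between every spectrum in A and every prototype in P. It follows the textbook definitions and is not taken from the Hypertools sources; in particular, normalizing each spectrum to unit sum before taking cumulative sums is our reading of "unit-normalized".

```matlab
% Plain Matlab sketch of two of the listed measures.
% ASSUMPTION: written from the textbook definitions, not from the Hypertools sources.
A = [1 2 3 4; 4 3 2 1; 2 4 6 8];     % three spectra, one per row
P = [1 2 3 4; 1 0 0 0];              % two prototype spectra

% Spectral Angle Mapper: angle between spectra treated as vectors.
An = A ./ repmat(sqrt(sum(A.^2, 2)), 1, size(A, 2));   % rows scaled to unit length
Pn = P ./ repmat(sqrt(sum(P.^2, 2)), 1, size(P, 2));
cosang = min(max(An * Pn', -1), 1);  % normalized inner products, clamped against round-off
D_sam = acos(cosang);                % 3 x 2 matrix of angles in radians

% Kolmogorov dissimilarity: largest gap between cumulative distributions
% of spectra normalized to unit sum (assumed normalization).
Ac = cumsum(A ./ repmat(sum(A, 2), 1, size(A, 2)), 2);
Pc = cumsum(P ./ repmat(sum(P, 2), 1, size(P, 2)), 2);
D_kol = zeros(size(A, 1), size(P, 1));
for i = 1:size(A, 1)
    for j = 1:size(P, 1)
        D_kol(i, j) = max(abs(Ac(i, :) - Pc(j, :)));
    end
end
```

Both measures ignore the overall intensity of a spectrum: the third row of A is twice the first and has essentially zero dissimilarity to the first prototype.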

6.2 Visualization using dissimilarity measures
Dissimilarities may be computed from a complete spectral image to a set of prototype spectra. In the
following example, we randomly choose a set of prototypes from a labelled spectral image and compute the
Spectral Angle Mapper dissimilarity between all image spectra and these prototypes. The result of the
dissimilarity computation is a dataset with 15 features (each measuring the dissimilarity to the corresponding
prototype). Because the dataset originates from a spectral image, it may be converted into a dip_image
object and visualized.

>> a1
projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]
>> proto=gendat(a1,0.01)
projectA, 15 by 240 dataset with 5 classes: [4 4 3 2 2]

% computing dissimilarities from all image spectra to selected 15 prototypes


>> D=dasam(a1,proto)
1320 by 15 dataset with 5 classes: [372 360 258 161 169]

% visualizing the dissimilarity dataset as an image:


>> data2dip(D)
Displayed in figure 1

The image contains 15 bands (features in a dataset), rendering the dissimilarities to the 15 prototypes.
By pressing the n or p keys, we can move back and forth between the bands (a dip_image feature). The following figure shows
the dissimilarity to the 5th prototype (indices in the DIPimage package are zero-based).

6.3 Building dissimilarity representation for pattern recognition


Dissimilarity measures may be used to create a representation on which a classifier is built. Traditionally,
mean class spectra are used as prototypes, a spectra-specific dissimilarity is computed to
these prototypes, and the minimum distance classifier (mindistc) is applied.

% compute mean class spectra - use them as class prototypes
>> m=meancov(a1)
5 by 240 dataset with 5 classes: [1 1 1 1 1]

% compute the training dataset with dissimilarities to the prototypes


>> dtr=dasam(a1,m)
1320 by 5 dataset with 5 classes: [372 360 258 161 169]

% compute the test dataset with dissimilarities to the training prototypes


>> dts=dasam(a2,m)
1320 by 5 dataset with 5 classes: [359 353 254 170 184]

% train minimum distance classifier


>> w=mindistc(dtr)
Minimum distance classifier, 5 to 5 trained mapping --> mindistc

% execute the trained mapping on the test set and get the average class error:
>> dts*w*testc
ans =
0.0992

% get the image with labels and display it as dip_image


>> lab=dts*w*classim;
>> getli(lab)
Displayed in figure 1

Prototypes (the representation set) may also be selected randomly or via the interactive tool, as shown
above.
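The decision rule behind a minimum distance classifier is simple enough to sketch in plain Matlab: every pixel receives the label of the prototype with the smallest dissimilarity. The dissimilarity values, the reference labels and the 2 x 2 image size below are made up for illustration; mindistc itself is a trained mapping and is not reproduced here.

```matlab
% Minimum-distance rule on a dissimilarity matrix, in plain Matlab.
% ASSUMPTION: a sketch of the decision rule only, with made-up numbers.
D = [0.10 0.80 0.55;      % rows: pixels, columns: dissimilarity to each class prototype
     0.70 0.20 0.60;
     0.45 0.50 0.05;
     0.15 0.30 0.90];
true_lab = [1; 2; 3; 2];                  % hypothetical reference labels

[dmin, pred] = min(D, [], 2);             % index of the nearest prototype per pixel
err = mean(pred ~= true_lab);             % fraction of misclassified pixels

rows = 2; cols = 2;                       % hypothetical image size
label_image = reshape(pred, rows, cols);  % back to the spatial layout (column-major)
```

Here the fourth pixel is closest to the first prototype although its reference label is 2, so one pixel out of four is misclassified.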

7 Feature extraction methods


7.1 Generalized Local Discriminant Bases (GLDB)
The Generalized Local Discriminant Bases (GLDB) feature extraction algorithm was proposed by Kumar, Ghosh,
and Crawford:
Kumar, Ghosh, Crawford: Best-Bases Feature Extraction Algorithms for Classification of Hyperspectral
Data, IEEE Trans. on Geoscience and Remote Sensing, vol. 39, no. 7, July 2001
The GLDB algorithm splits a spectrum into a set of non-overlapping regions maximizing the separability
between classes. It starts with all the wavelengths forming singleton feature groups.
It tries to grow each group and selects the one maximizing a criterion based on the Fisher ratio and
minimum correlation (max-min). It grows the groups until no further improvement can be made. The GLDB algorithm,
applied to a training dataset, results in a set of non-overlapping wavelength groups and the corresponding
Fisher projection mappings. Applied to a test set, a new feature space is generated by projecting each
wavelength group into the 1D Fisher space. In order to mitigate the influence of non-informative data, the
authors recommend running a subsequent feature selection or extraction algorithm after applying the GLDB
method.
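The projection of one wavelength group into the 1D Fisher space can be sketched in plain Matlab as follows. The sketch covers only the two-class Fisher direction and the Fisher ratio of the projected data for a single, fixed group of three bands. The group search and the combined max-min criterion of GLDB are not reproduced, and the synthetic data and names are our assumptions.

```matlab
% Two-class Fisher projection of one wavelength group, in plain Matlab.
% ASSUMPTION: illustrates the 1D Fisher space only; the group search and the
% exact GLDB criterion (Fisher ratio combined with correlation) are not reproduced.
t = (0:19)';
X1 = [sin(t), cos(t), 0.1 * t];          % class 1: 20 spectra, 3 bands in the group
X2 = X1 + repmat([2 0 1], 20, 1);        % class 2: same spread, shifted mean

m1 = mean(X1, 1);
m2 = mean(X2, 1);
Sw = (size(X1, 1) - 1) * cov(X1) + (size(X2, 1) - 1) * cov(X2);   % within-class scatter
w  = pinv(Sw) * (m2 - m1)';              % Fisher direction
y1 = X1 * w;                             % 1D feature extracted from this group, class 1
y2 = X2 * w;                             % the same feature for class 2

% Fisher ratio of the projected classes: between-class over within-class spread.
J = (mean(y2) - mean(y1))^2 / (var(y1) + var(y2));
```

Each wavelength group contributes one such projected feature in the two-class case, so the number of output features equals the number of groups.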
The gldbm routine implements the bottom-up two-class GLDB feature extractor as proposed by Kumar
et al. The selected wavelength groups may be visualized with the plot_gldb_groups function.

>> load plastic-trts.mat


>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% we select the 2nd and 3rd classes...


>> b1=seldat(a1,[2 3])
projectA, 618 by 240 dataset with 2 classes: [360 258]

% ... and run GLDB algorithm


>> w=gldbm(b1)
.......Best Bases mapping, 240 to 50 trained mapping --> gldbm

% projecting the dataset by the trained GLDB mapping:


>> c1=b1*w
projectA, 618 by 50 dataset with 2 classes: [360 258]

% plotting identified groups of bands:


>> plot_gldb_groups(b1,w)

Figure: Groups of wavelengths, selected by the two-class GLDB feature extraction

As you can see, the GLDB feature extraction algorithm decomposes the complete spectral range into
a set of wavelength groups. Many of these probably add no discriminatory information to
the classification problem at hand. In order to identify the informative groups, a second-stage feature
selection may be carried out.
In the following example, we select the best features generated by the GLDB extraction using a
sequential forward selection procedure. Because GLDB is effectively both feature selection (groups of
adjacent wavelengths) and feature extraction (within each group), the feature selection result may be
combined with the trained GLDB mapping, leading to a reduced GLDB mapping:

% select the best subset of GLDB features in the dataset c1
>> wfsel=featself_simple(c1)
Forward Feature Selection, 50 to 1 fixed mapping --> cmapm

% let us now derive a reduced GLDB mapping retaining only wavelength groups,
% selected by the feature selection
>> wnew=gldbm_featsel(w,wfsel)
240 to 1 trained mapping --> gldbm

% we can look into the wnew mapping to see only the group of wavelengths 87 to 170
% is used in this reduced GLDB mapping:
>> getdata(wnew)
clf: [1x1 struct]
l: 87
u: 170

Hypertools also provides an implementation of the top-down GLDB algorithm using the log-odds probability-based
criterion (gldbm_td_prob) as proposed by the GLDB authors. We have also implemented the
top-down GLDB using the apparent error criterion (gldbm_td_ae) and using the combined Fisher
separability and correlation criterion that is also used in the bottom-up case (gldbm_td).

7.2 Multi-class GLDB feature extraction


Originally, the GLDB algorithm was defined for two-class problems only. For multi-class problems, the
authors proposed to derive all pair-wise GLDB extractors and train the classifiers in the respective feature
spaces. In a C-class classification problem, a new object is subjected to all C(C-1)/2 stored feature
extractors and classifiers and finally to a majority voting combiner.
In our paper:
Paclik, Verzakov, Duin, Multi-class extensions of the GLDB feature extraction algorithm for spectral
data, In proc. of ICPR, 2004
we have presented two alternative solutions significantly limiting the execution complexity of
multi-class classifiers employing GLDB feature extraction. The Hypertools toolbox contains an implementation of
one of them: a GLDB feature extractor with a multi-class criterion. The criterion utilizes the same
concept as in the two-class case, a combined Fisher separability and inter-band correlation. If more
than two classes are available, the Fisher projection yields min(wavelengths in a group, C-1) output
features for each wavelength group.

% train the multi-class GLDB extraction on a training set a1 with five classes:
>> a1
1320 by 240 dataset with 5 classes: [372 360 258 161 169]
>> w=gldbm_multi(a1)
Best Bases mapping, 240 to 42 trained mapping --> gldbm_multi

% let us look into the mapping (asking for a user-specific data of the trained mapping):
>> getdata(w)
clf: [1x26 struct]
l: [1x26 double]
u: [1x26 double]
info_iter: 94

We can observe that the multi-class GLDB mapping is composed of 26 wavelength groups but yields
42 features.
Similarly to the two-class GLDB, a second-stage feature selection may be carried out and the resulting
feature selection mapping combined with the trained GLDB mapping, so that only the informative wavelength
groups are retained.

>> c1=a1*w
1320 by 42 dataset with 5 classes: [372 360 258 161 169]

% the feature selection on the output of the GLDB extractor:


>> wfsel=featself_simple(c1)
Forward Feature Selection, 42 to 22 fixed mapping --> cmapm

% create a reduced GLDB mapping retaining only informative features:


>> wnew=gldbm_mc_featsel(w,wfsel)
240 to 22 trained mapping --> gldbm

Note that the feature selection operates on the level of output features, not the wavelength groups.
Finally, Hypertools provides a multi-class GLDB extractor leveraging the non-linear Fisher criterion,
introduced in:

Marco Loog, R.P.W. Duin, R. Haeb-Umbach, Multiclass Linear Dimension Reduction by Weighted
Pairwise Fisher Criteria, IEEE PAMI, vol. 23, no. 7, July 2001
The non-linear Fisher criterion is beneficial in situations where some classes are very distant from
the others in the feature space. While the classical Fisher projection would emphasize the distant class,
the non-linear criterion re-weights the class contributions and offers a better overall performance. In PRTools,
the non-linear Fisher mapping is implemented in the nlfisherm function. Hypertools implements the
non-linear multi-class GLDB in the nlgldbm routine.

7.3 Genetic algorithm for feature selection


A genetic algorithm is an optimization method based on evolutionary concepts. A genetic feature
selection algorithm works as follows: feature subsets, encoded by binary vectors, form a population
of solutions. The quality of a feature subset (in genetic terminology, a chromosome) may be evaluated by
a criterion based on class separability. The initial population of feature subsets is generated randomly and
all chromosomes are evaluated. Chromosomes providing better class separation have a higher chance
of being selected for mating than the worse ones. The crossover operation randomly mixes the
features between couples of good chromosomes. The underlying idea is that the generated offspring may often
improve on the qualities of the parents. Additionally, with low probability, some chromosomes in the population
are subjected to mutation, which means that some genes (features) get randomly flipped. Mutation
introduces new qualities or distortions not present in the original population. In the optimization sense,
mutation may help to escape from a local optimum.
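The steps described above (evaluation, selection, crossover and mutation) can be sketched for a single generation in plain Matlab. The fitness function below is a made-up stand-in that rewards three hypothetical informative features and penalizes subset size. The population size, the mutation rate, the roulette-wheel selection and the uniform crossover are likewise our choices for illustration, not a description of any particular implementation.

```matlab
% One generation of a toy genetic feature selection, in plain Matlab.
% ASSUMPTIONS: the fitness function is a stand-in for a class separability
% criterion; population size and rates are arbitrary.
nfeat = 12; npop = 8; pmut = 0.05;
pop = rand(npop, nfeat) > 0.5;                 % random binary chromosomes (feature masks)
useful = [1 4 7];                              % hypothetical informative features
fitness = @(c) sum(c(useful)) - 0.1 * sum(c);  % reward useful features, penalize subset size

fit = zeros(npop, 1);
for i = 1:npop
    fit(i) = fitness(pop(i, :));               % evaluate every chromosome
end

% Selection: chromosomes with better fitness get a higher chance of mating.
prob = fit - min(fit) + 1e-3;                  % shift so that all weights are positive
cdf = cumsum(prob / sum(prob));
cdf(end) = 1;                                  % guard against round-off

newpop = false(npop, nfeat);
for i = 1:npop
    pa = pop(find(cdf >= rand, 1), :);         % roulette-wheel parent selection
    pb = pop(find(cdf >= rand, 1), :);
    mask = rand(1, nfeat) > 0.5;               % uniform crossover: mix genes of both parents
    child = pa;
    child(mask) = pb(mask);
    flip = rand(1, nfeat) < pmut;              % mutation: flip a few genes at random
    child(flip) = ~child(flip);
    newpop(i, :) = child;
end
```

Repeating the loop for several generations and keeping the best chromosome seen so far gives the selected feature subset.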
The Hypertools toolbox provides a simple genetic algorithm for feature selection, genfeatsel, as described in:
Siedlecki, Sklansky, A note on genetic algorithms for large-scale feature selection, Pattern Recognition
Letters, vol. 10, pp. 335-347, 1989
The apparent error of the Fisher classifier is used to measure chromosome quality. In the following
experiment, we use the genetic algorithm to select the best feature subset for a two-class dataset. We build a
population of 100 chromosomes (solutions) and perform 5 generations:

>> a1
projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% training set
>> b=seldat(a1,[2 3])
projectA, 618 by 240 dataset with 2 classes: [360 258]

>> w=genfeatsel(b,100,5)
Feature Selection, 240 to 2 fixed mapping --> cmapm

% genetic algorithm identified 2 features as the best subset

% independent test set:


>> ts=seldat(a2,[2 3])
607 by 240 dataset with 2 classes: [353 254]

% let us visualize the selected 2D feature space using a scatter plot:


>> scatterdui(ts*w)

Figure: Scatter plot of a test set with features, identified by genetic algorithm

7.4 Maximum Autocorrelation Transformation (MAF)


This is a special-purpose version of Principal Component Analysis (PCA) for image data. The covariance
matrix is slightly modified such that the covariances are computed for the one-pixel shifted images.
Further, the transformed data are constrained to have unit variance. As a result, the transformation maximizes
image autocorrelation.

>> a1
projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]
>> w=maf(a1,0.9)
240 to 2 trained mapping --> affine
>> scatterdui(a1*w)

Figure: Scatter plot of the spectral data projected by MAF mapping
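For readers who want to see the computation behind such a projection, the following plain Matlab sketch follows the textbook formulation of maximum autocorrelation factors: a generalized eigenproblem between the covariance of one-pixel shift differences and the ordinary covariance. The synthetic four-band image and all names are our assumptions, and the maf routine may differ in detail (for example in how the number of output components is chosen).

```matlab
% Maximum autocorrelation factors for a small synthetic image, plain Matlab.
% ASSUMPTION: textbook formulation; the maf routine may differ in detail.
rows = 20; cols = 20; bands = 4;
[cc, rr] = meshgrid(1:cols, 1:rows);
cube = zeros(rows, cols, bands);
cube(:, :, 1) = sin(rr / 4) + cos(cc / 5);               % smooth spatial pattern
cube(:, :, 2) = 0.5 * cube(:, :, 1) + sin(rr .* cc);     % smooth pattern plus rough texture
cube(:, :, 3) = cos(3 * rr + 7 * cc);                    % rough texture
cube(:, :, 4) = sin(rr / 6) .* cos(2 * cc);              % mixed pattern

X = reshape(cube, rows*cols, bands);
X = X - repmat(mean(X, 1), rows*cols, 1);                % centre each band
S = cov(X);                                              % ordinary covariance

% Differences between the image and its one-pixel shifted copies.
dh = cube(:, 2:end, :) - cube(:, 1:end-1, :);            % horizontal shift
dv = cube(2:end, :, :) - cube(1:end-1, :, :);            % vertical shift
Dm = [reshape(dh, [], bands); reshape(dv, [], bands)];
Sd = cov(Dm);                                            % covariance of the shift differences

[V, L] = eig((Sd + Sd') / 2, (S + S') / 2);              % generalized eigenproblem
[lambda, order] = sort(diag(L), 'ascend');               % small lambda = high autocorrelation
V = V(:, order);
V = V ./ repmat(sqrt(diag(V' * S * V))', bands, 1);      % scale components to unit variance
Y = X * V;                                               % MAF components, first column smoothest
```

The columns of Y are ordered from the spatially smoothest component to the roughest one, and each has unit variance.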

7.5 Principal Component Analysis


todo:describe the use of pcatrain, pcaapply, pcanpc, pcacrit and pcastat

7.6 PCA shaving


PCA shaving performs a backward elimination procedure to find groups (clusters) of correlated/covarying
wavelengths. Elimination is based on ranking the wavelengths according to their participation in
the first PC loading. PCA shaving may be run in a supervised or an unsupervised mode.
The hypertools pcashave implementation is fully functional but does not provide a unified mapping output
yet. The following figure illustrates a possible use of PCA shaving for unsupervised grouping of wavelengths
in a spectrum. The wavelength color denotes the membership in one of 10 identified groups.

todo:pcashave examples, the usage of the routine

7.7 Canonical Correlation Analysis (CCA)


Canonical Correlation Analysis computes linear transformations of the data X to a new representation
T and of the output (target) data Y to a new representation U such that the i-th columns of T and U have
maximum possible correlation and at the same time are orthogonal to the previous columns in both
matrices. The columns of T and U are normalized to unit variance. Because X and Y may be rank-deficient,
the pseudo-inverse pinv is used in the algorithm.

% prepare training and test sets:


>> b=seldat(a1,[2 3])
projectA, 618 by 240 dataset with 2 classes: [360 258]
>> bts=seldat(a2,[2 3])
607 by 240 dataset with 2 classes: [353 254]

>> plot_cca(b,bts)

Figure: Superimposed training (lighter colors) and test data (dark colors), projected by the CCA mapping

We can see the bright unimodal clusters corresponding to the training data, projected by CCA map-
ping. The darker markers denote the test set, projected using the same mapping. It is apparent, that in
this case the training set is not representative of the problem.

7.8 Partial Least Squares (PLS) regression mapping


Partial Least Squares (PLS) is a multiple linear regression technique which maps input data onto a set of
target variables. It can be used for regression, classification (targets from crisp labels) or visualization.
In hypertools, PLS regression is implemented as the plsm mapping.
todo:plsm examples

8 Image segmentation
Image segmentation is an unsupervised pattern recognition technique producing a unique assignment of
image pixels (spectra) into a set of classes. Because even the number of classes is usually unknown, image
segmentation is, in fact, an ill-posed clustering problem.

8.1 Segmentation combining spatial and spectral domain


Hypertools implements in segment_comb an image segmentation algorithm combining spectral and spatial
information using a combined classifier approach:
Paclik P., Duin R.P.W., van Kempen G.M.P., Kohlus R.: Segmentation of multi-spectral images
using the combined classifier approach, Image and Vision Computing, vol. 21, no. 6, pp. 473-482,
June 2003

Firstly, a set of labels is created by clustering the spectral data domain. Then, in a loop, separate
classifiers are trained and executed in both domains: by default the nearest mean classifier (nmc) in the
spectral domain and Parzen classifier with a Gaussian kernel in the spatial domain (implemented by
convolution). Both domains are combined using the product combination rule. The process is repeated
until stability.
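The product combination rule used to merge the two domains can be illustrated in plain Matlab. In the sketch below the per-pixel class posteriors of the spectral and the spatial classifier are multiplied, renormalized, and the most probable class is taken as the label. The posterior values are made up for illustration; in segment_comb they come from the classifiers described above.

```matlab
% Product combination of two classifiers' posteriors, in plain Matlab.
% ASSUMPTION: the posterior values are made up for illustration.
P_spec = [0.6 0.3 0.1;     % rows: pixels, columns: classes (spectral domain)
          0.4 0.5 0.1;
          0.2 0.2 0.6];
P_spat = [0.5 0.4 0.1;     % same pixels, posteriors from the spatial domain
          0.7 0.2 0.1;
          0.3 0.3 0.4];

P = P_spec .* P_spat;                          % product rule
P = P ./ repmat(sum(P, 2), 1, size(P, 2));     % renormalize each row to sum to one
[pmax, label] = max(P, [], 2);                 % most probable class per pixel
```

The second pixel shows the effect of the spatial domain: the spectral classifier alone would choose class 2, but the strong spatial support for class 1 changes the combined decision.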
First, we segment a spectral image using the raw spectral data with 240 spectral wavelengths. We
cluster the data using the k-means algorithm and using the combined spectral-spatial segmentation algorithm
(starting from the output of k-means). In the results below, we can see that some spatial inconsistencies
are removed by the combined segmentation algorithm because the spatial information is employed.

>> a1
projectA, 1320 by 240 dataset with 5 classes: [372 360 258 161 169]

% left figure: simple k-means


>> lab=kmeans(a1,5);
>> getli(lab,a1) % output is a dip_image window
>> speccolormap % high contrast color map for labels

% right figure: the combined classifier started from the labels


% provided by k-means
>> seg=segment_comb(setlabels(a1,lab),5)
>> speccolormap

Figure: Result of kmeans clustering (left) and combined spectral-spatial algorithm (right) using raw
spectra
In the second experiment, we build a dissimilarity-based representation of the spectral image using
Spectral Angle Mapper (SAM) distance measure. A set of five, randomly selected, prototypes forms a
representation set. Again, both the k-means and the combined spectral-spatial algorithms generate image
labelling:

>> proto=gendat(+a1,5)
5 by 240 dataset with 1 class: [5]
>> drawident(a1(:,100),proto)
>> D=dasam(a1,proto)
1320 by 5 dataset with 5 classes: [372 360 258 161 169]

Figure: Five randomly selected prototype pixels (spectra)

>> lab=kmeans(D,5);
>> getli(lab,a1) % left figure

>> seg=segment_comb(D,5) % right figure

Figure: Result of kmeans clustering (left) and combined spectral-spatial algorithm (right) using SAM
distances
The k-means clustering algorithm does not take spatial connectivity into account and therefore provides a
noisy solution, which may be homogenized using the combined spectral-spatial algorithm.

8.2 ECHO segmentation


ECHO algorithm, proposed by D.Landgrebe
todo:echo

9 Classifiers
Hypertools implements several classifiers, traditionally used by the spectral community:
mindistc : Minimum distance classifier for dissimilarity to prototypes. An example is available
in Section 6.3
corrc : correlation classifier
samc : Spectral Angle Mapper classifier
todo:simca description + simcac and simcam examples
