
International Journal of Fuzzy Systems, Vol. 13, No. 3, September 2011

A New Nonlinear Fuzzy Robust PCA Algorithm and Similarity Classifier in Classification of Medical Data Sets

Pasi Luukka

Abstract

In this article a classification method is proposed in which the data is first preprocessed with a new nonlinear fuzzy robust principal component analysis (NFRPCA) algorithm to bring it into a more feasible form. After this preprocessing step, the similarity classifier is used for the actual classification. The procedure was tested on dermatology, hepatitis and liver-disorder data. The results were quite promising, and better classification accuracy was achieved than with classical PCA and the similarity classifier. The new nonlinear fuzzy robust principal component analysis algorithm appears to project the data sets into a more feasible form: used together with the similarity classifier, it achieved a classification accuracy of 72.27 % on the liver-disorder data, 88.94 % on the hepatitis data, and 97.09 % on the dermatology data. Compared to the results with classical PCA and the similarity classifier, higher accuracies were achieved with the approach combining nonlinear fuzzy robust principal component analysis and the similarity classifier.

Keywords: Dimension reduction, Nonlinear fuzzy robust PCA, Medical data, Similarity classifier.

1. Introduction

Principal component analysis (PCA) [2] is a well established technique for data analysis and preprocessing. The general motivation for PCA is dimension reduction: PCA decomposes high dimensional data into a low dimensional subspace component and a noise component. Nowadays, dimensionality reduction techniques such as PCA are often used before classification [3, 4, 5]. Many databases that come from the real world are coupled with noise, a random error or variance of a measured variable [6]. Thus, real world data analysis is almost always burdened with uncertainty of different kinds.

A major problem in mining scientific data sets is that the data is often high dimensional, i.e. in many cases a large number of features represent the object. Consequently, the computational time of pattern recognition algorithms can become prohibitive, which is a severe problem especially when some of the features are not discriminatory. In addition to the computational cost, irrelevant features may also reduce the accuracy of some algorithms. Several methods have recently been proposed to deal with feature selection problems [7, 8, 9]. To address the problem of high dimensionality, a common approach is to identify the most important features associated with an object, so that further processing can be simplified without unduly compromising the quality of the final results.

There are several different ways in which the dimension of a problem can be reduced. The simplest approach is to identify important attributes based on input from domain experts. Another commonly used approach is principal component analysis (PCA) [2], which defines new attributes (principal components, or PCs) as mutually orthogonal linear combinations of the original attributes. For many data sets it is sufficient to consider only the first few PCs, thus reducing the number of dimensions. PCA can be used as a preprocessing method in classification; the advantages usually include a reduced number of dimensions, and hence lower computational cost, together with increased classification accuracy. Here the classification problem is addressed by first preprocessing the data with the new nonlinear fuzzy robust principal component analysis (NFRPCA) algorithm and then classifying the data with the similarity classifier. The similarity classifier has been chosen because it has been shown to work with this type of data [11] when the data is first preprocessed with classical PCA. The new nonlinear FRPCA presented here is derived from the linear fuzzy robust principal component analysis algorithm introduced in [1], and it is investigated as a preprocessing algorithm for the classification task based on the similarity classifier [10, 11]. For the given data sets it is also compared with the results achieved using the linear FRPCA and the similarity classifier [12], and with the case where classical PCA is applied first and the resulting data is classified with the similarity classifier. Classical PCA together with the similarity classifier has been used in

Corresponding Author: Pasi Luukka is with the Department of Mathematics and Physics, Lappeenranta University of Technology. Email: pasi.luukka@lut.fi
Manuscript received 17 June 2010; revised 03 May 2011; accepted 29 Aug. 2011.

© 2011 TFSA

[11], and PCA for fuzzy data [13] was used together with the similarity classifier in [14], where the data was formed from linguistic attributes. In the same way as the notion of a fuzzy subset generalizes that of a classical subset, the concept of similarity can be considered a many-valued generalization of the classical notion of equivalence [15]. As an equivalence relation is a familiar way to classify similar objects, fuzzy similarity is an equivalence relation that can be used to classify multi-valued objects. Due to this property it is well suited to problems that can be solved by finding similarities between objects. There is a close link between the notion of similarity and that of distance (see, for example, [16] and [17]). Furthermore, when PCA is used, outliers are known to influence the resulting principal components [18], and hence they also influence the classification results. There are several fuzzy regression methods that deal with outliers, e.g. [19, 20], but only one fuzzy classification method combined with a nonlinear PCA method is known [21]. Here we propose a new nonlinear fuzzy robust PCA method combined with the similarity classifier to address this problem.

The data sets used in these experiments were taken from the UCI Repository of Machine Learning Databases [23]. The liver-disorder data set mainly consists of blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. In dermatology, the differential diagnosis of erythemato-squamous diseases is considered quite a difficult problem. The diseases in this group are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris; they all share the clinical features of erythema and scaling with very few differences [22]. In the hepatitis data set the purpose is to predict the presence or absence of hepatitis disease given the results of medical tests carried out on a patient. The classifier and the preprocessing methods were implemented in MATLAB.

2. Data Sets

Liver-disorder data set: The data set was donated by R. S. Forsyth to the UCI machine learning data repository [23]. The problem is to predict whether or not a male patient has a liver disorder based on blood tests and alcohol consumption. The attributes of the liver-disorder data set are: (1) mean corpuscular volume, (2) alkaline phosphotase, (3) alamine aminotransferase, (4) aspartate aminotransferase, (5) gamma-glutamyl transpeptidase, and (6) the number of half-pint equivalents of alcoholic beverages drunk per day. The first five variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption.

Hepatitis disease data set: The data set was donated by G. Gong (Carnegie-Mellon University) via Bojan Cestnik from the Jozef Stefan Institute, Slovenia. The problem is to predict the presence or absence of hepatitis disease given the results of medical tests carried out on a patient. The data set consists of 155 samples and 19 attributes: 1. age: 10, 20, 30, 40, 50, 60, 70, 80; 2. sex: male, female; 3. steroid: no, yes; 4. antivirals: no, yes; 5. fatigue: no, yes; 6. malaise: no, yes; 7. anorexia: no, yes; 8. liver big: no, yes; 9. liver firm: no, yes; 10. spleen palpable: no, yes; 11. spiders: no, yes; 12. ascites: no, yes; 13. varices: no, yes; 14. bilirubin: 0.39, 0.8, 1.2, 2.0, 3.0, 4.0; 15. alkaline phosphate: 33, 80, 120, 160, 200, 250; 16. sgot: 13, 100, 200, 300, 400, 500; 17. albumin: 2.1, 3.0, 3.8, 4.5, 5.0, 6.0; 18. protime: 10, 20, 30, 40, 50, 60, 70, 80, 90; 19. histology: no, yes.

Dermatology data set: The data set comes from Gazi University and Bilkent University and was donated by N. Ilter and H. A. Güvenir. It contains 366 instances with 34 attributes. The attributes and the class distribution of this data set are given in Table 1. The erythemato-squamous diseases are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris. These diseases are frequently seen in outpatient dermatology departments. They all share the clinical features of erythema and scaling with slight variations, which makes the differential diagnosis of erythemato-squamous diseases difficult: all the diseases look very much alike with erythema and scaling. When inspected more carefully, some patients have the typical clinical features of the disease at predilection sites (localizations of the skin which a disease prefers) while another group has atypical localizations. Another difficulty for differential diagnosis is that a disease may show the histopathological features of another disease in the early stages and only have its characteristic features in the following stages. Furthermore, some samples show the typical histopathological features of the disease while some do not [24].


Table 1. Class distribution of the dermatology data set.

Classes (instances): psoriasis (111), seborrheic dermatitis (60), lichen planus (71), pityriasis rosea (48), chronic dermatitis (48), pityriasis rubra pilaris (20).

Clinical attributes: att. 1: erythema; att. 2: scaling; att. 3: definite borders; att. 4: itching; att. 5: koebner phenomenon; att. 6: polygonal papules; att. 7: follicular papules; att. 8: oral mucosal involvement; att. 9: knee and elbow involvement; att. 10: scalp involvement; att. 11: family history; att. 34: age.

Histopathological attributes: att. 12: melanin incontinence; att. 13: eosinophils in infiltrate; att. 14: PNL infiltrate; att. 15: fibrosis of the papillary dermis; att. 16: exocytosis; att. 17: acanthosis; att. 18: hyperkeratosis; att. 19: parakeratosis; att. 20: clubbing of the rete ridges; att. 21: elongation of the rete ridges; att. 22: thinning of the suprapapillary epidermis; att. 23: spongiform pustule; att. 24: Munro microabscess; att. 25: focal hypergranulosis; att. 26: disappearance of the granular layer; att. 27: vacuolization and damage of basal layer; att. 28: spongiosis; att. 29: saw-tooth appearance of retes; att. 30: follicular horn plug; att. 31: perifollicular parakeratosis; att. 32: inflammatory mononuclear infiltrate; att. 33: band-like infiltrate.

3. Classification Procedure

Here a classification method is proposed where the data is first preprocessed either with the linear fuzzy robust principal component analysis algorithms [1] or with the nonlinear case presented in the next subsection, to get it into a more feasible form. After the preprocessing step the data is classified using the similarity classifier [10, 11]. In the first subsection the fuzzy robust principal component analysis algorithms presented in [1] are introduced, and the new nonlinear case of the third algorithm is then derived. The similarity classifier [10, 11] is presented in the second subsection. For all data sets the data was split in half: one half was used for training and the other half for testing. This procedure was repeated with 30 random splits and mean classification accuracies were computed.

A. Fuzzy Robust Principal Component Analysis

The fuzzy robust principal component analysis algorithms used here, from which the nonlinear case is derived, were introduced in [1]. The robust principal component algorithms which Yang & Wang proposed in [1] are based on Xu & Yuille's algorithms [18], where PCA learning rules are related to energy functions; they proposed an objective function that takes outliers into consideration. In Yang & Wang's methods the objective function was extended to be fuzzy, and it includes Xu & Yuille's algorithms as crisp special cases. Next these methods are briefly presented; a more thorough description can be found in [18] and [1]. Xu and Yuille [18] proposed an optimization function, subject to u_i \in \{0, 1\}, as

E(U, w) = \sum_{i=1}^{n} u_i e(x_i) + \eta \sum_{i=1}^{n} (1 - u_i)   (1)

where X = \{x_1, x_2, \ldots, x_n\} is the data set, U = \{u_i \mid i = 1, \ldots, n\} is the membership set and \eta is the threshold. The goal is to minimize E(U, w) with respect to u_i and w. Notice that u_i is a binary variable while w is a continuous variable, which makes the optimization hard to solve with a gradient descent approach. To enable a gradient descent approach, the minimization problem was transformed into the maximization of the Gibbs distribution

P(U, w) = \frac{1}{Z} \exp(-E(U, w))   (2)

where Z is the partition function ensuring \sum_U \int_w P(U, w)\, dw = 1.
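To make the energy function concrete, here is a minimal Python sketch (an illustration, not code from the article, which used MATLAB): it evaluates E(U, w) of Eq. (1) with binary memberships, using the reconstruction error e_1 that is defined next in Eq. (3).

```python
import numpy as np

def e1(x, w):
    """Reconstruction error e1(x) = ||x - (w^T x) w||^2 for one sample."""
    return float(np.sum((x - (w @ x) * w) ** 2))

def energy(X, u, w, eta):
    """Xu & Yuille energy E(U, w) = sum_i u_i e(x_i) + eta * sum_i (1 - u_i),
    with binary memberships u_i in {0, 1} and threshold eta."""
    errs = np.array([e1(x, w) for x in X])
    return float(np.sum(u * errs) + eta * np.sum(1 - u))

# Tiny example: two samples, a unit weight vector, and all-inlier memberships.
X = np.array([[1.0, 0.0], [0.8, 0.1]])
w = np.array([1.0, 0.0])
u = np.array([1, 1])
print(energy(X, u, w, eta=0.5))
```

With u_i = 1 for every sample the energy reduces to the total reconstruction error; flipping u_i to 0 for a suspected outlier trades its error for the fixed penalty \eta.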

The measure e(x_i) could be, e.g., one of the following functions:

e_1(x_i) = \| x_i - (w^T x_i) w \|^2   (3)

e_2(x_i) = x_i^T x_i - \frac{w^T x_i x_i^T w}{w^T w}   (4)

The gradient descent rules for minimizing E_1 = \sum_{i=1}^{n} e_1(x_i) and E_2 = \sum_{i=1}^{n} e_2(x_i) are, respectively,

w_{new} = w_{old} + \alpha_t [ y (x_i - u) + (y - v) x_i ]   (5)

w_{new} = w_{old} + \alpha_t \left( x_i y - y^2 \frac{w}{w^T w} \right)   (6)

where \alpha_t is the learning rate, y = w^T x_i, u = y w and v = w^T u. Oja presented a nonlinear PCA [28], where

e_3(x_i) = x_i - w\, g(y)   (7)

with y = w^T x_i, and where g can be chosen to be a nonlinear function. In this case the weight update is

w_{new} = w_{old} + \alpha_t \left( x_i e_3(x_i)^T w_{old} F' + e_3(x_i) g(y) \right)   (8)


where F' = dg(y)/dy.
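A minimal Python sketch of this nonlinear rule (illustrative only; g = tanh is one admissible choice here, and the learning rate and data are made up for the example):

```python
import numpy as np

def oja_nonlinear_step(w, x, lr=0.005):
    """One step of Oja's nonlinear PCA rule, Eq. (8):
    w_new = w + lr * (x (e^T w) F' + e g(y)),
    with y = w^T x, residual e = x - w g(y) (Eq. (7)), g = tanh, F' = 1 - tanh(y)^2."""
    y = float(w @ x)
    g = np.tanh(y)
    Fp = 1.0 - g ** 2       # F' = dg/dy for g = tanh
    e = x - w * g           # residual e3(x) of Eq. (7)
    return w + lr * (x * float(e @ w) * Fp + e * g)

rng = np.random.default_rng(0)
# Toy data whose dominant variance lies along the first axis.
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.3])
w = np.array([0.5, 0.5])
for _ in range(3):          # a few passes over the data
    for x in X:
        w = oja_nonlinear_step(w, x)
```

After a few passes w aligns with the dominant (here: first) axis; the fuzzy algorithms below reuse exactly this kind of update, weighted by a membership term.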

Yang & Wang proposed an objective function of the form

E = \sum_{i=1}^{n} u_i^{m_1} e(x_i) + \eta \sum_{i=1}^{n} (1 - u_i)^{m_1}   (9)

subject to u_i \in [0, 1] and m_1 \in [1, \infty). Now u_i is the membership value of x_i in the data cluster and (1 - u_i) is its membership value in the noise cluster; m_1 is the so-called fuzziness variable. Here e(x_i) measures the error between x_i and the class center. The idea is quite similar to that of the fuzzy C-means algorithm [26]. Since u_i is now a continuous variable, the difficulty of mixed discrete and continuous optimization is avoided and the gradient descent approach can be used. First the gradient of E in (9) is computed with respect to u_i. Setting \partial E / \partial u_i = 0 gives

u_i = \frac{1}{1 + \left( e(x_i)/\eta \right)^{1/(m_1 - 1)}}   (10)

Substituting this membership back and simplifying, we have

E = \sum_{i=1}^{n} \frac{e(x_i)}{\left[ 1 + \left( e(x_i)/\eta \right)^{1/(m_1 - 1)} \right]^{m_1 - 1}}   (11)

The gradient with respect to w is

\frac{\partial E}{\partial w} = \sum_{i=1}^{n} \left[ \frac{1}{1 + \left( e(x_i)/\eta \right)^{1/(m_1 - 1)}} \right]^{m_1} \frac{\partial e(x_i)}{\partial w}   (12)

Now let

\beta(x_i) = \left[ \frac{1}{1 + \left( e(x_i)/\eta \right)^{1/(m_1 - 1)}} \right]^{m_1}

where m_1 is the fuzziness variable. If m_1 = 1, the fuzzy membership reduces to a hard membership determined by the rule: u_i = 1 if e(x_i) < \eta, and u_i = 0 otherwise. In this situation \eta is a hard threshold. There is no general rule for setting m_1. Yang & Wang derived the following three algorithms for the optimization procedure:

FRPCA1 algorithm:
Step 1: Set the iteration count t = 1 and the iteration bound T, set the learning coefficient \alpha_0 \in (0, 1] and the soft threshold \eta to a small positive value, and randomly initialize the weight w.
Step 2: While t is less than T, do steps 3-9.
Step 3: Compute \alpha_t = \alpha_0 (1 - t/T), set i = 1 and \sigma = 0.
Step 4: While i is less than n, do steps 5-8.
Step 5: Compute y = w^T x_i, u = y w and v = w^T u.
Step 6: Update the weight: w_{new} = w_{old} + \alpha_t \beta(x_i) [ y (x_i - u) + (y - v) x_i ].
Step 7: Update the temporary count: \sigma = \sigma + e_1(x_i).
Step 8: Add 1 to i.
Step 9: Set \eta = \sigma / n and add 1 to t.

FRPCA2 algorithm: The same as FRPCA1 except steps 6-7:
Step 6: Update the weight: w_{new} = w_{old} + \alpha_t \beta(x_i) \left( x_i y - y^2 \frac{w}{w^T w} \right).
Step 7: Update the temporary count: \sigma = \sigma + e_2(x_i).

FRPCA3 algorithm: The same as FRPCA1 except steps 6-7:
Step 6: Update the weight: w_{new} = w_{old} + \alpha_t \beta(x_i) ( x_i y - w y^2 ).
Step 7: Update the temporary count: \sigma = \sigma + e(x_i).

In FRPCA3 the weight updating rule is the one-unit Oja's algorithm [27], and e(x_i) is replaced by e_1(x_i) or e_2(x_i) separately.

New nonlinear FRPCA3 algorithm: The same as FRPCA3 except steps 6-7:
Step 6: Compute g(y), F' = dg(y)/dy and e_3(x_i) = x_i - w_{old} g(y),

Update the weight: w_{new} = w_{old} + \alpha_t \beta(x_i) \left( x_i e_3(x_i)^T w_{old} F' + e_3(x_i) g(y) \right).
Step 7: Update the temporary count: \sigma = \sigma + \| e_3(x_i) \|^2.

In the tasks where nonlinear FRPCA3 was applied in this article, g(y) was chosen to be a quite sharp sigmoid-like function, g(y) = tanh(10y), and F' is the derivative of g(y). This weight update for classical nonlinear PCA was given by Oja [28], and it can obviously be used here in connection with the fuzzy robust principal component algorithm as well. After the data has been preprocessed with the FRPCA algorithms, the resulting data is classified using the similarity classifier [10, 11]. A short description of this procedure follows.

B. Similarity Classifier

The problem of classification is basically one of partitioning the attribute space into regions, one region for


each category. Ideally, one would like to arrange this partitioning so that none of the decisions is ever wrong [29]. We would like to classify a set X of objects into N different classes C_1, \ldots, C_N by their attributes. We suppose that t is the number of different attributes f_1, \ldots, f_t that can be measured from the objects, and that the magnitude of each attribute is normalized so that it can be presented as a value in [0, 1]. Consequently, the objects to be classified are vectors in [0, 1]^t. First one must determine for each class the ideal vector v_i = (v_i(f_1), \ldots, v_i(f_t)) that represents class i as well as possible. This vector can be user defined or calculated from a sample set X_i of vectors x = (x(f_1), \ldots, x(f_t)) which are known to belong to class C_i. We can use, e.g., the generalized mean for calculating v_i:

v_i(f_r) = \left( \frac{1}{\# X_i} \sum_{x \in X_i} x(f_r)^{m_2} \right)^{1/m_2}, \quad r = 1, \ldots, t   (13)

where the power value m_2 (coming from the generalized mean) is fixed for all i, r, and \# X_i is the number of samples in class i. Once the ideal vectors have been determined, the decision to which class an arbitrarily chosen x \in X belongs is made by comparing it to each ideal vector. The comparison can be done, e.g., by using similarity in the generalized Łukasiewicz structure:

S(x, v) = \left( \frac{1}{t} \sum_{r=1}^{t} w_r \left( 1 - \left| x(f_r)^p - v(f_r)^p \right| \right)^{m_2 / p} \right)^{1/m_2}   (14)

for x, v \in [0, 1]^t. Here p is a parameter coming from the generalized Łukasiewicz structure [10] (if p = 1 the equation reduces to its 'normal' form, which holds in the ordinary Łukasiewicz structure), and w_r is a weight parameter so that different weights can be given to different attributes to emphasize their importance if it seems appropriate. In this study all weights were set to one. The similarity measure has a strong mathematical background [30], [16] and has proven to be a very efficient measure in classification [11]. We decide that x \in C_i if

S(x, v_i) = \max_{j = 1, \ldots, N} S(x, v_j)   (15)

In other words, the sample is assigned to the class whose ideal vector it has the highest similarity with. There are several reasons why the Łukasiewicz structure is chosen for defining the memberships of objects. One reason is that in the Łukasiewicz structure the mean of many fuzzy similarities is still a fuzzy similarity [31]. Secondly, the Łukasiewicz structure has a strong connection to first-order logic [32], which is a well-studied area in modern mathematics. Thirdly, any pseudo-metric induces a fuzzy similarity on a given non-empty set X with respect to the Łukasiewicz conjunction [30]. Good sources of information about the Łukasiewicz structure are [33] and [34].

4. Classification Results and Comparison

Liver-disorder data set: The classification results for the liver-disorder data set are collected in Table 2. In Figure 1 the effect of the m_2 and p values of the similarity measure on classification accuracy is studied. From Figure 1 one can see that suitable m_2 and p values lie in m_2 \in (4, 8] and p \in [1, 10]. Figure 2 shows how reducing the dimension with the fuzzy robust principal component analysis algorithms affects the classification accuracy; the results of the three linear algorithms together with the similarity classifier are plotted in the same figure. As can be seen from Figure 2, in the linear cases the best classification results were found when the number of dimensions was six, and the results deteriorated as the number of dimensions was reduced. In the nonlinear case, however, the results were quite good in all dimensions, and even more noticeably, the best dimension was one. This indicates that this data can be classified in a considerably more compact form using the nonlinear FRPCA3 algorithm, which saves computational time. The variances seemed to be a bit higher when nonlinear FRPCA3 was used compared to the linear cases.

Table 2. Classification results for the liver-disorder data set. The method is listed in the first column, the second column gives the number of dimensions, the third column gives the mean classification accuracy (in %), and the variances are reported in the fourth column.

Method                  Dimen.   Mean    Var
Liver original            6      63.09   0.0055
Liver PCA                 6      66.50   0.0073
Liver FRPCA1              6      68.25   0.0080
Liver FRPCA2              6      67.88   0.0081
Liver FRPCA3              6      70.25   0.0110
Liver nonlin. FRPCA3      1      72.27   0.0110
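For concreteness, the similarity classifier of Eqs. (13)-(15) can be sketched in a few lines of Python (an illustration, not the article's MATLAB implementation; all weights w_r are set to one as in the study, while the p and m_2 values below are arbitrary examples rather than the tuned values reported above):

```python
import numpy as np

def ideal_vectors(X, y, m2=2.0):
    """Generalized-mean ideal vector v_i for each class (Eq. (13));
    features are assumed to be scaled into [0, 1]."""
    return {c: np.mean(X[y == c] ** m2, axis=0) ** (1.0 / m2) for c in np.unique(y)}

def similarity(x, v, p=1.0, m2=2.0):
    """Lukasiewicz-structure similarity S(x, v) of Eq. (14), unit weights."""
    s = (1.0 - np.abs(x ** p - v ** p)) ** (m2 / p)
    return float(np.mean(s) ** (1.0 / m2))

def classify(x, ideals, p=1.0, m2=2.0):
    """Assign x to the class with the highest similarity (Eq. (15))."""
    return max(ideals, key=lambda c: similarity(x, ideals[c], p, m2))

# Toy example with two classes in [0, 1]^2.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])
ideals = ideal_vectors(X, y)
print(classify(np.array([0.15, 0.15]), ideals))  # prints 0
```

The sample lands in class 0 because its similarity to that class's ideal vector (about 0.99) exceeds its similarity to the other (about 0.30).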

The parameter which most affected classification accuracy in the FRPCA algorithms was the fuzziness variable m_1; the other parameters of the FRPCA algorithms seemed to make no significant difference in the mean classification accuracies. The effect of changing the fuzziness variable m_1 is studied in Figure 3. Generally, when m_1 = 1 we are considering a crisp case and make a crisp division in the objective function; the larger the m_1 value, the fuzzier the division. Fuzziness


variable changes seemed to have the greatest effect on the nonlinear FRPCA3. As can be seen from Figure 3, the lowest accuracies were obtained when the m_1 value was close to one, and the highest classification accuracy was gained with m_1 = 1.5. The highest mean classification accuracy was 72.27 % with a variance of 0.0026, achieved with the nonlinear FRPCA3 algorithm. Compared to the results achieved with the original data, about 9 % higher accuracy was obtained when the data was first preprocessed using the nonlinear FRPCA3 algorithm, and about 6 % higher classification accuracy than when classical PCA was used first and the data then classified with the similarity classifier. Compared with the other fuzzy robust PCA algorithms, about 2-4 % better results were found with nonlinear FRPCA3.
Figure 1. (a) Mean classification accuracies and (b) variances with the liver-disorder data plotted with respect to p and the mean value m_2 in the similarity classifier.

Figure 2. (a) Mean classification accuracies and (b) variances studied with respect to the reduced number of dimensions, comparing the FRPCA algorithms on the liver-disorder data set.

Figure 3. Effect of the fuzziness parameter m_1 on classification accuracy with the liver-disorder data set.

Hepatitis data set: The classification results for the hepatitis data set are collected in Table 3. Preprocessing the data with the PCA algorithms enhanced the classification accuracy in all cases for this data set. With the original data a classification accuracy of 85.75 % was achieved. Comparing classical PCA with the linear FRPCA methods, one can see that data preprocessed with classical PCA gave a classification accuracy of 88.21 %, which was higher than the linear FRPCA methods managed. The highest classification accuracy, 88.94 % with a variance of 0.0121, was reached when the data was preprocessed with the nonlinear FRPCA3. Preprocessing the data with the nonlinear FRPCA3 algorithm thus gave about 3 % higher accuracy than the original data and about one percent higher classification accuracy than the other PCA algorithms. In Figure 4 the effect of the m_2 and p values of the similarity classifier on classification accuracy is studied. From


Figure 4 one can see that suitable m_2 and p values lie in m_2 \in [5, 10] and p \in [5, 10]. Figure 5 shows how reducing the number of dimensions affects the classification accuracies with the FRPCA algorithms. With FRPCA3 and nonlinear FRPCA3 the accuracy is reduced as the dimension gets lower, but with FRPCA1 and FRPCA2 the highest accuracy was gained when the dimension was reduced to one. The effect of the fuzziness variable is shown in Figure 6. The linear FRPCA algorithms did not seem to react as much to changes in the m_1 value as the nonlinear FRPCA3 algorithm did, as can be seen from Figure 6. The greatest classification accuracy was reached with the nonlinear FRPCA3 algorithm when m_1 = 2.
Table 3. Classification results for the hepatitis data set. The method is listed in the first column, the second column gives the number of dimensions, the third column gives the mean classification accuracy (in %), and the variances are reported in the fourth column.

Method                     Dimen.   Mean    Var
Hepatitis original           16     85.75   0.0020
Hepatitis PCA                13     88.21   0.0659
Hepatitis FRPCA1              1     87.34   0.0656
Hepatitis FRPCA2              1     87.34   0.0523
Hepatitis FRPCA3             15     87.10   0.0315
Hepatitis nonlin. FRPCA3     16     88.94   0.0121

Figure 4. (a) Mean classification accuracies and (b) variances with the hepatitis data set plotted with respect to p and the mean value m_2 in the similarity classifier.

Figure 5. (a) Mean classification accuracies and (b) variances studied with respect to the reduced number of dimensions, comparing the FRPCA algorithms on the hepatitis data set.

Figure 6. Effect of the fuzziness parameter m_1 on classification accuracy with the hepatitis data set.

Dermatology data set: The classification results for the dermatology data set, when the data is first preprocessed with the different PCAs mentioned above, are collected in Table 4. Classical PCA seemed to work worst with this data set, with a mean accuracy of 95.83 %, while nonlinear FRPCA3 gave the highest classification accuracy, 97.09 %. The linear FRPCA algorithms seemed to enhance the results somewhat compared to the classification results with the original data. Nonlinear



FRPCA3 seemed to work best in every dimension compared to the linear FRPCA methods, as can be seen in Figure 8, where the results are plotted with respect to the reduced number of dimensions. The classification results with the similarity classifier and the nonlinear FRPCA3 had a slightly higher variance, 0.0107, than the linear FRPCA cases, where the variances were below 0.005. The effect of the fuzziness parameter is shown in Figure 9: preprocessing the dermatology data with the nonlinear FRPCA3 gave the greatest classification accuracy for all tested m_1 values when compared to the three linear cases, and the greatest accuracy with the nonlinear FRPCA3 was gained when the fuzziness variable was m_1 = 1.5. In Figure 7 the effect of the m_2 and p values of the similarity classifier on classification accuracy is given. From Figure 7 one can see that suitable m_2 and p values lie in m_2 \in (0, 4] and p \in (0, 3].
Figure 7. (a) Mean classification accuracies and (b) variances with the dermatology data set plotted with respect to p and the mean value m_2 in the similarity classifier.

Figure 8. (a) Mean classification accuracies and (b) variances studied with respect to the reduced number of dimensions, comparing the FRPCA algorithms on the dermatology data set.

Figure 9. Effect of the fuzziness parameter m_1 on classification accuracy with the dermatology data set.

5. Discussion

In this article a classification method was studied in which the data is first preprocessed using fuzzy robust principal component analysis (FRPCA) algorithms to get the data into a more feasible form, and then classified using the similarity classifier. A nonlinear fuzzy robust principal component analysis algorithm was introduced and tested. This nonlinear algorithm gave greater classification accuracies for the liver-disorder, hepatitis and dermatology data sets than the linear FRPCA algorithms. The results were also compared to the results achieved using classical PCA as a preprocessing step, and to the results obtained when the original data was simply classified with the similarity classifier. The best results were achieved by using the nonlinear FRPCA algorithm as a preprocessing step and then classifying the preprocessed data with the similarity classifier. The improvement is due to the fact that, because of the nonlinearity, the feature space is divided differently than with linear FRPCA; hence the nonlinear FRPCA introduced here may consider different samples to be outliers than the linear one does.

With the liver-disorder data set, an accuracy of 72.27 % was reached using the nonlinear FRPCA in preprocessing, which is about 9 % higher than the accuracy achieved with the original data. About 6 % greater classification accuracy was achieved with the nonlinear FRPCA compared with the results where classical PCA was used first and the data then classified with the similarity classifier. Compared with the other fuzzy robust PCA algorithms, about 2-4 % better results were found with the nonlinear FRPCA3. With the hepatitis data set, an accuracy of 88.94 % was achieved with the nonlinear FRPCA algorithm, which is about 3 % greater than the classification accuracy gained with the original data; classical PCA with the similarity classifier gave the second highest classification accuracy, 88.21 %. With the dermatology data set, a classification accuracy of 97.09 % was achieved using the similarity classifier when the data was first preprocessed with the nonlinear FRPCA3. This was the highest mean accuracy, about one percent higher than without this preprocessing step.
In this article it was shown that the presented combination is well suited for the classification of the examined medical data sets. A nonlinear extension to the fuzzy robust principal component analysis (FRPCA) algorithms presented in [1] was introduced and studied as a preprocessing step before the actual classification with the similarity classifier. The results achieved with this procedure were promising. The proposed method can be used for diagnosis on data where nonlinear combinations of features together with outlier removal are more suitable than the linear combinations used in FRPCA. In this paper its usage was demonstrated with the diagnosis of liver disorders, hepatitis and dermatology, but it is not limited to these three cases and can be applied to further diagnostic problems where nonlinear combinations with outlier removal are the better option.
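The exact nonlinear FRPCA3 update is not reproduced in this section, but its flavor can be sketched as follows: a single-unit Oja-type learning rule in which each sample's update is weighted by a fuzzy membership that shrinks for large-error (outlier) samples, with the projection passed through tanh as in Oja's nonlinear PCA rule [28]. The function name, the membership form and the constants below are illustrative assumptions, not the exact algorithm of [1, 21].

```python
import numpy as np

def nonlinear_frpca3(X, n_components=2, m=2.0, eta=0.01, n_epochs=50, seed=0):
    """Sketch of a nonlinear fuzzy robust PCA in the spirit of FRPCA3:
    fuzzy memberships down-weight outlying samples, and the projection
    goes through tanh so that single extreme samples cannot dominate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_components, d))
    Xr = X - X.mean(axis=0)                      # work on centered data
    for k in range(n_components):
        w = rng.normal(size=d)
        w /= np.linalg.norm(w)
        for _ in range(n_epochs):
            for x in Xr[rng.permutation(n)]:
                y = np.tanh(w @ x)               # bounded, nonlinear projection
                e = x @ x - y * y                # reconstruction-type error
                beta = 1.0 / (1.0 + max(e, 0.0) ** (1.0 / (m - 1.0)))  # fuzzy membership
                w += eta * beta * (y * x - y * y * w)   # robust Oja-style update
            w /= np.linalg.norm(w)
        W[k] = w
        Xr = Xr - np.outer(Xr @ w, w)            # deflate: remove found component
    return W
```

The two robustness mechanisms work together: tanh bounds the influence of any single projection value, while the membership beta suppresses updates from samples with large error, so outliers steer the components far less than in ordinary PCA.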

From the results we can see that in liver-disorder diagnosis the first component is enough to obtain the best mean diagnosis results, thus saving computational time and, more importantly in this case, increasing accuracy. With the hepatitis data set all 16 components should be used to reach the highest mean accuracy, and with the dermatology data the first 28 components from the nonlinear FRPCA should be used in diagnosis.
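Choosing the number of retained components per data set, as done above, amounts to scanning the dimension and keeping the one with the best held-out accuracy. The sketch below illustrates this scan; plain PCA via SVD stands in for the FRPCA projection and a nearest-class-mean classifier stands in for the similarity classifier, so the function names and the toy data are illustrative only.

```python
import numpy as np

def nearest_mean_accuracy(Xtr, ytr, Xte, yte):
    """Accuracy of a nearest-class-mean classifier; a compact stand-in
    for the similarity classifier to keep the sketch short."""
    classes = np.unique(ytr)
    means = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    dist = ((Xte[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] == yte))

def best_dimension(Xtr, ytr, Xte, yte):
    """Scan the number of retained components and return the best one.
    Plain PCA via SVD stands in for the (nonlinear) FRPCA projection."""
    mu = Xtr.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
    scores = {}
    for k in range(1, Xtr.shape[1] + 1):
        P = Vt[:k].T                                   # first k components
        scores[k] = nearest_mean_accuracy((Xtr - mu) @ P, ytr,
                                          (Xte - mu) @ P, yte)
    return max(scores, key=scores.get), scores

# Toy usage: two classes separated along the first feature, three noise features
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 0.3, size=(40, 4)); X0[:, 0] -= 2.0
X1 = rng.normal(0.0, 0.3, size=(40, 4)); X1[:, 0] += 2.0
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)
best_k, scores = best_dimension(X, y, X, y)
```

In practice the evaluation should use cross-validation rather than the training data itself, which is how the mean accuracies reported in this paper were obtained.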

References
[1] T. N. Yang and S. D. Wang, Robust algorithms for principal component analysis, Pattern Recognition Letters, vol. 20, pp. 927-933, 1999.
[2] I. Jolliffe, Principal Component Analysis, Springer-Verlag, 1986.
[3] M. Kirby and L. Sirovich, Application of the Karhunen-Loeve procedure for the characterization of human faces, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 103-108, 1990.
[4] M. A. Turk and A. P. Pentland, Face recognition using eigenfaces, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586-591, 1991.
[5] Y. Langeron, M. Doussot, D. J. Hewson, and J. Duchene, Classifying NIR spectra of textile products with kernel methods, Engineering Applications of Artificial Intelligence, vol. 20, pp. 415-427, 2007.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[7] P. Luukka, Feature selection using fuzzy entropy measures with similarity classifier, Expert Systems with Applications, vol. 38, pp. 4600-4607, 2011.
[8] J. M. Yang, P. T. Yu, B. C. Kuo, and M. H. Su, Nonparametric fuzzy feature extraction for hyperspectral image classification, International Journal of Fuzzy Systems, vol. 12, no. 3, pp. 208-217, 2010.
[9] Y. C. Yeh, W. J. Wang, and C. W. Chiou, Heartbeat case determination using fuzzy logic method on ECG signals, International Journal of Fuzzy Systems, vol. 11, no. 4, pp. 250-261, 2009.
[10] P. Luukka, K. Saastamoinen, and V. Könönen, A classifier based on the maximal fuzzy similarity in the generalized Łukasiewicz-structure, in Proceedings of the FUZZ-IEEE 2001 Conference, Melbourne, Australia, 2001.
[11] P. Luukka and T. Leppälampi, Similarity classifier with generalized mean applied to medical data, Computers in Biology and Medicine, vol. 36, pp. 1026-1040, 2006.
[12] P. Luukka, Classification based on fuzzy robust PCA algorithms and similarity classifier, Expert Systems with Applications, vol. 36, pp. 7463-7468, 2009.
[13] T. Denoeux and M. H. Masson, Principal component analysis of fuzzy data using autoassociative neural networks, IEEE Trans. on Fuzzy Systems, vol. 12, no. 3, pp. 336-349, 2004.
[14] P. Luukka, PCA for fuzzy data and similarity classifier in building recognition system for post-operative patient data, Expert Systems with Applications, vol. 36, pp. 1222-1228, 2009.
[15] L. Zadeh, Similarity relations and fuzzy orderings, Information Sciences, vol. 3, pp. 177-200, 1971.
[16] F. Formato, G. Gerla, and L. Scarpati, Fuzzy subgroups and similarities, Soft Computing, vol. 3, pp. 1-6, 1999.
[17] L. Valverde, On the structure of F-indistinguishability operators, Fuzzy Sets and Systems, vol. 17, 1981.
[18] L. Xu and A. L. Yuille, Robust principal component analysis by self-organizing rules based on statistical physics approach, IEEE Trans. on Neural Networks, vol. 6, no. 1, pp. 131-143, 1995.
[19] C. C. Chuang, J. T. Jeng, and C. W. Tao, Two-stage support vector regression for fuzzy neural networks with outliers, International Journal of Fuzzy Systems, vol. 11, no. 1, pp. 20-28, 2009.
[20] P. Y. Chen, Y. Y. Fu, K. L. Su, and J. T. Jeng, ARFNNs under different types SVR for identification of nonlinear magneto-rheological damper systems with outliers, International Journal of Fuzzy Systems, vol. 12, no. 4, pp. 311-320, 2010.
[21] P. Luukka, Nonlinear fuzzy robust PCA algorithms and similarity classifier in bankruptcy analysis, Expert Systems with Applications, vol. 37, pp. 8296-8302, 2010.
[22] H. Güvenir, G. Demiröz, and N. Ilter, Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals, Artificial Intelligence in Medicine, vol. 13, pp. 147-165, 1998.
[23] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz, UCI Repository of Machine Learning Databases, Irvine, CA: University of California, Department of Information and Computer Science, 2011.
[24] E. D. Übeyli and I. Güler, Automatic detection of erythemato-squamous diseases using adaptive neuro-fuzzy inference systems, Computers in Biology and Medicine, vol. 35, no. 5, pp. 147-165, 2005.
[25] W. H. Wolberg and O. L. Mangasarian, Multisurface method of pattern separation for medical diagnosis applied to breast cytology, in Proceedings of the National Academy of Sciences, vol. 87, pp. 9193-9196, 1990.
[26] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Functions, Plenum Press, New York, 1981.
[27] E. Oja and J. Karhunen, On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix, Journal of Mathematical Analysis and Applications, vol. 106, pp. 69-84, 1985.
[28] E. Oja, The Nonlinear PCA Learning Rule and Signal Separation - Mathematical Analysis, Technical Report, Helsinki University of Technology, p. 26, 1995.
[29] R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.
[30] F. Klawonn and J. L. Castro, Similarity in fuzzy reasoning, Mathware & Soft Computing, vol. 2, no. 3, pp. 197-228, 1995.
[31] E. Turunen, Mathematics Behind Fuzzy Logic, Advances in Soft Computing, Physica-Verlag, Heidelberg, 1999.
[32] V. Novak, On the syntactico-semantical completeness of first-order fuzzy logic, Kybernetika, vol. 26, 1990.
[33] E. Turunen, Survey of theory and applications of Łukasiewicz-Pavelka fuzzy logic, Advances in Soft Computing, Physica-Verlag, Heidelberg, pp. 313-337, 2001.
[34] J. Łukasiewicz, Selected Works, Cambridge University Press, 1970.
Pasi Luukka is currently working in the Department of Mathematics and Physics at Lappeenranta University of Technology. In 1999 he earned his M.Sc. degree from the Department of Information Technology at Lappeenranta University of Technology, and in 2005 he earned his Doctor of Science degree from the same university. His current research interests include fuzzy data analysis and data analysis methods from evolutionary computation.
