
Using Fuzzy Ant Colony Optimization for Diagnosis of Diabetes Disease

Mostafa Fathi Ganji
Faculty of Electrical and Computer Engineering, University of Tarbiat Modares, Tehran, Iran
m.ganji@modares.ac.ir

Mohammad Saniee Abadeh
Faculty of Electrical and Computer Engineering, University of Tarbiat Modares, Tehran, Iran
saniee@modares.ac.ir

Abstract- Ant colony optimization (ACO) has been used successfully in the data mining field to extract rule-based classification systems. The objective of this paper is to utilize ACO to extract a set of rules for the diagnosis of diabetes disease. Since the presented algorithm uses ACO to extract fuzzy if-then rules for the diagnosis of diabetes disease, we call it FADD. We have evaluated the new classification system on the Pima Indian Diabetes data set. The results show that FADD can detect diabetes disease with an acceptable accuracy that is competitive with, or even better than, the results achieved by previous works. In addition, the discovered rules have good comprehensibility.

Keywords- Ant Colony Optimization, diabetes diagnosis, medical data mining, classification, fuzzy logic.

I. INTRODUCTION

Diabetes is one of the most dangerous diseases and is known as the "silent killer". It is a major health problem in both industrial and developing countries, and its incidence is rising. It is a disease in which either the body does not produce enough insulin or the cells ignore the insulin. Insulin is necessary for the body to be able to use glucose for energy [11]. Diabetes increases the risk of blindness, high blood pressure, heart disease, kidney disease and nerve damage. The disease has two main types [4]: type 1 and type 2. The most common form is type 2 diabetes (diabetes mellitus type 2). Millions of people have been diagnosed with type 2 diabetes, and unfortunately many more are unaware that they are at high risk [11]. In type 2 diabetes, the body is resistant to the effects of insulin (a hormone that regulates the movement of sugar into cells) or the body does not produce enough insulin to maintain a normal glucose level [3]. The Pima Indians of Arizona have the highest prevalence and incidence of type 2 diabetes of any population in the world [4]. Although medical progress has improved early diagnosis, about half of the patients with type 2 diabetes are unaware of their disease, and the delay from disease onset to diagnosis may exceed ten years [11], even though early diagnosis and treatment of hyperglycemia and related metabolic abnormalities are of vital importance.

The diagnosis of diabetes is not easy because there are many factors that the physician must consider. The two most important stages in the diagnosis of diabetes are evaluating the data taken from the patient and referring to the previous decisions that the expert made on other patients with the same conditions [4]. The former depends on the expert's knowledge, while the latter strongly depends on the expert's experience with earlier patients. This job is not easy, considering the number of factors that the expert has to evaluate. To reduce possible errors and help the expert, a classification system can be used. The use of classifier systems in medical diagnosis is increasing gradually [4]. Expert systems and various artificial intelligence techniques for classification also help experts a great deal. For this reason, many algorithms have been proposed to classify diabetes patients [3, 4, 13, 14].

Ant colony optimization (ACO) has been successfully used for the classification task. Parpinelli et al. [2] were the first to employ ACO for data mining and named their method AntMiner1. They showed that ACO is a successful method for data mining. Then, Liu et al. [15] improved AntMiner1 and called the result AntMiner2. They believed AntMiner2 does not give the ants enough chance to search, so they introduced a new version and called it AntMiner3 [12]. Finally, Martens et al. [1] proposed a new method that has all the advantages of the previous versions of AntMiner and named it AntMiner+. Saniee et al. [5,16] combined ACO and fuzzy logic for network intrusion detection and obtained significant results. To the best of our knowledge, ACO has not previously been used for the diagnosis of diabetes. In this paper we use ACO and fuzzy logic for the diagnosis of diabetes disease. We also propose a new framework for fuzzy rule learning, in which the learning process for each class is done independently. To evaluate the final rule-based classifier, two evaluation criteria are considered: classification rate and comprehensibility. The former denotes the capability of the classifier to detect diabetes patterns in the input samples, while the latter refers to the interpretability of the classification system, which depends on the classifier's number of rules and the mean rule length.

The proposed method has been tested using the public Pima Indian Diabetes data set available at the University of California, Irvine web site [17]. The results show that this algorithm can classify the Pima Indian Diabetes data set with an acceptable accuracy that is competitive with, or even better than, the results achieved by earlier works. The algorithm also has good comprehensibility, because it produces a small number of short rules.

Proceedings of ICEE 2010, May 11-13, 2010


978-1-4244-6760-0/10/$26.00 2010 IEEE
The rest of the paper is organized as follows. Ant Colony Optimization is presented in Section II. Fuzzy classification is discussed in Section III, and Section IV is devoted to the proposed method. Experimental results are reported in Section V, and Section VI concludes the paper.

II. ANT COLONY OPTIMIZATION

Ant algorithms are based on the cooperative behavior of real ant colonies, which are able to find the shortest path from a food source to their nest. While walking, real ants deposit a chemical substance called pheromone on the ground. Ants can smell pheromone, and when choosing their way they tend to choose, in a probabilistic way, paths marked by strong pheromone concentrations. In the absence of pheromone, ants choose paths randomly. Pheromone evaporates over time; on shorter paths less pheromone evaporates than on longer paths, so more pheromone accumulates on the shorter routes. This positive feedback effect means that, because of the additional pheromone, all the ants will eventually use the shortest path. Although a single ant is capable of building a solution (i.e., a path), the optimal solution comes about solely as a result of the cooperative behavior of the ant colony, which is based on a simple form of indirect communication through the pheromone, called stigmergy. The first ACO algorithm, called Ant System, was applied to the TSP [7], and a large number of applications to other problems were proposed after its introduction. Recently, the ACO metaheuristic was proposed as a common framework for existing applications [6,9].

Each ant builds a possible solution to the problem by moving through a finite sequence of neighbor states (nodes). Moves are selected by applying a stochastic local search directed by the ant's internal state, problem-specific local information and the shared information about the pheromone.

III. FUZZY CLASSIFICATION

Let us assume that our pattern classification problem is a $c$-class problem in the $n$-dimensional pattern space with continuous attributes. We also assume that $M$ real vectors $x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$, $p = 1, 2, \ldots, M$, are given as training patterns from the $c$ classes ($c \ll M$).

Because the pattern space is $[0,1]^n$, the attribute values of each pattern are $x_{pi} \in [0,1]$ for $p = 1, 2, \ldots, M$ and $i = 1, 2, \ldots, n$. In the computer simulations of this paper, we normalize all attribute values of each data set into the unit interval $[0,1]$.

In the presented fuzzy classifier system, we use fuzzy if-then rules of the following form:

Rule $R_j$: If $x_1$ is $A_{j1}$ and ... and $x_n$ is $A_{jn}$, then Class $C_j$ with $CF = CF_j$.

Here $R_j$ is the label of the $j$th fuzzy if-then rule, $A_{j1}, \ldots, A_{jn}$ are antecedent fuzzy sets on the unit interval $[0,1]$ (each triple <attribute, operator, value> is called a term), $C_j$ is the consequent class (i.e., one of the given $c$ classes), and $CF_j$ is the grade of certainty of the fuzzy if-then rule $R_j$. In computer simulations, we use the typical set of linguistic values in Fig. 1 as antecedent fuzzy sets. The membership function of each linguistic value in Fig. 1 is specified by homogeneously partitioning the domain of each attribute into symmetric triangular fuzzy sets. We use such a simple specification to show the high performance of our fuzzy classifier system even when the membership function of each antecedent fuzzy set is not tailored. However, any tailored membership functions can be used in our fuzzy classifier system for a particular pattern classification problem.

[Figure 1: membership grade (0.0 to 1.0) plotted against attribute value (0.0 to 1.0). Panel a) shows the five triangular fuzzy sets S, MS, M, ML and L; panel b) shows the don't care set DC.]
Figure 1. The antecedent fuzzy sets used in this paper. a) 1: small, 2: medium small, 3: medium, 4: medium large, 5: large. b) 0: don't care.

When ACO has produced a set of rules, the following steps are applied to calculate the certainty grade of each fuzzy if-then rule [5]:

Step 1: Calculate the compatibility of each training pattern $x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ with the fuzzy if-then rule $R_j$ by the following product operation:

$$\mu_j(x_p) = \mu_{j1}(x_{p1}) \times \cdots \times \mu_{jn}(x_{pn}), \quad p = 1, 2, \ldots, M \tag{1}$$

where $\mu_{ji}(x_{pi})$ is the membership grade of the $i$th attribute of the $p$th pattern in the antecedent fuzzy set $A_{ji}$, and $M$ denotes the total number of patterns.

Step 2: For each class, calculate the relative sum of the compatibility grades of the training patterns with the fuzzy if-then rule $R_j$:

$$\beta_{\text{Class } h}(R_j) = \frac{\sum_{x_p \in \text{Class } h} \mu_j(x_p)}{N_{\text{Class } h}}, \quad h = 1, 2, \ldots, c \tag{2}$$

where the numerator is the sum of the compatibility grades of the training patterns in Class $h$ with the fuzzy if-then rule $R_j$, and $N_{\text{Class } h}$ is the number of training patterns whose corresponding class is Class $h$.

Step 3: The grade of certainty $CF_j$ is determined as follows:

$$CF_j = \frac{\beta_{\text{Class } \hat{h}_j}(R_j) - \bar{\beta}}{\sum_{h=1}^{c} \beta_{\text{Class } h}(R_j)} \tag{3}$$

where

$$\bar{\beta} = \frac{\sum_{h \neq \hat{h}_j} \beta_{\text{Class } h}(R_j)}{c - 1} \tag{4}$$

and Class $\hat{h}_j$ denotes the consequent class $C_j$ of rule $R_j$.

Now we can specify the certainty grade for any combination of antecedent fuzzy sets. Such combinations are generated by the proposed hybrid system, which is explained in the next sections.

The task of our fuzzy classifier system is to generate combinations of antecedent fuzzy sets that form a rule set $S$ with high classification ability. When a rule set $S$ is given, an input pattern $x_p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ is classified by a single winner rule $R_{j^*}$ in $S$, which is determined as follows:

$$\mu_{j^*}(x_p) \cdot CF_{j^*} = \max\{\mu_j(x_p) \cdot CF_j \mid R_j \in S\} \tag{5}$$

That is, the winner rule has the maximum product of the compatibility and the certainty grade $CF_j$.

Each fuzzy if-then rule is coded as a string. The following symbols are used to denote the linguistic values (Fig. 1):
0: don't care (DC), 1: small (S), 2: medium small (MS), 3: medium (M), 4: medium large (ML), 5: large (L).

IV. THE PROPOSED METHOD

As mentioned earlier, the ACO algorithm has recently been used in various kinds of data mining problems such as clustering and classification [1,8]. In this section, we discuss the details of our proposed algorithm for the discovery of classification rules (we call it FADD). This section is divided into six subsections: a general description of the proposed algorithm, Pheromone Initialization, Rule Construction, Quality Computation Function, Pheromone Update Rule, and Stopping Conditions.

A. A general description

FADD uses artificial ants to explore the training search space and gradually build candidate rules. The major difference between this algorithm and previous algorithms is that it learns the rules for each class separately. In other words, for each class $k$ the main function calls a function FADD, which learns the rules related to class $k$. When FADD has learned the rules, it returns them to the main function, and the main function removes the samples they cover. The learning process is done for each class separately. This process is repeated iteratively, and finally a set of rules is discovered that can be used as our detection model for diabetes disease diagnosis. After a user-defined number of ants (No_Of_Ants) have modified the constructed rule, the rule is added to DiscoveredRules if it is proper (i.e., it improves the classification rate by more than a threshold); otherwise it is ignored. The steps of the proposed algorithm are as follows:

Step 1: Set DiscoveredRules to empty and TrainingSet to all of the training samples.
Step 2: For each class:
    Step 2-1: Call FADD (Fig. 2) to learn the rules of that class.
    Step 2-2: Add the rules learned in Step 2-1 to DiscoveredRules.
    Step 2-3: Remove the covered samples from TrainingSet.
Step 3: Compute the grade of certainty CF for each rule in DiscoveredRules.
Step 4: Classify each input pattern $X_p = (x_1, x_2, x_3, \ldots, x_n)$ with the single rule $R_j$ that has the maximum product of compatibility and certainty grade CF among all rules.

B. Pheromone Initialization

Whenever function FADD is called to learn the rules of a class, all cells in the pheromone table are initialized equally to the following value:

(6)

where
a: the total number of attributes;
$b_i$: the number of values in the domain of attribute $i$.

C. Rule Construction

Each time function FADD is called, at the first iteration (T = 0) a rule is created in which all terms have the DC value. In the next iterations (T ≥ 1) an ant can only modify the terms of the rule that was constructed in previous iterations. The maximum number of terms that each ant can modify in each iteration (T ≥ 1) is determined by a parameter named Max_Change. The largest value of Max_Change is the number of features (in our experiments, Max_Change = 2). The number of ants that modify the rule in the inner loop of FADD is set by the user (No_Of_Ants). The probability that each ant chooses $term_{ij}$ to modify is

(7)

where $\eta_{ij}$ is a problem-dependent heuristic value for $term_{ij}$. In this algorithm we use 0.5 for DC and 0.1 for other values.
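The fuzzy classification steps of Section III (equations (1) to (5)) can be illustrated with a short Python sketch. It is only a minimal sketch under stated assumptions: the five triangular sets are assumed to be centered at 0, 0.25, 0.5, 0.75 and 1 with half-width 0.25 (one reading of "homogeneously partitioning" the unit interval), the consequent class of a rule plays the role of Class $\hat{h}_j$, and all function names are hypothetical rather than taken from the paper.

```python
from math import prod

# Rule coding from Section III: 0 = don't care, 1..5 = S, MS, M, ML, L.
# Assumed geometry: five symmetric triangles evenly spaced on [0, 1].
CENTERS = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}
HALF_WIDTH = 0.25


def membership(code, x):
    """Membership grade of attribute value x in the fuzzy set with this code."""
    if code == 0:  # don't care matches every value fully
        return 1.0
    return max(0.0, 1.0 - abs(x - CENTERS[code]) / HALF_WIDTH)


def compatibility(rule, pattern):
    """Equation (1): product of the per-attribute membership grades."""
    return prod(membership(code, x) for code, x in zip(rule, pattern))


def certainty_grade(rule, rule_class, patterns, labels, classes):
    """Equations (2)-(4): certainty grade CF of a rule for its consequent class."""
    beta = {}
    for h in classes:
        members = [p for p, y in zip(patterns, labels) if y == h]
        total = sum(compatibility(rule, p) for p in members)
        beta[h] = total / len(members) if members else 0.0
    beta_sum = sum(beta.values())
    if beta_sum == 0.0:  # rule covers nothing; treated here as CF = 0
        return 0.0
    beta_bar = sum(b for h, b in beta.items() if h != rule_class) / (len(classes) - 1)
    return (beta[rule_class] - beta_bar) / beta_sum


def classify(rules, pattern):
    """Equation (5): the winner rule maximizes compatibility times CF.

    rules is a list of (antecedent codes, consequent class, CF) tuples.
    """
    winner = max(rules, key=lambda r: compatibility(r[0], pattern) * r[2])
    return winner[1]
```

For example, the rule `(1, 0)` reads "if $x_1$ is small and $x_2$ is don't care"; how ties or patterns covered by no rule are handled is not specified in the text, so this sketch simply returns the first maximal rule.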
The other symbols in the term-selection probability are:
$\tau_{ij}$: the amount of pheromone currently available (at time t) on the path between attribute $i$ and value $j$.
a: the total number of attributes.
$b_i$: the total number of values in the domain of attribute $i$.
I: the set of attributes that are not yet used by the ant.

D. Quality Computation Function

Whenever a rule is modified by an ant, the quality function calculates the quality of the modified rule. The quality of a rule $R_j$ is computed according to equation (8).

(8)

where
TP: true positives, the number of cases in our training set covered by the rule that have the class predicted by the rule.
FP: false positives, the number of cases covered by the rule that have a class different from the class predicted by the rule.
FN: false negatives, the number of cases that are not covered by the rule but that have the class predicted by the rule.
TN: true negatives, the number of cases that are not covered by the rule and that do not have the class predicted by the rule.

Algorithm I:
1. j = 1; LearnedRules = [];
2. While (stopping conditions are not satisfied)
   2.1 T = 0;
   2.2 Pheromone initialization;   /* all cells in the pheromone table are initialized equally, according to equation (6) */
   2.3 Create rule Rj;             /* all terms in this rule have the DC value */
   2.4 Repeat
       2.4.1 T = T + 1;
       2.4.2 Modify Rj according to Max_Change;   /* each ant can modify the terms of rule Rj according to the Max_Change parameter */
       2.4.3 Compute the quality of Rj;           /* according to equation (8) */
       2.4.4 Update pheromone;                    /* according to equation (10) */
   2.5 Until (T > No_Of_Ants)
   2.6 If isProper(Rj) add Rj to LearnedRules;    /* Rj must improve the classification rate */
3. j = j + 1;
4. End While;
5. Return LearnedRules;
End Function FADD;

Figure 2. A high-level description of FADD

E. Pheromone Update Rule

After each ant modifies the terms of a rule according to the Max_Change parameter, pheromone updating is carried out. We have defined a new pheromone update function: whenever an ant has modified the terms of rule $R_j$, the quality of $R_j$ is calculated, and if the quality has increased, the pheromone of this rule is increased according to the amount by which the quality improved. Based on our experiments, we believe that with this new update strategy the pheromone helps improve the quality of the rule in each iteration. Pheromone updating is carried out according to equation (10).

(10)

where
$\Delta Q$: the difference between the quality of the rule after and before modification.
c: a parameter that regulates the influence of quality.

It is necessary to decrease the pheromone of terms that have not participated in the construction of rules. For this purpose, pheromone evaporation is simulated. To simulate the evaporation phenomenon of real ant colonies, the amount of pheromone associated with each $term_{ij}$ that does not occur in the constructed rule must be decreased. The pheromone of unused terms is decreased by dividing the value of each $\tau_{ij}$ by the summation of all $\tau_{ij}$.

F. Stopping Conditions

The stopping condition in the outer loop of the FADD function is any condition that the user has defined to terminate the loop. For example, the user can use a fixed number of iterations or a minimum number of uncovered instances to terminate the FADD function. In our experiments, we have used the combination of these two conditions.

V. EXPERIMENTAL RESULTS

Our experiments used a data set from the UCI data set repository [17]: the Pima Indian Diabetes data set, which contains 768 instances, 8 integer-valued attributes and 2 classes. We normalized the data set so that each numerical value lies between 0.0 and 1.0. For this purpose, the function below is applied.

(11)

We evaluate the comparative performance of FADD using ten-fold cross-validation. The data set is divided into ten partitions, and FADD is run ten times, using a different partition as the test set each time, with the other nine as the training set. The classification rate is calculated according to equation (12).
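The evaluation protocol described above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: it assumes that the normalization of equation (11) is ordinary min-max scaling and that the classification rate of equation (12) is the usual accuracy ratio expressed as a percentage, since neither formula is spelled out in the text here. The function names, the shuffling seed and the way folds are formed are likewise assumptions made for the example.

```python
import random


def min_max_normalize(rows):
    """Rescale every column to [0, 1] (assumed form of the normalization)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def ten_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def classification_rate(tp, tn, fp, fn):
    """Percentage of correctly classified cases (assumed accuracy definition)."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

With the 768 instances of the Pima data set, `ten_fold_indices(768)` produces ten test folds of 76 or 77 instances each; a rule learner such as FADD would be trained on each `train` list and scored on the matching `test` list, and the ten rates averaged.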
In equation (12), TP, TN, FN and FP have the same meanings as in equation (8).

(12)

Table I shows the classification rate of the rule sets produced by different algorithms, and Table II shows the results of FADD. It can be seen that the proposed algorithm discovers few rules and also has a good classification rate in comparison with other methods. Also, because the number of rules produced by FADD and their mean length are low, FADD has good comprehensibility.

TABLE I: CLASSIFICATION RATE OBTAINED WITH DIFFERENT CLASSIFIERS

Method                       Classification Rate
Decision Table*              71.224
RBF*                         75.8
NNGE*                        73.5677
C4.5 Dta*                    73.0
Bayesa*                      72.2
Regression Coefficients*     72.3958
Naive Bayes*                 76.3021
CART*                        72.8
C4.5 rules*                  67.0
Deng et al. [13]             78.4
Kayaer et al. [14]           77.08
Polat et al. [3]             78.21
Temurtas et al. [4]          79.16

* The methods marked with an asterisk were tested with the Weka software [10].

TABLE II. RESULT OF FADD

Number of Rules    Mean Classification Rate    Mean length of rules
8                  79.48 ± 1.1                 2.571

VI. CONCLUSION

This paper presents a combination of Ant Colony Optimization and fuzzy logic for mining the Pima Indian Diabetes data set. Ant Colony Optimization has already been used for classification in data mining [1,2,11,15]. The main new features of the presented algorithm are as follows:
1. A new framework for learning the rules is introduced, in which the rules are learned for each class independently.
2. A different strategy for controlling the influence of pheromone values was studied. We proposed a new pheromone update rule that improves the quality of each rule, because for each rule the amount of pheromone added in each iteration depends on the quality of the modifications that the ants made. With this new pheromone update function, ants make better decisions in subsequent iterations in order to improve the quality of the rule.
3. There are two important concepts in ACO: competition and cooperation. The previous versions of AntMiner paid more attention to competition, which caused some of the rules to be very strong while the others were rather weak. In this paper we have paid attention to cooperation in order to produce a set of rules that are all reasonably strong. For this purpose, we have encouraged the ants to cooperate more in the body of the FADD function.

REFERENCES

[1] David Martens, Manu De Backer, Raf Haesen, Jan Vanthienen, Monique Snoeck, and Bart Baesens, "Classification With Ant Colony Optimization," IEEE Trans. on Evolutionary Computation, Vol. 11, pp. 651-656, 2007.
[2] R. S. Parpinelli, H. S. Lopes, and A. A. Freitas, "Data mining with an ant colony optimization algorithm," IEEE Trans. on Evolutionary Computation, Vol. 6, pp. 321-332, 2002.
[3] Kemal Polat, Salih Gunes, Ahmet Arslan, "A cascade learning system for classification of diabetes disease: Generalized Discriminant Analysis and Least Square Support Vector Machine," Expert Systems with Applications, Vol. 34, pp. 482-487, 2008.
[4] Hasan Temurtas, Nejat Yumusak, Feyzullah Temurtas, "A comparative study on diabetes disease diagnosis using neural networks," Expert Systems with Applications, Vol. 36, pp. 8610-8615, 2009.
[5] Mohammad Saniee Abadeh, Jafar Habibi, and Emad Soroush, "Induction of fuzzy classification systems via evolutionary ACO-based Algorithms," International journal of simulation, systems, science, technology, Vol. 9, No. 3, 2008.
[6] Marco Dorigo, Christian Blum, "Ant colony optimization theory: A survey," Theoretical Computer Science, Vol. 344, pp. 243-278, 2005.
[7] M. Dorigo, V. Maniezzo, A. Colorni, "The ant system: optimization by a colony of cooperating agents," IEEE Transactions on Systems, Man and Cybernetics, Vol. 26, pp. 1-13, 1996.
[8] Urszula Boryczka, "Finding groups in data: Cluster analysis with ants," Applied Soft Computing, Vol. 9, pp. 61-70, 2009.
[9] Christian Blum, "Review Ant colony optimization: Introduction and recent trends," Physics of Life Reviews, Vol. 2, pp. 353-373, 2005.
[10] Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, vol. 1, Morgan Kaufmann publications, pp. 363-483, 2005.
[11] American Diabetes Association. http://www.diabetes.org/diabetes-basics (last accessed: November 2009)
[12] B. Liu, H. A. Abbass, and B. McKay, "Classification rule discovery with ant colony optimization," In Proc. IEEE/WIC Int. Conf. Intell. Agent Technol, 2003.
[13] Deng, D., & Kasabov, N., "On-line pattern analysis by evolving self-organizing maps," In Proceedings of the fifth biannual conference on artificial neural networks and expert systems, 2001.
[14] Kayaer, K., & Yıldırım, T., "Medical diagnosis on Pima Indian diabetes using general regression neural networks," In Proceedings of the international conference on artificial neural networks and neural information processing, 2003.
[15] B. Liu, H. A. Abbass, B. McKay, "Density-based heuristic for rule discovery with ant-miner," The 6th Australia-Japan joint workshop on intelligent, 2002.
[16] Mohammad Saniee Abadeh, Jafar Habibi, Emad Soroush, "Induction of Fuzzy Classification Systems Using Evolutionary ACO-Based Algorithms," Proceedings of the First Asia International Conference on Modelling & Simulation (AMS'07), IEEE, 2007.
[17] Blake, C. L., & Merz, C. J., UCI Repository of Machine Learning Databases, 1996. Available from http://www.ics.uci.edu./~mlearn/MLReporsitory.html.
