
Proceedings of the 2007 IEEE International Conference on Integration Technology
March 20 - 24, 2007, Shenzhen, China

The Data Mining of the E-Government on the Basis of Fuzzy Logic
Yilei Wang and Hui Pan
Department of Computer Science & Technology
University of Lu Dong
Yantai, Shandong Province, China
wang_yilei2000@.com

Tao Li
The Network Center
University of Lu Dong
Yantai, Shandong Province, China
Litao_@sina.com.cn

Abstract - Data mining technology is widely used in many fields, and e-government has become a major new domain in recent years. When an e-government system processes data, we must decide which data are useful and what new information can be obtained from log files or databases. Because of the fuzzy character of this knowledge, this paper presents a fuzzy data mining algorithm and emphasizes the steps of fuzzy data mining. Finally, a real example demonstrates the value of fuzzy data mining in e-government.
Index Terms - Data mining; E-government; Fuzzy logic

I. INTRODUCTION
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. However, pure data mining may raise problems [1] that cannot be solved within traditional (crisp) mathematics, so this paper uses fuzzy sets to address them.
II. THE CONCEPT OF E-GOVERNMENT

Electronic government affairs refer to government agencies applying modern information and communication technology to integrate management and services through the network. They optimize and reorganize government organizational structures and workflows on the Internet. They also overcome the limits of time, space, and departmental separation. The result is management and services that are high-quality, all-round, standardized, and transparent, and that conform to international standards. In other words, e-government is the use of IT and e-commerce to provide access to government information and to deliver public services to citizens and business partners. America was the first country to use e-government technology. In 1993, Clinton set up the National Performance Review Committee (NPRC) and proposed applying advanced information network technology to overcome malpractice in American government management and services.
E-Government includes three types [2].
G2C (Government to Citizen): all interactions between a government and its citizens. Its major activity areas include tourism and recreation, research and education, downloadable forms, discovery of government services, information about public policy, and advice about health and safety issues.
G2B (Government to Business): its major activity areas include government e-procurement, group purchasing, forward e-auctions, and tax collection and management.
G2G (Government to Government): its major activity areas include Inter-link, procurement at GSA, the Federal Case Registry, and the Procurement Marketing and Access Network.
The transformation process:
Stage 1: Information publishing and dissemination.
Stage 2: Official two-way transactions with one department at a time.
Stage 3: Multipurpose portals.
Stage 4: Portal personalization.
Stage 5: Clustering of common services.
Stage 6: Full integration and enterprise transformation.
E-government is a systems engineering project. A complete e-government system should include the following functions:
- (long-distance, distributed) information collection;
- information management (electronic files, record management);
- information security;
- electronic office;
- electronic post;
- electronic documents;
- government decision-support and report-form systems;
- a public web site.
The structure of e-government is shown in Fig. 1.

1-4244-1092-4/07/$25.00 ©2007 IEEE.

Fig.1 The structure of the e-government


Today, e-government is associated with the Internet. However, governments have been using other networks, especially internal ones, to improve government operations for over 15 years [3]. China's e-government system has begun to take shape. Since the start of the 21st century, network construction across government departments has developed rapidly, and the number of specialized government service websites has grown day by day. In April 1998, Tsingdao established the first government website on the Internet in the strict sense, "the Tsingdao government affairs information public network". In January 1999, more than 40 information departments jointly launched the "Government Online Project". By May 1999, the number of domain names registered under gov.cn had jumped to 1,470. By December 31, 2002, the total number of domain names registered under gov.cn had reached 7,796, accounting for 4.3% of domestic domain names. Compared with the beginning of 2000, this share differs by 39 percentage points [4].
Statistics indicate that 2,200 government departments at all levels nationwide have established their own websites. Their service content has grown richer, their functions have steadily strengthened, and their interactivity has been greatly enhanced [5]. However, faced with such complex and massive data, we need new technical methods to analyze them and "recycle the waste". Data mining technology can solve this problem.

III. THE DATA MINING TECHNOLOGY

Data mining is a complete process. It extracts previously unknown, valid, and useful information from large-scale databases and uses this information to make decisions or enrich knowledge. The technology has been application-oriented from the very beginning. It is now widely used in banking, telecommunications, transportation, retail (for example, supermarkets), and so on. From the perspective of cognitive science, data mining mainly uses induction to discover knowledge and deduction to evaluate the discovered knowledge. Data mining algorithms therefore combine induction with deduction. A data mining system is usually composed of the following parts:
(1) Controller - controls the operation of the other components;
(2) Database interface - creates and processes database queries;
(3) Knowledge base - stores domain-specific information;
(4) Focus - decides which data to analyze;
(5) Pattern extraction - chooses the pattern extraction algorithms;
(6) Evaluation - evaluates whether an extracted pattern is interesting and valid.
The main functions of data mining include automatic forecasting of trends and behavior, association analysis, and cluster analysis. This system uses the automatic price-forecasting function. There are many methods for automatic forecasting in data mining, such as time series, regression, decision trees, neural networks, rough set algorithms, and genetic algorithms.
Today, data mining is used primarily by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it lets them "drill down" into summary information to view detailed transactional data. As mentioned at the beginning of the paper, an e-government system often contains fuzzy concepts, so we adopt fuzzy data mining to address these problems.
In 1965, the American cybernetics expert L. A. Zadeh published the first paper on fuzzy set theory in the journal Information and Control [3]. Fuzzy theory has since been widely applied in mathematics. Computer science, which began to develop in the 1940s, has become closely related to fuzzy theory, and people increasingly use fuzzy theories [4] to solve complicated problems. Data mining is a technology that finds hidden information in large-scale databases or data warehouses. At present, its main purpose is to help decision makers find potential relationships among data, identify neglected key factors, and make decisions or predictions automatically. However, much of the data in daily life is fuzzy, and traditional data mining methods do not fit such data well. A new technology, fuzzy data mining [5], has therefore emerged. Its principle is to apply fuzzy sets in data mining. The steps of data mining based on fuzzy theory [6] are as follows:

A. Determine the classification targets and collect the factor data
From all the records in the data warehouse, we first set up a sample set to be classified, $X = \{x_1, x_2, \ldots, x_n\}$. The sample data are collected into a two-dimensional table indexed by sample number. Each sample is described by $m$ indices, so sample $i$ is expressed as the $m$-dimensional vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})$. Real data generally do not lie in the interval $[0,1]$, so they do not meet the requirements of a fuzzy set. The data must therefore first be standardized so that every index of every sample lies in $[0,1]$.
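The paper does not say which standardization it applies. The Python sketch below therefore assumes min-max scaling, one common choice, applied to the four indices of the Table 1 records (Section IV) so that every value lands in $[0,1]$. The helper name `min_max_standardize` is hypothetical.

```python
# Sketch of step A under an ASSUMED min-max standardization:
# each index (column) v is rescaled to (v - min) / (max - min).

# Table 1 records 001-005: department, economy, vocation, time.
TABLE_1 = [
    [5, 3, 2, 5],
    [3, 4, 5, 2],
    [5, 2, 3, 5],
    [5, 3, 1, 1],
    [4, 5, 3, 2],
]


def min_max_standardize(samples):
    """Rescale every index of every sample into [0, 1] (hypothetical helper)."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    result = []
    for row in samples:
        scaled = []
        for k, value in enumerate(row):
            span = highs[k] - lows[k]
            # A constant column carries no information; map it to 0.0.
            scaled.append((value - lows[k]) / span if span else 0.0)
        result.append(scaled)
    return result


if __name__ == "__main__":
    ids = ["001", "002", "003", "004", "005"]
    for record_id, row in zip(ids, min_max_standardize(TABLE_1)):
        print(record_id, [round(v, 3) for v in row])
```

With this assumed scaling, record 001 becomes [1.0, 0.333, 0.25, 1.0]. Other standardizations (for example, dividing by the column maximum) would also satisfy the $[0,1]$ requirement.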

B. Set up the fuzzy similarity relation
Let $r_{ij} \in [0,1]$ denote the degree of similarity between $x_i$ and $x_j$, and let $\tilde{R} = (r_{ij})_{n \times n}$, where $r_{ij} = r_{ji}$ and $r_{ii} = 1$ $(i, j = 1, 2, \ldots, n)$. The key problem is how to construct $\tilde{R}$. Commonly used methods include:
- the accumulates (scalar product) method;
- the correlation coefficient method;
- the max-min method;
- the arithmetic-mean minimum method;
- the geometric-mean minimum method;
- the absolute-value exponent method.
Here we introduce the accumulates method:

$$
r_{ij} =
\begin{cases}
1, & i = j,\\
\dfrac{1}{M}\displaystyle\sum_{k=1}^{m} x_{ik}\,x_{jk}, & i \neq j,
\end{cases}
$$

where $M > 0$ is chosen so that $M \ge \max_{i \neq j} \sum_{k=1}^{m} x_{ik}\,x_{jk}$.

C. Cluster analysis
Fuzzy cluster analysis has three methods: the transitive closure method, the maximum tree method, and the netting method. The most commonly used is the maximum tree method, which suits large $n$. As $n$ grows, the workload of the closure computation increases rapidly, whereas the maximum tree method clusters directly on the fuzzy similarity matrix. The concrete steps are as follows:
(1) Take the objects to be classified as vertices; when $r_{ij} \neq 0$, $x_i$ and $x_j$ can be joined by an edge.
(2) Sort the distinct values of $r_{ij}$ from large to small, $a_1 > a_2 > \cdots > a_l$, where each $a_k$ $(k = 1, 2, \ldots, l)$ is some $r_{ij}$.
(3) Join the objects whose relational degree is $a_1$ and mark $a_1$ on the corresponding edge; if joining two objects would create a loop, do not draw that edge.
(4) Repeat step (3) for $a_2, \ldots, a_l$ in turn until all objects are connected. The result is a maximum tree, which is not necessarily unique.
(5) Choose $\lambda \in [0,1]$ and cut every edge whose weight is smaller than $\lambda$; the objects that remain connected belong to the same class at level $\lambda$.

D. Forecast
For every mode (class) obtained by the cluster analysis, compute the average of each index:

$$
\mathrm{Mode}_{ij} = \frac{1}{p}\sum_{k=1}^{p} x_{kj}, \qquad i = 1, 2, \ldots, s;\; j = 1, 2, \ldots, m,
$$

where $s$ is the total number of modes, $k$ indexes the records of mode $i$ in the data warehouse, and $p$ is the total number of records in that mode. The sample to be predicted, $Y = (y_1, y_2, \ldots, y_m)$, is a fuzzy subset of the universe of discourse $X$. It is compared with each mode in the data warehouse by computing their nearness degree:

$$
\sigma(Y, \mathrm{Mode}_i) = \frac{1}{2}\left[\, Y \bullet \mathrm{Mode}_i + \left(1 - Y \odot \mathrm{Mode}_i\right) \right],
$$

where $Y \bullet \mathrm{Mode}_i = \bigvee_{j=1}^{m} (y_j \wedge \mathrm{Mode}_{ij})$ is the inner product and $Y \odot \mathrm{Mode}_i = \bigwedge_{j=1}^{m} (y_j \vee \mathrm{Mode}_{ij})$ is the outer product of fuzzy operations ($\vee$ = max, $\wedge$ = min). According to the principle of maximum nearness, suppose

$$
\sigma(Y, \mathrm{Mode}_i) = \max\{\sigma(Y, \mathrm{Mode}_1), \sigma(Y, \mathrm{Mode}_2), \ldots, \sigma(Y, \mathrm{Mode}_s)\}.
$$

The sample is then judged closest to mode $i$, and its outcome is predicted from the overall situation of that mode.

IV. THE APPLICATION OF FUZZY DATA MINING TO E-GOVERNMENT

The content of e-government is broad, and it involves more technologies than most other fields. As noted above, e-government includes three kinds: Government to Government, Government to Business, and Government to Citizen. In this paper the technology is applied mainly to G2B. G2B covers contents such as electronic procurement, electronic taxation, and information services. Electronic taxation means that tax registration, tax declaration, tax transfer, and revenue queries can be completed at home. Such a system not only offers convenience to enterprises but also decreases the expense of the government. However, although this sounds good in theory, in reality there are so many data that it is hard to tell what is useful from what is not. How to distill useful information from the data warehouse is an urgent problem.
In the tax data warehouse of a Shandong local tax bureau, there is a datasheet as shown in Table 1.

TABLE 1
DATASHEET OF A TAX DATA WAREHOUSE

ID    Department  Economy  Vocation  Time
001   5           3        2         5
002   3           4        5         2
003   5           2        3         5
004   5           3        1         1
005   4           5        3         2

The time granularity is divided into three layers: year, season, and month. The department granularity is divided into four layers: province, prefecture, county, and town. The economy granularity is divided into two layers: foreign capital and national capital. The vocation granularity is divided into two layers: the metal industry and the metal manufacturing industry. In practice, we often encounter the following question: for a given time, department, economy type, and vocation, what level does the actual tax belong to?
Given a datasheet, the time, department, economy type, and vocation are known data. The actual tax is a fuzzy variable, so we use fuzzy data mining to determine its level.
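The accumulates method of step B can be sketched in Python as follows. The sketch standardizes the Table 1 records with an assumed min-max scaling, since the paper does not state its standardization. It then sets $r_{ij} = \frac{1}{M}\sum_k x_{ik}x_{jk}$ for $i \neq j$. $M$ is taken as the largest off-diagonal scalar product, the smallest value the condition on $M$ allows. Because of the assumed scaling, the numbers are not expected to reproduce the $\tilde{R}$ reported for Table 1. The function names are hypothetical.

```python
from itertools import combinations

# Table 1 records 001-005: department, economy, vocation, time.
TABLE_1 = [
    [5, 3, 2, 5],
    [3, 4, 5, 2],
    [5, 2, 3, 5],
    [5, 3, 1, 1],
    [4, 5, 3, 2],
]


def min_max_standardize(samples):
    """ASSUMED standardization: rescale each index into [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / s if s else 0.0 for v, lo, s in zip(row, lows, spans)]
        for row in samples
    ]


def accumulates_similarity(x):
    """Fuzzy similarity matrix by the accumulates (scalar product) method."""
    n = len(x)
    dot = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    off_diagonal = [dot[i][j] for i, j in combinations(range(n), 2)]
    # M must be positive and no smaller than any off-diagonal scalar product.
    m_const = max(off_diagonal, default=0.0) or 1.0
    return [[1.0 if i == j else dot[i][j] / m_const for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    matrix = accumulates_similarity(min_max_standardize(TABLE_1))
    for row in matrix:
        print(" ".join(f"{v:.2f}" for v in row))
```

Under this scaling, the largest scalar product is between records 001 and 003, so $r_{13} = 1$. The pair 001/003 is also the most similar pair in the paper's $\tilde{R}$. The remaining values depend on the standardization chosen.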

According to the preceding algorithm, we obtain $\tilde{R}$:

$$
\tilde{R} =
\begin{pmatrix}
1 & 0.4 & 0.8 & 0.5 & 0.5\\
0.4 & 1 & 0.4 & 0.4 & 0.4\\
0.8 & 0.4 & 1 & 0.5 & 0.5\\
0.5 & 0.4 & 0.5 & 1 & 0.6\\
0.5 & 0.4 & 0.5 & 0.6 & 1
\end{pmatrix}
$$

We apply the maximum tree method, and different values of $\lambda$ give different classifications; the process is shown in Fig. 2. According to the experimental result, when $\lambda \in [0.5, 0.6]$, the five records in the datasheet can be divided into three classes. This classification is reasonable and consistent with other classifications.

CONCLUSION

Data mining based on a data warehouse is a newly emerging decision-analysis method. It can distill concealed, latent, and previously unknown useful information or patterns from massive data to help policy makers reach decisions. This article uses fuzzy data mining to extract useful information from an electronic government affairs system, thus helping government policy makers make the right decisions.
REFERENCES
[1] J. Luo, "Integrating fuzzy logic with data mining methods for intrusion detection," M.S. thesis, Mississippi State University, 1999.
[2] A. P. Balutis, "E-Government 2001, Part I: Understanding the challenge and evolving strategies," The Public Manager, Spring 2001, pp. 33-37.
[3] J. Zhang, "Will the government 'serve the people'? The development of Chinese e-government," New Media & Society, vol. 4, no. 2, pp. 163-184, June 2002.
[4] Y. C. Hu, R. S. Chen, and G. H. Tzeng, "Generating learning sequences for decision makers through data mining and competence set expansion," IEEE Transactions on Systems, Man, and Cybernetics, 2002 (to appear).
[5] K. Atanassov, "More on intuitionistic fuzzy sets," Fuzzy Sets and Systems, vol. 33, pp. 37-46, 1989.
[6] D. H. Hong, "A note on correlation of interval-valued intuitionistic fuzzy sets," Fuzzy Sets and Systems, vol. 95, pp. 113-117, 1998.

Fig.2 The process of the max-tree
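The maximum tree of step C and the subsequent $\lambda$-cut can be reproduced for the reported $\tilde{R}$ with the short Python sketch below. It builds the tree Kruskal-style, taking edges from the largest $r_{ij}$ down and skipping any edge that would close a loop, as in steps (1) to (4). It then keeps only edges whose weight is at least $\lambda$, as in step (5). The matrix entries are transcribed from the paper's $\tilde{R}$. The function names are hypothetical.

```python
from itertools import combinations

# Reported fuzzy similarity matrix for records 001..005 (0-based indices 0..4).
R = [
    [1.0, 0.4, 0.8, 0.5, 0.5],
    [0.4, 1.0, 0.4, 0.4, 0.4],
    [0.8, 0.4, 1.0, 0.5, 0.5],
    [0.5, 0.4, 0.5, 1.0, 0.6],
    [0.5, 0.4, 0.5, 0.6, 1.0],
]


def _find(parent, a):
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def max_tree(r):
    """Maximum spanning tree: edges from the largest r_ij down, skipping loops."""
    n = len(r)
    parent = list(range(n))
    edges = sorted(
        ((r[i][j], i, j) for i, j in combinations(range(n), 2) if r[i][j] > 0),
        reverse=True,
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = _find(parent, i), _find(parent, j)
        if root_i != root_j:  # joining i and j does not close a loop
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


def lambda_cut(n, tree, lam):
    """Cut tree edges lighter than lam; return the remaining connected groups."""
    parent = list(range(n))
    for i, j, weight in tree:
        if weight >= lam:
            root_i, root_j = _find(parent, i), _find(parent, j)
            parent[root_i] = root_j
    groups = {}
    for v in range(n):
        groups.setdefault(_find(parent, v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    ids = ["001", "002", "003", "004", "005"]
    tree = max_tree(R)
    for i, j, w in tree:
        print(f"{ids[i]} - {ids[j]}  r = {w}")
    for group in lambda_cut(len(R), tree, 0.55):
        print([ids[v] for v in group])
```

Run as a script with $\lambda = 0.55$, it prints the classes ['001', '003'], ['002'], and ['004', '005']. This is the same three-way grouping read from Fig. 2(b). Because the tree edges carry the largest similarities, cutting the tree at $\lambda$ gives the same classes as cutting the full matrix.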

From Fig. 2(b), records 1 and 3 belong to the same level, records 4 and 5 belong to the same level, and record 2 belongs to another level. The last step is prediction. Following the forecasting procedure (step 3.5), we compute the inner product, the outer product, and the fuzzy nearness degree between the data and the A, B, and C nodes. By checking which node the empirical datum is closest to, we can tell which tax level the data belong to.
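The forecasting step can likewise be sketched in Python. The sketch first averages the standardized Table 1 records of each class into a mode, $\mathrm{Mode}_{ij} = \frac{1}{p}\sum_k x_{kj}$. It then computes $\sigma(Y, \mathrm{Mode}_i) = \frac{1}{2}[\,Y \bullet \mathrm{Mode}_i + (1 - Y \odot \mathrm{Mode}_i)]$ using the max-min inner product and the min-max outer product, and picks the mode with the largest $\sigma$. Several pieces are assumptions made for illustration:
- the min-max standardization;
- the assignment of the labels A, B, C to the three Fig. 2(b) classes;
- the sample $Y$.

```python
# Table 1 records 001-005: department, economy, vocation, time.
TABLE_1 = [
    [5, 3, 2, 5],
    [3, 4, 5, 2],
    [5, 2, 3, 5],
    [5, 3, 1, 1],
    [4, 5, 3, 2],
]


def min_max_standardize(samples):
    """ASSUMED standardization: rescale each index into [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / s if s else 0.0 for v, lo, s in zip(row, lows, spans)]
        for row in samples
    ]


def mode_vector(rows):
    """Mode_i: average of each index over the p records of one class."""
    p = len(rows)
    return [sum(col) / p for col in zip(*rows)]


def inner_product(a, b):
    """Fuzzy inner product: max over j of min(a_j, b_j)."""
    return max(min(u, v) for u, v in zip(a, b))


def outer_product(a, b):
    """Fuzzy outer product: min over j of max(a_j, b_j)."""
    return min(max(u, v) for u, v in zip(a, b))


def nearness(a, b):
    """Nearness degree sigma(a, b) = (a . b + (1 - a (.) b)) / 2."""
    return 0.5 * (inner_product(a, b) + (1.0 - outer_product(a, b)))


X = min_max_standardize(TABLE_1)
# Hypothetical labels for the Fig. 2(b) classes: {001, 003}, {004, 005}, {002}.
CLASSES = {"A": [0, 2], "B": [3, 4], "C": [1]}
MODES = {name: mode_vector([X[i] for i in members]) for name, members in CLASSES.items()}


def predict(y):
    """Principle of maximum nearness: return the closest mode and all scores."""
    scores = {name: nearness(y, mode) for name, mode in MODES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    y = [0.9, 0.2, 0.4, 0.9]  # hypothetical sample, already standardized
    best, scores = predict(y)
    print(best, {name: round(s, 3) for name, s in scores.items()})
```

For the hypothetical sample, the nearness degrees are about 0.85 (A), 0.675 (B), and 0.367 (C), so the sample is assigned to mode A. Its tax level would then be read from the records in that class.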
