Chapter-8
This thesis has addressed the problem of Spam E-mails and has proposed a framework to counter it. The proposed framework consists of three pillars: Legislative measures, Behavioural measures and Technical measures. These three pillars are of equal importance in the fight against Spam E-mails. Studying the Legislative, Behavioural and Technical measures led to important conclusions, which are incorporated into the proposed framework for effective Spam management.
This chapter consists of three sections, which present the conclusions and summarize the findings of the Legislative, Behavioural and Technological measures. The last section of this chapter focuses on directions for future research.
Chapter-8: Conclusion and Future Scope
Different countries have different legislation with a variety of options, and their methods of investigation and prosecution also vary. This variation can lead to situations where the investigation process of one country is blocked by another country. There is therefore a need for homogeneous legislation on Spam E-mail all over the world.
There is a lack of reporting mechanisms. Only a few countries provide reporting mechanisms, either online or offline. It is advisable that each country establish at least one online reporting mechanism through which samples of Spam E-mails and incidents of Spamming can be reported. In India, only two metro cities, Mumbai and Bangalore, have an online mechanism for reporting identity theft, and it does not cover Spamming.
For the Anti-Spam law to be implemented effectively, users should be aware of these reporting mechanisms as well as the punishments the law provides for. The reporting mechanisms should also inform victims about the follow-up and the action taken so far on the complaints they have registered. The list of Spammers who have been punished for Spamming should be published widely.
The reporting mechanisms would also become a useful data collection tool: the reported samples can help the Content based Filter learn the current patterns of Spam E-mails so that it can be kept up to date.
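As a minimal sketch of how reported samples could feed a Content based Filter, the code below assumes a hypothetical complaint record (`SpamReport`) carrying a complaint ID, a follow-up status shown to the victim and the message body; confirmed reports are converted into term counts that a filter could use to refresh its model. The class, field names, status strings and `update_spam_counts` function are illustrative assumptions, not part of any existing reporting system.

```python
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class SpamReport:
    """Hypothetical complaint record from an online reporting mechanism."""
    complaint_id: int
    body: str
    status: str = "registered"  # follow-up status visible to the victim

def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())

def update_spam_counts(counts: Counter, reports: list[SpamReport]) -> Counter:
    """Add the terms of confirmed reports to the filter's Spam term counts."""
    for report in reports:
        if report.status == "confirmed":
            counts.update(tokenize(report.body))
            # Hypothetical follow-up state so the victim can see the outcome.
            report.status = "used for filter update"
    return counts
```

Only confirmed reports are used, so unverified complaints cannot poison the filter's statistics.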
After studying the content of Spam E-mails under the behavioural measures, feature extraction is applied to the standard datasets Enron, LingSpam, PU123A and PEM, and the important features are extracted based on the observed patterns.
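A minimal sketch of one possible feature extraction step is shown below. It assumes simple tokenization, a tiny illustrative stop-word list and selection of the terms whose document frequency differs most between Spam and Ham; the thesis's actual extraction patterns are not reproduced here, and `tokens`, `select_features` and the scoring rule are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list only; real systems use a much larger one.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "at", "for", "you", "your"}

def tokens(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def select_features(spam_docs, ham_docs, k):
    """Return the k terms with the largest Spam/Ham document-frequency gap."""
    # Document frequency: count each term at most once per E-mail.
    spam_df = Counter(t for d in spam_docs for t in set(tokens(d)))
    ham_df = Counter(t for d in ham_docs for t in set(tokens(d)))
    vocab = set(spam_df) | set(ham_df)

    def gap(term):
        return abs(spam_df[term] / len(spam_docs) - ham_df[term] / len(ham_docs))

    # Largest gap first; ties broken alphabetically for a deterministic result.
    return sorted(vocab, key=lambda t: (-gap(t), t))[:k]
```

For example, with Spam `["Win free money now", "Free offer, claim money"]` and Ham `["Project meeting now", "Lunch meeting at noon"]`, the three selected features are `free`, `meeting` and `money`, while `now`, which occurs equally in both classes, is ranked last.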
Two kinds of Content based Filter are implemented: one using machine learning (ML) based classifiers and one using semantic similarity with an edge based classifier. The ML based classifiers implemented are Decision Tree, Rough Sets, k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). The Rough Set classifier is implemented with various rule generation methods: Genetic Algorithm, Learn by Example Method (LEM), Covering Algorithm and Exhaustive Algorithm. The SVM is implemented with various kernel functions: Linear, Multi Layer Perceptron (MLP), Quadratic and Radial Basis Function (RBF). These classifiers are run on the features extracted from the standard datasets Enron, LingSpam, PU123A and Spambase, and from PEM.
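The kernel functions named above can be written as follows. This is a sketch using common textbook forms; the parameter values (`c`, `sigma`, `kappa`, `theta`) are illustrative assumptions, and the exact parameterisation used in the thesis experiments is not stated here. The MLP kernel is the sigmoid (tanh) kernel, which is not positive semi-definite for every choice of parameters.

```python
import math

def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, degree=3, c=1.0):
    """(u.v + c) ** degree."""
    return (dot(u, v) + c) ** degree

def quadratic_kernel(u, v, c=1.0):
    """Polynomial kernel of degree two."""
    return polynomial_kernel(u, v, degree=2, c=c)

def rbf_kernel(u, v, sigma=1.0):
    """exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def mlp_kernel(u, v, kappa=1.0, theta=-1.0):
    """Multi Layer Perceptron (sigmoid) kernel: tanh(kappa * u.v + theta)."""
    return math.tanh(kappa * dot(u, v) + theta)
```

For `u = [1, 0, 2]` and `v = [2, 1, 1]` the inner product is 4, so the degree-three polynomial kernel gives 125.0 and the quadratic kernel 25.0.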
The frequency of occurrence is a meaningful attribute added to the extracted features, and it has contributed to the improvement in results.
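The difference between a presence-only feature and the frequency-of-occurrence attribute can be illustrated with a small sketch: given a fixed list of selected feature terms (assumed here), each E-mail is mapped either to 0/1 indicators or to raw occurrence counts. The function name `feature_vector` and the feature list are hypothetical.

```python
import re

def feature_vector(text, features, use_frequency=True):
    """Map an E-mail to a vector over the selected feature terms.

    use_frequency=True  -> number of occurrences of each term
    use_frequency=False -> 1 if the term occurs at all, else 0
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = {f: words.count(f) for f in features}
    if use_frequency:
        return [counts[f] for f in features]
    return [1 if counts[f] > 0 else 0 for f in features]
```

With features `["free", "money", "meeting"]`, the message "FREE money! Free, free, FREE." becomes `[4, 1, 0]` with frequencies but only `[1, 1, 0]` as a binary vector, so the repeated emphasis on "free" is visible to the classifier only in the first form.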
The overall performance of the SVM with a polynomial kernel is high for the PU123A, LingSpam and PEM datasets. The polynomial kernel has degree three, and there are two classification categories, Spam and Ham. The SVM separates these two classes with a hyperplane in the feature space induced by the polynomial kernel; because the data are linearly separable in that space, the results achieved are promising.
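For reference, the degree-three polynomial kernel and the resulting two-class decision rule can be written as below. The constant c ≥ 0 and the encoding of Spam as +1 and Ham as −1 are assumptions for illustration; the support vectors x_i, labels y_i, multipliers α_i and bias b are obtained from SVM training.

```latex
K(\mathbf{x}, \mathbf{z}) = \left(\mathbf{x}^{\top}\mathbf{z} + c\right)^{3},
\qquad
f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\right),
\qquad
y_i \in \{+1, -1\}
```

Here f(x) = +1 is read as Spam and f(x) = −1 as Ham.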
The empirical analysis found that accurate feature extraction reduces the gap between the low level and high level features of an E-mail. As a result, the accuracy of Spam filtering improves and Spam misclassification is reduced.
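Misclassification in Spam filtering is usually split into false positives (Ham wrongly marked as Spam) and false negatives (Spam that reaches the inbox). The sketch below computes these counts and the related rates from true and predicted labels; the label strings and the function name `spam_metrics` are assumptions, and the thesis's own evaluation code is not shown.

```python
def spam_metrics(y_true, y_pred, positive="spam"):
    """Confusion-matrix counts and rates, treating Spam as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # legitimate mail wrongly filtered as Spam
        else:
            if t == positive:
                fn += 1  # Spam that got through the filter
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }
```

A false positive is usually considered more costly than a false negative, since it can hide a legitimate E-mail from the user.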
The empirical analysis of the ML based classifiers shows that the Naive Bayesian classifier is suitable for datasets like Enron, while Rough Set with Genetic Algorithm is suitable for the Spambase dataset. The SVM with polynomial kernel performs best on datasets like LingSpam, PU123A and PEM. The experimental results show that ML based classifiers are both effective and efficient Anti-Spam filters.
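To illustrate the Naive Bayesian approach, here is a minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing. The chapter does not state which Naive Bayes variant was used, so the multinomial model, the smoothing, the handling of unseen words and the class name `NaiveBayesSpamFilter` are assumptions.

```python
import math
import re
from collections import Counter

def words(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(words(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum of log P(w | c) over the words of the E-mail
            score = math.log(self.prior[c])
            for w in words(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.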
The Content based Filter using semantic similarity with an edge based classifier is implemented with the intent of improving on the results of the ML based classifiers, and its results are compared with theirs. Table-7.8 shows that the semantic similarity with edge based approach outperforms the ML based classifiers, with almost zero misclassification (both false positives and false negatives).
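Edge based semantic similarity measures how close two words are by counting the edges on the shortest path between their concepts in a lexical taxonomy such as WordNet. The sketch below uses a tiny hand-made is-a hierarchy (a hypothetical stand-in for WordNet) and the common scoring 1 / (1 + path length); words in the same synonym set share a node, so synonyms score 1.0. The thesis's exact similarity formula is not reproduced here.

```python
from collections import deque

# Hypothetical toy taxonomy: child concept -> parent concept (is-a edges).
PARENT = {
    "money": "asset", "asset": "entity",
    "prize": "reward", "reward": "entity",
    "meeting": "event", "event": "entity",
}
# Hypothetical synonym sets: each word maps to the concept node it belongs to.
SYNSET = {"money": "money", "cash": "money", "prize": "prize", "award": "prize",
          "meeting": "meeting", "conference": "meeting"}

def neighbours(node):
    """Concepts one edge away: any children plus the parent."""
    result = [child for child, parent in PARENT.items() if parent == node]
    if node in PARENT:
        result.append(PARENT[node])
    return result

def path_length(a, b):
    """Edges on the shortest path between two concepts (breadth-first search)."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def edge_similarity(word1, word2):
    """1 / (1 + shortest path length); unknown words score 0.0."""
    c1, c2 = SYNSET.get(word1), SYNSET.get(word2)
    if c1 is None or c2 is None:
        return 0.0
    d = path_length(c1, c2)
    return 0.0 if d is None else 1.0 / (1 + d)
```

Here "cash" and "money" score 1.0, while "money" and "prize" are four edges apart (via asset, entity and reward) and score 0.2.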
The Content based Filter is made adaptive so that its accuracy improves over time.
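One simple way to make a Content based Filter adaptive is to let it learn from user corrections. The sketch below assumes a k-NN style filter over feature vectors (for example, term counts) that stores every corrected message, so later decisions reflect the feedback. The class and method names, cosine similarity and the default k = 3 are assumptions; the chapter does not detail the thesis's own adaptation mechanism.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class AdaptiveKnnFilter:
    """k-NN Spam filter whose memory grows with user feedback."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # list of (vector, label) pairs

    def learn(self, vector, label):
        self.memory.append((vector, label))

    def classify(self, vector):
        if not self.memory:
            raise ValueError("filter has no training examples yet")
        nearest = sorted(self.memory, key=lambda item: cosine(vector, item[0]),
                         reverse=True)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def feedback(self, vector, correct_label):
        """Store a user-corrected message so future decisions take it into account."""
        self.learn(vector, correct_label)
```

For example, with k = 1 and stored examples `[1, 0, 0]` (Spam) and `[0, 1, 0]` (Ham), the vector `[3, 4, 0]` is first classified as Ham; after the user reports it as Spam, the same vector is classified as Spam.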
It has been shown that semantic relationships, specifically synonyms, play an important role in Spam classification. The semantic similarity with edge based classifier has the advantage that it does not depend on a corpus. Its experimental results outperform the earlier ML based classifiers, and it also reduces misclassification.
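To illustrate why synonyms matter and why such a classifier need not depend on a training corpus, the sketch below scores a message against a small lexicon of Spam concepts, mapping each word through a synonym table before matching. Both tables, the threshold of two hits and the function names are hypothetical; a real system would draw synonyms from a lexical resource such as WordNet.

```python
import re

# Hypothetical synonym table: word -> canonical concept.
SYNONYMS = {"cash": "money", "dollars": "money", "award": "prize",
            "gratis": "free", "complimentary": "free"}
# Hypothetical Spam concept lexicon; no training corpus is required.
SPAM_CONCEPTS = {"money", "prize", "free", "winner"}

def spam_concept_hits(text):
    """Count words whose canonical concept belongs to the Spam lexicon."""
    hits = 0
    for word in re.findall(r"[a-z]+", text.lower()):
        concept = SYNONYMS.get(word, word)  # fall back to the word itself
        if concept in SPAM_CONCEPTS:
            hits += 1
    return hits

def is_spam(text, threshold=2):
    return spam_concept_hits(text) >= threshold
```

"Claim your complimentary award and cash" contains none of the lexicon words literally, yet it scores three hits once synonyms are resolved.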
The overall analysis shows that the Naive Bayesian classifier, the SVM with Polynomial Kernel and the Semantic Similarity with Edge based classifier are promising techniques for fighting the problem of Spam E-mails.
Finally, combining the Origin based Filter with the Content based Filter would produce the best results. The results demonstrate that the proposed Anti-Spam Framework, being adaptive in nature, can effectively filter Spam E-mails with very little misclassification (100% correct classification being impossible).
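A minimal sketch of how an Origin based Filter and a Content based Filter might be combined: the sender's domain is checked against blacklist and whitelist sets first, and only mail of unknown origin is passed to a content score compared with a threshold. The lists, the scoring function passed in, the threshold of 0.5 and the name `combined_filter` are assumptions for illustration, not the framework's actual configuration.

```python
from typing import Callable

def combined_filter(sender: str, body: str,
                    content_score: Callable[[str], float],
                    whitelist: set[str], blacklist: set[str],
                    threshold: float = 0.5) -> str:
    """Return "spam" or "ham", applying origin checks before content analysis."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in blacklist:
        return "spam"  # Origin based Filter: known Spam source
    if domain in whitelist:
        return "ham"   # Origin based Filter: trusted sender
    # Unknown origin: defer to the Content based Filter's score in [0, 1].
    return "spam" if content_score(body) >= threshold else "ham"
```

Checking origin first is cheap and settles known senders immediately, leaving the more expensive content analysis for mail from unknown sources.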