
Chapter-8: Conclusion and Future Scope

Chapter-8

Conclusion and Future Scope


___________________________________________________________

This thesis has addressed the problem of Spam E-mails and has proposed a
framework for managing it. The proposed framework rests on three pillars:
Legislative measures, Behavioural measures and Technical measures.
These three pillars are equally important in the fight against Spam E-mails.
Important conclusions were drawn from the study of the Legislative, Behavioural
and Technical measures, and these conclusions were incorporated into an effective
framework for Spam management.
This chapter consists of four sections. The first three present the conclusions and
summarize the findings for the Legislative, Behavioural and Technological measures
respectively. The last section of this chapter focuses on directions for
future research.

8.1 Legislative Measures


The study of legislative measures examined the legislative mechanisms
currently implemented around the world to fight the
problem of Spam E-mails. The parameters considered in this study were the type of
subscription, the scope of the subscription, the type of sender and receiver, and the
group of possible accusers.
In India, no Anti-Spam law or general identity (ID) theft law has been implemented, but
relevant provisions have been made in the criminal law, which include the reporting
of identity theft and related issues. To address cyber security and privacy
issues, several amendments have been made to the Information Technology Act 2000
(IT Act 2000), which was notified on 17th October, 2000 by the Indian Parliament.
India needs separate Anti-Spam legislation.
The summary of the study carried out on Legislative measures is as follows:
• Only a few countries have enacted Spam legislation, which also
includes identity theft legislation. Traditional provisions covering fraud,
forgery and cybercrime have also been made. India needs separate Anti-Spam
legislation.

• Different countries have different legislation with a variety of options, and the
methods of investigation and prosecution also vary. This
variation can lead to situations where the investigation process of one country is
blocked by another country. So, there is a need for homogeneous legislation on
Spam E-mail all over the World.
• Reporting mechanisms are lacking. Only a few countries provide a reporting
mechanism, either online or offline. It is advisable that each country
establish at least one online reporting mechanism through which
samples of Spam E-mails and incidents of Spamming can be reported. In India, only
two metro cities, Mumbai and Bangalore, have an online mechanism for
reporting identity theft, and it does not cover Spamming.
• Users should be aware of these reporting mechanisms, as well as of the provisions
for punishment made under Anti-Spam law, for the law to be implemented effectively.
• The reporting mechanisms should also give victims appropriate information
about the follow-up and the action taken so far on the complaints they have
registered. The list of Spammers who have been punished for Spamming should be
widely publicized.
• The reporting mechanisms would become a useful data collection tool, which can
help a Content based Filter to understand the current patterns of Spam E-mails
so that it can be updated.

8.2 Behavioural Measures


The study of behavioural measures was carried out with the objective of finding
behavioural patterns that are common in the sending of Spam E-mails. These patterns
were found useful in laying a foundation for the technological measures when proposing
an Anti-Spam solution. The E-mail delivery pattern was studied, and content
analysis of the header part and the body part was carried out. This content analysis
played an important role in the Content based Filter proposed
in the technological measures.
The summary of the behavioural study, given below, was used and found very
useful while suggesting the technological solution to block Spam E-mail.
• If almost all the words in the ‘subject’ field, the body, or both of an E-mail
are in uppercase, then it is definitely a Spam E-mail.

• An E-mail whose ‘subject’ field is empty is definitely Spam.
• An E-mail which has different domain names in the ‘From’ field and the ‘Reply-to’
field is Spam.
• Some Spam E-mails contain many E-mail addresses in the ‘To’ field. The presence of
many E-mail addresses in the ‘To’ field indicates that the E-mail is Spam.
• Many Spam E-mails do not contain the recipient’s E-mail address in the ‘To’ field;
generally, it is added to the ‘CC’ or ‘BCC’ field.
• Spam E-mails do not contain information in the ‘Return-path’ field.
• It is also observed that E-mails which have typical words, or combinations of
them, in the ‘From’ field, such as ‘NOKIA MOBILE LOTTREY DRAW’, ‘Promo
Enlargement’, ‘BBC NATIONAL LOTTERY’, ‘UNITED KINGDOM
LOTTERY’, ‘COCA COLA DRAW’, ‘Free Trial Men’s Supplement’, are Spam.
• During this behavioural study, some words were identified whose presence, alone or
in combination, increases the chances of an E-mail being Spam. These words are
‘WON POUNDS’, ‘job offers’, ‘UK-LOTTERY’, ‘huge stick’, ‘increase your
length’, ‘desired proportion and size’, ‘Customer Survey’, ‘WON 500,000GBP’,
‘LOAN OFFER!!’, ‘WINNING NOTIFICATION..!!’, ‘making money’, ‘income
going down’, ‘LOTTERY DRAW’, ‘Weight Loss’, ‘Diet’, ‘WON £750,000.GBP’,
‘SEX PILL’, ‘Buy Viagra at Half Price’, ‘Winner’, ‘MyDailyFlog!’, ‘HasDonated
(£,,500,000.GBP)’ etc.
• Some Spammers intentionally break up or misspell the Spam words
in order to bypass filtering mechanisms.
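
The header observations listed above can be sketched as simple checks on a raw E-mail. This is a minimal illustration using Python's standard `email` package, not the implementation used in the thesis. The function names, the recipient threshold, the uppercase ratio and the short phrase list are assumptions made for the example, and the checks return indicator flags rather than a final Spam decision.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Assumed values for illustration only; the thesis does not fix them here.
MAX_TO_RECIPIENTS = 10
UPPERCASE_RATIO = 0.9
SPAM_PHRASES = ("bbc national lottery", "lottery draw", "won pounds", "loan offer")


def domain_of(header_value):
    """Return the lower-cased domain of the first address in a header, or ''."""
    _, addr = parseaddr(header_value or "")
    return addr.rsplit("@", 1)[1].lower() if "@" in addr else ""


def mostly_uppercase(text, ratio=UPPERCASE_RATIO):
    """True when at least `ratio` of the alphabetic words are fully uppercase."""
    words = [w for w in text.split() if any(c.isalpha() for c in w)]
    if not words:
        return False
    upper = sum(1 for w in words if w.isupper())
    return upper / len(words) >= ratio


def behavioural_flags(raw_email):
    """Return the behavioural indicators triggered by one raw E-mail."""
    msg = message_from_string(raw_email)
    subject = (msg.get("Subject") or "").strip()
    flags = []
    if not subject:
        flags.append("empty_subject")
    elif mostly_uppercase(subject):
        flags.append("uppercase_subject")
    from_dom = domain_of(msg.get("From"))
    reply_dom = domain_of(msg.get("Reply-To"))
    if from_dom and reply_dom and from_dom != reply_dom:
        flags.append("from_reply_to_mismatch")
    # Keep only entries that actually carry an address.
    to_addrs = [a for _, a in getaddresses(msg.get_all("To", [])) if a]
    if not to_addrs:
        flags.append("missing_to")
    elif len(to_addrs) > MAX_TO_RECIPIENTS:
        flags.append("many_recipients")
    if not msg.get("Return-Path"):
        flags.append("missing_return_path")
    haystack = ((msg.get("From") or "") + " " + subject).lower()
    if any(phrase in haystack for phrase in SPAM_PHRASES):
        flags.append("spam_phrase")
    return flags
```

A filter could weight or combine these flags; treating any single flag as conclusive would be a design decision of the filter, not something this sketch decides.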

8.3 Technological Measures


In order to propose a technological solution to block Spam E-mail, existing
solutions were implemented first. The proposed Anti-Spam Framework
consists of a combination of Origin based Filters and Content based Filters.
Origin based Filters such as White-list and Black-list are implemented. The
Challenge Response (C-R) System, which is used to differentiate between humans and
machines, is also implemented. The drawbacks of the C-R System are addressed by the
proposed architectures.

After studying the content of Spam E-mails in the behavioural measures, the process of
feature extraction was applied to the standard datasets Enron, LingSpam and PU123A and
to PEM. Important features were extracted based on the observed patterns.
Content based Filters using machine learning based classifiers, and using semantic
similarity with an edge based classifier, are implemented. The machine learning based
classifiers implemented include Decision Tree, Rough Sets, k-Nearest Neighbor (k-NN) and
Support Vector Machine (SVM). The Rough Set classifier is
implemented with various rule generation methods such as Genetic Algorithm, Learning
from Examples Module (LEM), Covering Algorithm and Exhaustive Algorithm. The
SVM is implemented with various kernel functions such as Linear, Polynomial, Quadratic,
Multi Layer Perceptron and Radial Basis Function kernels. These classifiers are
executed on the features extracted from the standard datasets Enron, LingSpam, PU123A
and Spambase, and from PEM.
The frequency of occurrence is a meaningful attribute that was added to the extracted
features, and it has contributed to an improvement in the results.
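
As a minimal sketch of what a frequency-of-occurrence attribute adds over a plain presence flag, the example below builds both feature vectors for one message. The vocabulary, the tokenization rule and the function names are hypothetical; the actual features of the thesis are those extracted from the datasets named above.

```python
import re
from collections import Counter

# Hypothetical vocabulary used only for this illustration.
VOCABULARY = ("lottery", "winner", "loan", "offer", "meeting")


def frequency_features(text, vocabulary=VOCABULARY):
    """Return how often each vocabulary term occurs, in vocabulary order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def presence_features(text, vocabulary=VOCABULARY):
    """Return 1 if a vocabulary term occurs at least once, else 0."""
    return [1 if count else 0 for count in frequency_features(text, vocabulary)]
```

For the message "LOTTERY winner! Claim your lottery loan offer", the frequency vector records that "lottery" occurs twice, which the presence vector cannot express.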
The overall performance of the SVM using the polynomial kernel is high for the PU123A,
LingSpam and PEM datasets. The degree of the polynomial kernel is
three and there are two classification categories (Spam and Ham). The hyperplane
formed by the SVM with the polynomial kernel performs binary classification; since the
input data are separable in the feature space induced by the kernel, the results
achieved are promising.
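
The degree-three polynomial kernel and the two-class decision it supports can be sketched as follows. Only the degree and the two classes come from the text; the `gamma` and `coef0` parameters, the +1/-1 label convention and the function names are assumptions, and the support vectors in the usage note are made up rather than trained.

```python
def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def svm_decision(x, support_vectors, labels, alphas, bias):
    """Sign of the kernel SVM decision function (assumed: +1 Spam, -1 Ham)."""
    score = bias + sum(
        alpha * label * polynomial_kernel(sv, x)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1
```

With the made-up support vectors `[[1, 0], [0, 1]]`, labels `[1, -1]`, multipliers `[0.5, 0.5]` and zero bias, the point `[1, 0]` falls on the Spam side and `[0, 1]` on the Ham side. In practice the multipliers and bias come from training the SVM.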
During the empirical analysis it was found that accurate feature extraction reduces the
gap between the low level features and the high level features of an E-mail. Thus, the
accuracy of Spam filtering is improved and Spam misclassification is reduced.
The empirical analysis of the ML based classifiers shows that the Naive Bayesian
classifier is a suitable classifier for a dataset like Enron, while Rough Set with
Genetic Algorithm is suitable for the Spambase dataset. The SVM with polynomial
kernel performs best on datasets like LingSpam, PU123A and PEM. The experimental
results show that the ML based classifier is both an effective and an efficient
Anti-Spam filter.
The Content based Filter using semantic similarity with an edge based classifier is
implemented with the intent of improving on the results of the machine learning
classifiers, and its results are compared with theirs. Table-7.8 shows that the
semantic similarity with edge based approach outperforms the ML based classifiers, with
misclassification (false positives and false negatives) almost zero.
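
The two kinds of misclassification compared above can be counted as in the following sketch, where a false positive is a Ham E-mail classified as Spam and a false negative is a Spam E-mail classified as Ham. The function name and the label strings are assumptions, and the labels in the usage note are invented for illustration; they are not results from the thesis.

```python
def misclassification_summary(actual, predicted, positive="spam"):
    """Count false positives and false negatives and derive their rates."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1  # Ham wrongly classified as Spam
        elif truth == positive:
            fn += 1  # Spam wrongly classified as Ham
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

For example, with actual labels `["spam", "spam", "ham", "ham", "ham"]` and predictions `["spam", "ham", "ham", "spam", "ham"]`, the summary reports one false positive and one false negative.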
The Content based Filter is made adaptive in nature to improve the accuracy of the
filter over time.
It has been shown that semantic relationships, specifically synonyms, play an important
role in Spam classification. The semantic similarity with edge based classifier has
the advantage that it does not depend on the corpora. Its experimental results outperform
the previous machine learning based classifiers, and it has also reduced misclassification.
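
One common way to compute an edge based semantic similarity is to count the edges on the shortest path between two words in an is-a taxonomy. The sketch below illustrates that idea only and is not necessarily the measure used in the thesis. The toy taxonomy, the function names and the formula 1 / (1 + number of edges) are assumptions; a real system would draw the taxonomy from a lexical database.

```python
# Hypothetical is-a taxonomy (child -> parent), for illustration only.
TOY_TAXONOMY = {
    "jackpot": "prize",
    "prize": "reward",
    "award": "reward",
    "reward": "entity",
    "meeting": "event",
    "event": "entity",
}


def path_to_root(word, taxonomy):
    """Return the chain of nodes from `word` up to the root of the taxonomy."""
    path = [word]
    while path[-1] in taxonomy:
        path.append(taxonomy[path[-1]])
    return path


def edge_similarity(word_a, word_b, taxonomy=TOY_TAXONOMY):
    """Return 1 / (1 + edges on the path through the lowest common ancestor)."""
    path_a = path_to_root(word_a, taxonomy)
    depth_in_b = {node: i for i, node in enumerate(path_to_root(word_b, taxonomy))}
    for edges_a, node in enumerate(path_a):
        if node in depth_in_b:
            return 1.0 / (1 + edges_a + depth_in_b[node])
    return 0.0  # no common ancestor in the taxonomy
```

Because the score depends only on the taxonomy and not on word counts from a training corpus, a measure of this kind is consistent with the corpus independence noted above.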
The overall analysis shows that Naive Bayesian, SVM with Polynomial Kernel and the
Semantic Similarity with Edge based classifier are promising techniques
that can be applied to fight the problem of Spam E-mails.
Finally, the combination of an Origin based Filter with a Content based Filter would
produce the optimal results. The results clearly demonstrate that the proposed
Anti-Spam Framework, being adaptive in nature, can effectively filter Spam E-mail with
very little misclassification (as 100 % classification is impossible).
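
The combination of an Origin based Filter with a Content based Filter can be sketched as a two-stage decision. The order of the checks, the function names and the keyword-based stand-in for the content classifier are assumptions made for this example; the Challenge Response step and the trained classifiers of the framework are not modelled.

```python
def keyword_filter(body, phrases=("lottery draw", "loan offer")):
    """Stand-in content filter: True when the body contains a listed phrase."""
    text = body.lower()
    return any(phrase in text for phrase in phrases)


def classify(sender, body, whitelist, blacklist, content_filter=keyword_filter):
    """Apply the origin based checks first, then fall back to the content filter."""
    sender = sender.strip().lower()
    if sender in whitelist:
        return "ham"   # trusted origin, content is not inspected
    if sender in blacklist:
        return "spam"  # known bad origin
    return "spam" if content_filter(body) else "ham"
```

For instance, `classify("stranger@example.net", "LOAN OFFER inside", set(), set())` falls through both lists and is decided by the content filter. Any trained classifier exposing the same call signature could be passed as `content_filter`.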

8.4 Future Scope


Though this thesis has made efforts towards solving the problem of Spam E-mail
using legislative, behavioural and technological measures, the solutions proposed are
not complete solutions. The problem of Spam E-mail and Anti-Spam solutions is a game
of cat and mouse, since every day Spammers come up with new techniques for
sending Spam E-mails. This work has given a potential direction for the classification
of Spam E-mails.
Future efforts would be extended towards:
• Achieving accurate classification, with zero percent (0%) misclassification of Ham
E-mail as Spam and of Spam E-mail as Ham.
• Blocking Phishing E-mails, which carry phishing
attacks and are nowadays a greater matter of concern.
• Extending the work to guard against the Denial of Service (DoS) attack,
which has now emerged in a distributed form known as the Distributed Denial of
Service (DDoS) attack.
