
1. ABSTRACT

2. INTRODUCTION

3. ABOUT ORGANISATION

4. SRS DOCUMENT

5. DESIGN PRINCIPLES & EXPLANATION

6. DESIGN DOCUMENT

6.1 SYSTEM DESIGN


7. PROJECT DICTIONARY

7.1 UML DIAGRAMS


8. FORMS & REPORTS

8.1 I/O SPECIMENS


8.2 I/O SAMPLES

9. TESTING

9.1 TEST CRITERIA & TEST CASES


9.2 TEST REPORT & ANALYSIS

10. IMPLEMENTATION & USER MANUALS

11. CONCLUSION

12. BIBLIOGRAPHY
Abstract:
A filter is the core of the decision making in the Privacy-Aware Collaborative Spam Filtering document. A filter decides, on a per-mail basis, whether a message should be downloaded or not. A pipeline of filters is set up (again, in the configuration), and each message that needs to be downloaded is passed through this pipeline. At any point in the pipeline, a filter can indicate that the message should not be processed any further. For example, a sender-based filter could find the sender in its list of known spammers and reject the message.

There are two kinds of filters: global and local. This is not an attribute of a filter itself but depends on how the filter is used. Local filters are associated with a maildrop, whereas global filters apply to all maildrops. For example, you might want a Message-ID filter to apply to all maildrops, but keep a sender-based filter only for the maildrop where you expect mail from that sender.
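The pipeline and the global/local distinction described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the project's code: the `Mail` record, the `MailFilter` interface and the `FilterPipeline` class are hypothetical names, and a real implementation would operate on JavaMail messages and read its filter lists from the configuration. Global filters run first, then the maildrop's local filters, and the first rejection stops the pipeline.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical message model; the project itself would work on JavaMail messages.
record Mail(String sender, String messageId) {}

// A filter answers one question for one message: download it (true) or not (false).
interface MailFilter {
    boolean accept(Mail mail);
}

// Typically used globally: rejects a Message-ID it has already seen.
class MessageIdFilter implements MailFilter {
    private final Set<String> seen = new HashSet<>();

    public boolean accept(Mail mail) {
        return seen.add(mail.messageId()); // add() returns false for a duplicate
    }
}

// Typically used locally: rejects mail from senders on a spammer list.
class SenderFilter implements MailFilter {
    private final Set<String> spammers;

    SenderFilter(Set<String> spammers) { this.spammers = spammers; }

    public boolean accept(Mail mail) {
        return !spammers.contains(mail.sender());
    }
}

class FilterPipeline {
    private final List<MailFilter> globalFilters;
    private final Map<String, List<MailFilter>> localFilters; // keyed by maildrop name

    FilterPipeline(List<MailFilter> globalFilters, Map<String, List<MailFilter>> localFilters) {
        this.globalFilters = globalFilters;
        this.localFilters = localFilters;
    }

    // Runs the global filters, then the maildrop's local ones;
    // the first filter that rejects the message stops the pipeline.
    boolean shouldDownload(String maildrop, Mail mail) {
        List<MailFilter> chain = new ArrayList<>(globalFilters);
        chain.addAll(localFilters.getOrDefault(maildrop, List.of()));
        for (MailFilter filter : chain) {
            if (!filter.accept(mail)) {
                return false;
            }
        }
        return true;
    }
}
```

With this arrangement a Message-ID filter placed in the global list sees mail for every maildrop, while a sender filter registered under one maildrop affects only that maildrop.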

A filter has the single job of deciding whether or not to download a single message. The actual decision of whether to download a mail is made through a sequence of filters. There can be a global set of filters as well as a per-maildrop set. A maildrop represents the mailbox from which you want to download your mail.

In this project we define a small set of filters, and more can be defined as requirements demand. Our focus is on the main, basic filters: HeaderMailFilter, MessageIDMailFilter, NullFilter, ReceipientMailFilter, SenderMailFilter, SizeMailFilter and SubjectMailFilter.
A project titled "Privacy-Aware Collaborative Spam Filtering document" is proposed to be developed with Windows 2000 Server as the operating system and the JavaMail API of J2EE technologies. The package will provide for creating your own filters and using them in appropriate places.

System Analysis:
Existing System:

The existing system is not computerized: all mail handling is done manually. To simplify this laborious job, it is to be computerized.

The administrator maintains the mailboxes of all employees of our organization and is responsible for organizing them. To delete unwanted mail, however, he must check the messages, mark the unwanted ones based on criteria such as large size or the sender's user ID, and delete them by hand.

Proposed System:

The first step of the analysis process is identifying the need. The success of a system depends largely on how accurately the problem is defined, how thoroughly it is investigated, and how well it is carried through in the choice of solution.

This package has been developed to overcome the difficulties encountered with the manual system. Faster, timely deletion of unwanted mail is another motivating factor for its development.

Project Scope and Objectives:

• The Privacy-Aware Collaborative Spam Filtering document is a tool to delete unwanted mail. Considerable effort was put into making it user friendly.
• Optimum utilization of the tool is possible; all basic filters are provided.
• It reduces user interaction.
• It reduces wasted time.
• It also helps the management distribute funds optimally among user groups for procurement of new equipment.
• It is flexible: the administrator can easily add his own filters if he needs them.

Company Profile:
Global Interactive Solutions has emerged as a world-class solutions and products organization with clientele spread across geographies. It has time and again taken up challenges to accomplish its mission of customer satisfaction, armored with a focused vision and technical expertise.

Our growth and success have evolved from our ability to foresee customer challenges and address them with apt solutions. Our teams, comprising research innovators, architects and developers, have constantly worked on developing products, solutions and mission-critical applications.

We started with Visual SHIFT, our initial product, which addressed the Y2K problem. It received global acclaim and was awarded "Product of the Year" by Datamation in the Y2K product category. Gartner Group, the research and consulting organization, rated Visual SHIFT as "Best in Class". It also won the "Best Product" award from HYSEA (Hyderabad Software Exporters Association).

Global Interactive Solutions Technologies is a global software development firm specializing in software testing and product development services for technology companies across diverse industry segments.
Our flexible delivery model helps us offer focused IT solutions, which help our clients respond quickly to their business opportunities. Our clients engage with us to stay ahead in the technology adoption curve and to develop and protect their intellectual property assets. Our Technology Excellence Groups embrace new technologies as they emerge to provide clients with solutions that give them a competitive edge in their businesses. Leveraging our strengths in Research & Development and our expertise in Component Based Application Development, we have been successfully providing our global clientele with software testing and product development services. With such technology foresight and sophisticated product development and testing expertise, we credit our success to commitment, performance, delivery and customer delight.

Our world-class practices and methodologies make us the preferred technology partner for many technology companies. Our growth comes from a unique business model and the integration of people, processes and technology. We have continually demonstrated our commitment to developing cost-effective, quality products and custom applications on strict timelines by adopting industry-standard processes.

People, experience and skill sets are the ultimate competitive differentiators when it comes to finalizing a Strategic Offshore Outsourcing deal. Global Interactive Solutions is an IT services company that adapts solutions to market requirements. Its people are well qualified and experienced in the technology platforms they work on. Personnel are trained and retrained, making them masters in their chosen areas.

Consolidating our capabilities in diverse technologies and our solid foundation in product and application development, we built expertise in delivering end-to-end solutions and providing Enterprise Application Integration.
Our technical expertise coupled with functional know-how has equipped us to collaborate with global organizations to deliver enterprise-wide solutions for business verticals such as Insurance, Retail and Distribution, Consumer Electronics, Healthcare and Utilities. Our clientele comprises organizations of varied sizes, from small and medium companies to Fortune 100 corporations. We act as strategic technology partners for global conglomerates and also provide R&D outsourcing services to international technology labs.

Requirements Specification Document

The Privacy-Aware Collaborative Spam Filtering document is developed with the aim of automatically deleting unwanted mail from the specified maildrops, based on our definitions. It takes the necessary definitions, which specify the criteria by which mail is deleted automatically. The administrator defines these criteria to delete the unwanted mail.
1. Introduction

1.1 Purpose: The purpose of this document is to describe all external requirements for the Privacy-Aware Collaborative Spam Filtering document. It also describes the interfaces for the system. Its main requirements are as follows:

a. To implement the Privacy-Aware Collaborative Spam Filtering document we need a mail server capable of storing mail in the corresponding mailboxes. In our project we implemented and tested our filters on the James server, as it is freely available.

b. As the user interface we used Microsoft Outlook Express, because it is user-friendly and makes it easy to access, read and maintain our mail.

c. To send mail we need a protocol capable of sending or delivering it, and to receive mail we need another protocol to fetch it from our mailboxes. In our project we used SMTP for sending mail and POP3 for receiving it. Both are provided by a single mail server, the James mail server we used.

d. Filters can be scripted or coded using XML and basic Java, because XML provides application interoperability.

1.2 Scope: This document describes the requirements of the system. It is meant for use by the developers and will also be the basis for validating the final system. Any future changes to the requirements will have to go through a formal change approval process. The developer is responsible for asking for clarifications when necessary and will not make any alterations without the permission of the client.

This project aims to delete unneeded mail from the mailboxes of the organization's personnel. Considerable effort was put into making it as complete as possible. The workload of deleting mail manually is avoided, and the time for processing and deleting mail is considerably reduced. It saves the administrator valuable time, which he can allot to other important activities. It is also extensible: besides the existing filters, the administrator can easily add his own filters in future if needed. These filters can be applied on other mail servers to drop unwanted mail from specified maildrops. The administrator has two options for deleting mail: he can run the filters manually whenever he wants, or he can schedule them to run automatically.
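The two ways of running the filters (on demand, or on a schedule) might be sketched with the JDK's `ScheduledExecutorService`. `FilterScheduler`, `runFilters` and `runScheduledFor` are hypothetical names; here `runFilters` only counts invocations, where the real project would apply every configured filter to the specified maildrops.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FilterScheduler {
    // Hypothetical placeholder: the real task would apply every configured
    // filter to the specified maildrops. Here it only counts runs.
    static void runFilters(AtomicInteger runs) {
        runs.incrementAndGet();
    }

    // Scheduled option: repeat the same task at a fixed rate until shut down.
    static ScheduledExecutorService schedule(AtomicInteger runs, long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> runFilters(runs), 0, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }

    // Lets the schedule run for about durationMillis, stops it, and returns the run count.
    static int runScheduledFor(long periodMillis, long durationMillis) {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService ses = schedule(runs, periodMillis);
        try {
            Thread.sleep(durationMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ses.shutdownNow();
        }
        return runs.get();
    }
}
```

The manual option is simply a direct call to `runFilters`; the scheduled option wraps the same call, so both paths apply identical filter logic.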

1.3 Definition: A filter is the core of the decision making in the Privacy-Aware Collaborative Spam Filtering document. A filter decides, on a per-mail basis, whether the message should be downloaded or not.
1.4 Reference: Not Applicable.
1.5 Developer's Responsibilities Overview: The system requirements specification covers the following points:
1. An introductory section that mainly describes:
• the purpose of the system requirements specification document;
• the scope of the envisaged application.

2. A description of the interactions of the system with its environment, without going into the internals of the system. It also describes the constraints imposed on the system, which lie outside the envisaged application, and lists the assumptions made. It is supported by UML diagrams.

3. A description of the internal behaviour of the system in response to inputs and while generating outputs. This part is also supported by detailed UML diagrams, a list of inputs, an explanation of the processing, and a list of outputs.

4. The external interface requirements, which include the user, hardware and software interfaces.

5. The performance requirements of the system, and the design constraints, comprising software constraints and hardware constraints.

1.6 Product functions overview: In the organization every employee has a mailbox, to which anyone can send any number of messages. Sometimes we suffer from spam or very large messages that may use up all the storage allotted to the mailbox. Such mail is controlled by the company administrator, who is responsible for managing all these mailboxes. He can set constraints on the mailboxes, for example to drop certain kinds of mail when they arrive. These constraints are our filters. By embedding the filters in the company's mail server he can restrict incoming mail, so there is no need to mark and delete mail manually. In this project the administrator runs the filters on specified mailboxes manually whenever he wants. Alternatively, he can schedule the filters to run periodically without his intervention.

Whenever these filters run, they apply the logic already written in a Java file to every mail in all mailboxes, or in the specified mailboxes, and based on this logic decide whether to download the mail or not. This automates the deletion of mail.

1.7 User characteristics: In our project the user is the administrator. He must know how to implement or embed these filters in the mail server.

1.8 General constraints: The system should run on a Pentium under Windows NT/2000 Professional or Server, or later versions of Microsoft operating systems, with a minimum of 16 MB RAM for better performance. In fact, these filters can be applied to any kind of mail server.

1.9 Assumptions and Dependencies:

a. It is assumed that James is the actual mail server resource and that the required information already exists in the system.
b. It is assumed that the mail client is Microsoft Outlook Express or Netscape Communicator.
c. All the details provided by the user are correct.
d. The user will ask for new filters when he wants to filter mail more deeply, or when a situation requiring such filtering arises.

2. Functional Requirements

Functional requirements specify which outputs should be produced from the given inputs. They describe the relationship between the input and output of the system. For each functional requirement, a detailed description of all data inputs, their source, and the range of valid inputs must be specified. All the operations to be performed on the input data to obtain the output should also be specified.

2.1 Inputs:

1. Null Filter: Deletes all mail irrespective of its characteristics. This filter consumes all messages and marks them for deletion.

2. Header Mail Filter: Matches a header in the message. It requires the name and the value of the header.

3. MessageID Mail Filter: Filters messages that contain a duplicate Message-ID. This filter stores the list of downloaded Message-IDs in the specified file.

4. Recipient Mail Filter: Matches the recipients of the message against those provided in a list.

5. Sender Mail Filter: Matches the sender of the message against those provided in a list.

6. Size Mail Filter: Filters messages based on their size.
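A minimal Java sketch of the six input filters, assuming a simplified `Msg` record in place of a JavaMail message. The class names follow the list above, but the `Filter` interface, the constructors, and the convention that `accept` returning `false` means "drop the message" are assumptions made for illustration. As described, the Message-ID filter persists the IDs it has seen to a file, one per line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a JavaMail message.
record Msg(String from, List<String> to, Map<String, String> headers, String messageId, int size) {}

// accept() == true means keep/download the message; false means drop it.
interface Filter {
    boolean accept(Msg m);
}

// 1. Consumes every message.
class NullFilter implements Filter {
    public boolean accept(Msg m) { return false; }
}

// 2. Drops the message when the named header has the given value (exact match).
class HeaderMailFilter implements Filter {
    private final String name;
    private final String value;
    HeaderMailFilter(String name, String value) { this.name = name; this.value = value; }
    public boolean accept(Msg m) { return !value.equals(m.headers().get(name)); }
}

// 3. Drops duplicate Message-IDs; seen IDs are kept in a file, one per line.
class MessageIdMailFilter implements Filter {
    private final Path store;
    private final Set<String> seen = new HashSet<>();

    MessageIdMailFilter(Path store) {
        this.store = store;
        try {
            if (Files.exists(store)) seen.addAll(Files.readAllLines(store));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean accept(Msg m) {
        if (!seen.add(m.messageId())) return false; // already downloaded
        try {
            Files.writeString(store, m.messageId() + System.lineSeparator(),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}

// 4. Drops mail addressed to any listed recipient.
class RecipientMailFilter implements Filter {
    private final Set<String> listed;
    RecipientMailFilter(Set<String> listed) { this.listed = listed; }
    public boolean accept(Msg m) { return m.to().stream().noneMatch(listed::contains); }
}

// 5. Drops mail from any listed sender.
class SenderMailFilter implements Filter {
    private final Set<String> listed;
    SenderMailFilter(Set<String> listed) { this.listed = listed; }
    public boolean accept(Msg m) { return !listed.contains(m.from()); }
}

// 6. Drops messages larger than maxBytes.
class SizeMailFilter implements Filter {
    private final int maxBytes;
    SizeMailFilter(int maxBytes) { this.maxBytes = maxBytes; }
    public boolean accept(Msg m) { return m.size() <= maxBytes; }
}
```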

2.2 Outputs:

1. Log Files: The system writes log files recording the operations the server handled. It also writes an error message if any failure occurs, to indicate where the fault happened. All this information is represented in the form of codes assigned to each operation.
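One way to produce such coded log entries is sketched below. The document does not list the actual operation codes, so the `LogCode` values are hypothetical, as are the `FilterLog` class and the line layout (ISO timestamp, numeric code, code name, free-text detail).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical operation codes, for illustration only.
enum LogCode {
    RUN_STARTED(100), MESSAGE_KEPT(200), MESSAGE_DELETED(210), FILTER_ERROR(500);

    final int code;
    LogCode(int code) { this.code = code; }
}

class FilterLog {
    private final Path file;

    FilterLog(Path file) { this.file = file; }

    // One line per operation: timestamp, numeric code, code name, detail.
    static String format(LocalDateTime time, LogCode c, String detail) {
        return time.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
                + " " + c.code + " " + c.name() + " " + detail;
    }

    // Appends the formatted entry to the log file, creating the file if needed.
    void log(LogCode c, String detail) {
        String line = format(LocalDateTime.now(), c, detail) + System.lineSeparator();
        try {
            Files.writeString(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```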
3. External Interface Requirements

3.1 User Interface: Once the filters are embedded in the mail server and working properly, no user interaction is needed if the administrator has set them to run periodically. Otherwise, it is the administrator's responsibility to run them when required. Overall, user interaction is very low.

3.2 Software Interfaces: These interface requirements should specify the interface with other software that the system will use or that will use the system, including the interface with the operating system and other applications.

The message content and format of each interface should be given.
3.3 Hardware Interfaces: The hardware interface is very important to the documentation. If the software is to execute on existing or predetermined hardware, all characteristics of the hardware, including memory restrictions, should be specified. In addition, the current use and load characteristics of the hardware should be given.

4. Performance Requirements

All the requirements relating to the performance characteristics of the system must be clearly specified. There are two types of performance requirements: static and dynamic. Static requirements are those that do not impose constraints on the execution characteristics of the system. These include requirements such as the number of terminals to be supported, the number of simultaneous users to be supported, and the number and sizes of files that the system has to process. These are also called the capacity of the system. Dynamic requirements specify constraints on the execution behaviour of the system; these typically include response time and throughput constraints.

Performance is measured by processing speed, resource consumption, throughput and efficiency. To achieve good performance, a few guidelines such as reducing code, limiting the use of controls and minimizing repeated data are to be followed. In a real-time system, software that provides the required function but does not conform to the performance requirements is unacceptable. These requirements are used to test the run-time performance of the software in the context of an integrated system.

5. Design constraints

5.1 Software constraints:

Operating System : Windows 2000 Server/NT or any mail server
Reports : Log files
Other Applications : James Server

5.2 Hardware constraints:

Processor : Pentium IV 2.0 GHz
RAM : 256 MB
Hard Disk : 40 GB
Floppy Disk : 1.44 MB
CD-ROM Drive : 52X
VDU : VGA
Keyboard : 101 Standard


6. Acceptance Criteria

Before the system is accepted, the developer must demonstrate that it works correctly with the user email IDs entered in the corresponding files. The developer will have to show through test cases that all conditions are satisfied.

The Java Apache Mail Enterprise Server (a.k.a. Apache James) is a 100% pure Java SMTP and POP3 mail server and NNTP news server designed to be a complete and portable enterprise mail engine solution. James is based on currently available open protocols.
The James server also serves as a mail application
platform. The James project hosts the Apache
Mailet API, and the James server is a Mailet
container. This feature makes it easy to design,
write, and deploy custom applications for mail
processing. This modularity and ease of
customization is one of James' strengths, and can
allow administrators to produce powerful
applications surprisingly easily.
James is built on top of version 4.1.3 of the Avalon
Application Framework. This framework encourages
a set of good development practices such as
Component Oriented Programming and Inversion of
Control. The standard distribution of James
includes version 4.0.1 of the Phoenix Avalon
Framework container. This stable and robust
container provides a strong foundation for the
James server.
This documentation is intended to be an
introduction to the concepts behind the James
implementation, as well as a guide to installing,
configuring, (and for developers) building the
James server.

The James Server


James is an open source project intended to produce a
robust, flexible, and powerful enterprise class server
that provides email and email-related services. It is
also designed to be highly customizable, allowing
administrators to configure James to process email in
a nearly endless variety of fashions.

The James server is built on top of the Avalon Framework. The standard James distribution deploys inside the Phoenix Avalon Framework container. In addition to providing a robust server architecture for James, the use of Phoenix allows James administrators to deploy their own applications inside the container. These applications can then be accessed during mail processing.

The James server is implemented as a complete collection of servers and related components that, taken together, provide an email solution. These components are described below.
POP3 Service
The POP3 protocol allows users to retrieve email
messages. It is the method most commonly used by
email clients to download and manage email
messages.

The James version of the POP3 service is a simple and straightforward implementation that provides full compliance with the specification and maximum compatibility with common POP3 clients. In addition, James can be configured to require SSL/TLS connections for POP3 clients connecting to the server.
SMTP Service
SMTP (Simple Mail Transfer Protocol) is the standard
method of sending and delivering email on the
internet. James provides a full-function
implementation of the SMTP specification, with
support for some optional features such as message
size limits, SMTP auth, and encrypted client/server
communication.
NNTP Service
NNTP is used by clients to store messages on and
retrieve messages from news servers. James
provides the server side of this interaction by
implementing the NNTP specification as well as
an appropriate repository for storing news
messages. The server implementation is simple
and straightforward, but supports some additional
features such as NNTP authentication and
encrypted client/server communication.
Fetch POP
Fetch POP, unlike the other James components, is not
an implementation of an RFC. Instead, it's a
component that allows the administrator to configure
James to retrieve email from a number of POP3
servers and deliver them to the local spool. This is
useful for consolidating mail delivered to a number of
accounts on different machines to a single account.
The Spool Manager, Matchers, and Mailets
James separates the services that deliver mail to
James (i.e. SMTP, Fetch POP) from the engine that
processes mail after it is received by James. The Spool
Manager component is James' mail processing engine.
James' Spool Manager component is a Mailet
container. It is these mailets and matchers that
actually carry out mail processing.
Repositories
James uses a number of different repositories to both
store message data (email, news messages) and user
information. User repositories store user information,
including user names, authentication information, and
aliases. Mail repositories store messages that have
been delivered locally. Spool repositories store
messages that are still being processed. Finally, news
repositories are used to store news messages. Aside
from what type of data they store, repositories are
distinguished by where they store data. There are
three types of storage - File, Database, and DBFile.
Remote Manager
James provides a simple telnet-based interface for
control. Through this interface you can add and delete
users, configure per-user aliases and forward
addresses, and shut down the server.
Mailet API:

The Mailet API is a simple API used to build mail processing applications. James is a Mailet container, allowing administrators to deploy Mailets (both custom and pre-made) to carry out a variety of complex mail processing tasks. In the default configuration James uses Mailets to carry out a number of tasks that are carried out deep in the source code of other mail servers (e.g. list processing, remote and local delivery).

As it stands today, the Mailet API defines interfaces for both Matchers and Mailets. Matchers, as their name suggests, match mail messages against certain conditions. They return some subset (possibly the entire set) of the original recipients of the message if there is a match. An inherent part of the Matcher contract is that a Matcher should not induce any changes in a message under evaluation.

Mailets are responsible for actually processing the message. They may alter the message in any fashion, or pass the message to an external API or component. This can include delivering a message to its destination repository or SMTP server.
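The Matcher/Mailet division of labour can be illustrated with simplified stand-in types. These are not the real `org.apache.mailet` interfaces: `SimpleMail`, `SimpleMatcher`, `SimpleMailet`, `DomainMatcher`, `FooterMailet` and `Processor` are hypothetical, and a real James spool manager would also split the mail when only some recipients match. The sketch shows only the contract: the matcher returns a subset of recipients without touching the message, and the mailet may modify it.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-ins, NOT the real org.apache.mailet interfaces.
final class SimpleMail {
    final String sender;
    final List<String> recipients;
    final StringBuilder body;

    SimpleMail(String sender, List<String> recipients, String body) {
        this.sender = sender;
        this.recipients = List.copyOf(recipients);
        this.body = new StringBuilder(body);
    }
}

interface SimpleMatcher {
    // Returns the matching subset of recipients; must not modify the mail.
    Collection<String> match(SimpleMail mail);
}

interface SimpleMailet {
    // Processes the mail; may alter it or hand it to another component.
    void service(SimpleMail mail);
}

// Matches recipients in one domain.
class DomainMatcher implements SimpleMatcher {
    private final String domain;

    DomainMatcher(String domain) { this.domain = domain; }

    public Collection<String> match(SimpleMail mail) {
        return mail.recipients.stream()
                .filter(r -> r.endsWith("@" + domain))
                .collect(Collectors.toList());
    }
}

// Appends a footer, showing that a mailet (unlike a matcher) may change the message.
class FooterMailet implements SimpleMailet {
    public void service(SimpleMail mail) { mail.body.append("\n-- scanned"); }
}

class Processor {
    // Runs the mailet only if the matcher returned at least one recipient.
    static Collection<String> process(SimpleMail mail, SimpleMatcher matcher, SimpleMailet mailet) {
        Collection<String> matched = matcher.match(mail);
        if (!matched.isEmpty()) {
            mailet.service(mail);
        }
        return matched;
    }
}
```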

The Mailet API is currently in its second revision. Although the Mailet API is expected to undergo substantial changes in the near future, it is our aim that existing Mailets that abided purely by the prior Mailet API interfaces will continue to run with the revised specification. James bundles a number of Matchers and Mailets in its distribution.
1. INTRODUCTION

The objective of the Simple Mail Transfer Protocol (SMTP) is to transfer mail reliably and efficiently. SMTP is independent of the particular transmission subsystem and requires only a reliable ordered data stream channel.
An important feature of SMTP is its capability to relay mail across transport service environments. A transport service provides an interprocess communication environment (IPCE). An IPCE may cover one network, several networks, or a subset of a network. It is important to realize that transport systems (or IPCEs) are not one-to-one with networks. A process can communicate directly with another process through any mutually known IPCE. Mail is an application or use of interprocess communication. Mail can be communicated between processes in different IPCEs by relaying through a process connected to two (or more) IPCEs. More specifically, mail can be relayed between hosts on different transport systems by a host on both transport systems.

2. THE SMTP MODEL


The SMTP design is based on the following model of
communication: as the result of a user mail request, the sender-SMTP
establishes a two-way transmission channel to a receiver-SMTP. The
receiver-SMTP may be either the ultimate destination or an
intermediate. SMTP commands are generated by the sender-SMTP and
sent to the receiver-SMTP. SMTP replies are sent from the receiver-
SMTP to the sender-SMTP in response to the commands.

Once the transmission channel is established, the SMTP-sender sends a MAIL command indicating the sender of the mail. If the SMTP-receiver can accept mail, it responds with an OK reply. The SMTP-sender then sends a RCPT command identifying a recipient of the mail. If the SMTP-receiver can accept mail for that recipient, it responds with an OK reply; if not, it responds with a reply rejecting that recipient (but not the whole mail transaction). The SMTP-sender and SMTP-receiver may negotiate several recipients. When the recipients have been negotiated, the SMTP-sender sends the mail data, terminating with a special sequence. If the SMTP-receiver successfully processes the mail data, it responds with an OK reply. The dialog is purposely lock-step, one-at-a-time.

            +----------+                +----------+
+------+    |          |                |          |
| User |<-->|          |      SMTP      |          |
+------+    |  Sender- |Commands/Replies| Receiver-|
+------+    |   SMTP   |<-------------->|    SMTP  |    +------+
| File |<-->|          |    and Mail    |          |<-->| File |
|System|    |          |                |          |    |System|
+------+    +----------+                +----------+    +------+

             Sender-SMTP                Receiver-SMTP

                        Model for SMTP Use

                             Figure 1
The SMTP provides mechanisms for the transmission of mail: directly from the sending user's host to the receiving user's host when the two hosts are connected to the same transport service, or via one or more relay SMTP-servers when the source and destination hosts are not connected to the same transport service.

To be able to provide the relay capability, the SMTP-server must be supplied with the name of the ultimate destination host as well as the destination mailbox name.

The argument to the MAIL command is a reverse-path, which specifies who the mail is from. The argument to the RCPT command is a forward-path, which specifies who the mail is to. The forward-path is a source route, while the reverse-path is a return route (which may be used to return a message to the sender when an error occurs with a relayed message).

When the same message is sent to multiple recipients, the SMTP encourages the transmission of only one copy of the data for all the recipients at the same destination host. The mail commands and replies have a rigid syntax. Replies also have a numeric code. Commands and replies are not case sensitive; that is, a command or reply word may be upper case, lower case, or any mixture of upper and lower case. Note that this is not true of mailbox user names. For some hosts the user name is case sensitive, and SMTP implementations must take care to preserve the case of user names as they appear in mailbox arguments. Host names are not case sensitive.

Commands and replies are composed of characters from the ASCII character set [1]. When the transport service provides an 8-bit byte (octet) transmission channel, each 7-bit character is transmitted right justified in an octet with the high order bit cleared to zero.
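The case rules and the 7-bit encoding described above can be sketched in Java with a hypothetical `SmtpLine` helper. Only the command verb is upper-cased; the argument, which may contain a case-sensitive user name, is passed through unchanged. Encoding a pure-ASCII line as US-ASCII yields one octet per character with the high-order bit zero, and the sketch rejects any character outside that range rather than guessing a replacement.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class SmtpLine {
    // Command verbs are case-insensitive, so only the verb is normalized.
    static String verb(String line) {
        int sp = line.indexOf(' ');
        String v = sp < 0 ? line : line.substring(0, sp);
        return v.toUpperCase(Locale.ROOT);
    }

    // The argument is returned untouched: some hosts treat user names as case-sensitive.
    static String argument(String line) {
        int sp = line.indexOf(' ');
        return sp < 0 ? "" : line.substring(sp + 1);
    }

    // Encodes a line as octets: each 7-bit character right-justified, high bit zero.
    static byte[] toOctets(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 0x7F) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
        }
        return line.getBytes(StandardCharsets.US_ASCII);
    }
}
```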

When specifying the general form of a command or reply, an argument or special symbol will be denoted by a meta-linguistic variable (or constant), for example, "<string>" or "<reverse-path>". Here the angle brackets indicate that these are meta-linguistic variables.

However, some arguments use the angle brackets literally. For example, an actual reverse-path is enclosed in angle brackets, i.e., "<John.Smith@USC-ISI.ARPA>" is an instance of <reverse-path> (the angle brackets are actually transmitted in the command or reply).
3. THE SMTP PROCEDURES

This section presents the procedures used in SMTP in several parts. First comes the basic mail procedure defined as a mail transaction. Following this are descriptions of forwarding mail, verifying mailbox names and expanding mailing lists, sending to terminals instead of or in combination with mailboxes, and the opening and closing exchanges. At the end of this section are comments on relaying, a note on mail domains, and a discussion of changing roles.

3.1 MAIL

There are three steps to SMTP mail transactions. The transaction is started with a MAIL command which gives the sender identification. A series of one or more RCPT commands follows, giving the receiver information. Then a DATA command gives the mail data. Finally, the end of mail data indicator confirms the transaction.

The first step in the procedure is the MAIL command. The <reverse-path> contains the source mailbox.

MAIL <SP> FROM:<reverse-path> <CRLF>


This command tells the SMTP-receiver that a new mail
transaction is starting and to reset all its state tables and buffers,
including any recipients or mail data. It gives the reverse-path
which can be used to report errors. If accepted, the receiver-
SMTP returns a 250 OK reply.

The <reverse-path> can contain more than just a mailbox. The <reverse-path> is a reverse source routing list of hosts and source mailbox. The first host in the <reverse-path> should be the host sending this command.

The second step in the procedure is the RCPT command.

RCPT <SP> TO:<forward-path> <CRLF>

This command gives a forward-path identifying one recipient. If accepted, the receiver-SMTP returns a 250 OK reply and stores the forward-path. If the recipient is unknown, the receiver-SMTP returns a 550 Failure reply. This second step of the procedure can be repeated any number of times.
The <forward-path> can contain more than just a mailbox. The <forward-path> is a source routing list of hosts and the destination mailbox. The first host in the <forward-path> should be the host receiving this command.
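In RFC 821 a path carrying a source route has the form `<@HOSTA.ARPA,@HOSTB.ARPA:USERC@HOSTD.ARPA>`: an optional comma-separated list of `@host` hops, a colon, then the mailbox. A minimal parser for that shape is sketched below. `SmtpPath` is a hypothetical name, and the sketch does not validate host or mailbox syntax.

```java
import java.util.ArrayList;
import java.util.List;

// A parsed <reverse-path> or <forward-path>:
// "<" [ "@" host { "," "@" host } ":" ] mailbox ">"
record SmtpPath(List<String> route, String mailbox) {

    static SmtpPath parse(String text) {
        String s = text.trim();
        if (!s.startsWith("<") || !s.endsWith(">")) {
            throw new IllegalArgumentException("path must be enclosed in angle brackets: " + text);
        }
        s = s.substring(1, s.length() - 1);          // the brackets are literal, strip them
        List<String> route = new ArrayList<>();
        int colon = s.indexOf(':');
        if (s.startsWith("@") && colon > 0) {        // source route present
            for (String hop : s.substring(0, colon).split(",")) {
                if (!hop.startsWith("@")) throw new IllegalArgumentException("bad route hop: " + hop);
                route.add(hop.substring(1));         // first hop: host receiving the command
            }
            s = s.substring(colon + 1);
        }
        return new SmtpPath(List.copyOf(route), s);
    }

    // Takes everything from the first '<' as the path; the keyword itself is not validated.
    static SmtpPath fromCommand(String line) {
        int lt = line.indexOf('<');
        if (lt < 0) throw new IllegalArgumentException("no path in: " + line);
        return parse(line.substring(lt));
    }
}
```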

The third step in the procedure is the DATA command.

DATA <CRLF>

If accepted, the receiver-SMTP returns a 354 Intermediate reply and considers all succeeding lines to be the message text. When the end of text is received and stored, the SMTP-receiver sends a 250 OK reply. Since the mail data is sent on the transmission channel, the end of the mail data must be indicated so that the command and reply dialog can be resumed. SMTP indicates the end of the mail data by sending a line containing only a period. A transparency procedure is used to prevent this from interfering with the user's text.
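In RFC 821 the transparency procedure mentioned above is dot-stuffing: before sending a text line, the sender inserts an extra period if the line begins with one, and the receiver deletes the first character of any line that starts with a period and contains other characters, treating a line holding only a period as the end of the data. A minimal Java sketch, with lines held as strings without their CRLF terminators (the `DotStuffing` class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

class DotStuffing {
    // Sender side: double a leading period, then terminate with a lone ".".
    static List<String> encode(List<String> textLines) {
        List<String> out = new ArrayList<>();
        for (String line : textLines) {
            out.add(line.startsWith(".") ? "." + line : line);
        }
        out.add(".");
        return out;
    }

    // Receiver side: stop at the lone "." and strip one leading period from other lines.
    static List<String> decode(List<String> wireLines) {
        List<String> out = new ArrayList<>();
        for (String line : wireLines) {
            if (line.equals(".")) {
                return out; // end of mail data
            }
            out.add(line.startsWith(".") ? line.substring(1) : line);
        }
        throw new IllegalArgumentException("missing end-of-data line");
    }
}
```

For example, a text line consisting of a single period is sent as `..`, so the receiver never mistakes it for the end-of-data marker.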

Please note that the mail data includes the memo header items such as Date, Subject, To, Cc, From [2]. The end of mail data indicator also confirms the mail transaction and tells the receiver-SMTP to now process the stored recipients and mail data. If accepted, the receiver-SMTP returns a 250 OK reply. The DATA command should fail only if the mail transaction was incomplete (for example, no recipients), or if resources are not available.

The above procedure is an example of a mail transaction.
These commands must be used only in the order discussed
above. Example 1 (below) illustrates the use of these commands
in a mail transaction.

Example of the SMTP Procedure

This SMTP example shows mail sent by Smith at host
Alpha.ARPA to Jones, Green, and Brown at host Beta.ARPA.
Here we assume that host Alpha contacts host Beta directly.

S: MAIL FROM:<Smith@Alpha.ARPA>
R: 250 OK

S: RCPT TO:<Jones@Beta.ARPA>
R: 250 OK

S: RCPT TO:<Green@Beta.ARPA>
R: 550 No such user here

S: RCPT TO:<Brown@Beta.ARPA>
R: 250 OK

S: DATA
R: 354 Start mail input; end with <CRLF>.<CRLF>
S: Blah blah blah...
S: ...etc. etc. etc.
S: <CRLF>.<CRLF>
R: 250 OK

The mail has now been accepted for Jones and Brown.
Green did not have a mailbox at host Beta in Example 1.
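The MAIL, RCPT, DATA ordering and the replies in Example 1 can be modelled as a small state machine. The sketch below is a simplified, hypothetical receiver-SMTP: it has no sockets and does no delivery, and the 503 "Bad sequence of commands" reply for out-of-order commands is our assumption about how a receiver would report them. It only shows which reply each command gets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified receiver-SMTP enforcing the MAIL -> RCPT -> DATA order.
final class ToyReceiver {
    private final Set<String> knownMailboxes;
    private final List<String> recipients = new ArrayList<>();
    private String reversePath;   // set by MAIL
    private boolean inData;       // true between the 354 reply and the lone "."

    ToyReceiver(Set<String> knownMailboxes) {
        this.knownMailboxes = knownMailboxes;
    }

    String mail(String from) {
        // MAIL starts a new transaction and resets all state.
        reversePath = from;
        recipients.clear();
        inData = false;
        return "250 OK";
    }

    String rcpt(String to) {
        if (reversePath == null) return "503 Bad sequence of commands";
        if (!knownMailboxes.contains(to)) return "550 No such user here";
        recipients.add(to); // store the forward-path
        return "250 OK";
    }

    String data() {
        // DATA fails if the transaction is incomplete (e.g. no recipients).
        if (reversePath == null || recipients.isEmpty()) return "503 Bad sequence of commands";
        inData = true;
        return "354 Start mail input; end with <CRLF>.<CRLF>";
    }

    // Called when the end-of-data line has been received.
    String endOfData() {
        if (!inData) return "503 Bad sequence of commands";
        inData = false;
        reversePath = null; // transaction complete
        return "250 OK";
    }

    List<String> acceptedRecipients() {
        return new ArrayList<>(recipients);
    }
}
```

Replaying Example 1 against this model leaves Jones and Brown as accepted recipients, and Green is refused with 550.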

3.2. FORWARDING

There are some cases where the destination information in
the <forward-path> is incorrect, but the receiver-SMTP knows
the correct destination. In such cases, one of the following
replies should be used to allow the sender to contact the correct
destination.

251 User not local; will forward to <forward-path>

This reply indicates that the receiver-SMTP knows the
user's mailbox is on another host and indicates the correct
forward-path to use in the future. Note that either the host or
user or both may be different. The receiver takes responsibility
for delivering the message.

551 User not local; please try <forward-path>

This reply indicates that the receiver-SMTP knows the
user's mailbox is on another host and indicates the correct
forward-path to use. Note that either the host or user or both
may be different. The receiver refuses to accept mail for this
user, and the sender must either redirect the mail according to
the information provided or return an error response to the
originating user.
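On the sender side, the difference between the two replies decides what happens next. After 251 the receiver keeps the message. After 551 the sender must redirect the mail or report an error, and in both cases the suggested forward-path appears between angle brackets. The hypothetical helper below interprets a reply line this way. The host "Gamma.ARPA" used in the usage note is an illustrative name only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sender-side helper for interpreting 251/551 forwarding replies.
final class ForwardingReply {
    private static final Pattern PATH = Pattern.compile("<([^>]*)>");

    // 251: the receiver accepted the mail and takes responsibility for forwarding it.
    static boolean receiverWillForward(String reply) {
        return reply.startsWith("251");
    }

    // For 251 or 551, return the forward-path suggested between angle brackets.
    static Optional<String> suggestedPath(String reply) {
        if (!(reply.startsWith("251") || reply.startsWith("551"))) {
            return Optional.empty();
        }
        Matcher m = PATH.matcher(reply);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For "551 User not local; please try <Jones@Gamma.ARPA>", receiverWillForward returns false and suggestedPath yields "Jones@Gamma.ARPA", which the sender could use to retry the mail.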
3.3. VERIFYING AND EXPANDING

As additional features, SMTP provides commands to verify a
user name or expand a mailing list. This is done with the VRFY
and EXPN commands, which have character string arguments.
For the VRFY command, the string is a user name, and the
response may include the full name of the user and must include
the mailbox of the user. For the EXPN command, the string
identifies a mailing list, and the multiline response may include
the full names of the users and must give the mailboxes on the
mailing list. "User name" is a fuzzy term and is used purposely. If a
host implements the VRFY or EXPN commands, then at least local
mailboxes must be recognized as "user names". If a host
chooses to recognize other strings as "user names", that is
allowed.
In some hosts the distinction between a mailing list and an alias
for a single mailbox is a bit fuzzy, since a common data structure
may hold both types of entries, and it is possible to have mailing
lists of one mailbox. If a request is made to verify a mailing list, a
positive response can be given if, on receipt of a message so
addressed, it will be delivered to everyone on the list; otherwise
an error should be reported (e.g., "550 That is a mailing list, not
a user"). If a request is made to expand a user name, returning a
list containing one name can form a positive response, or an
error can be reported (e.g., "550 That is a user name, not a
mailing list").

In the case of a multiline reply (normal for EXPN), exactly
one mailbox is to be specified on each line of the reply. In the
case of an ambiguous request, for example "VRFY Smith" where
there are two Smiths, the response must be "553 User
ambiguous". The case of verifying a user name is straightforward,
as shown in Example 3.
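A client reading an EXPN reply has to recognize where the multiline reply ends. In SMTP, every line except the last has a hyphen right after the reply code ("250-"), and the last line has a space ("250 "). Each line carries exactly one mailbox. The hypothetical parser below collects the mailboxes and assumes each mailbox is written in angle brackets, as in the RFC 821 examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a multiline EXPN reply: one mailbox per line,
// "250-" on continuation lines and "250 " on the final line.
final class ExpnReply {
    private static final Pattern MAILBOX = Pattern.compile("<([^>]+)>");

    static List<String> mailboxes(List<String> replyLines) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < replyLines.size(); i++) {
            String line = replyLines.get(i);
            if (!line.startsWith("250")) {
                throw new IllegalArgumentException("not a positive reply: " + line);
            }
            boolean last = i == replyLines.size() - 1;
            char marker = line.length() > 3 ? line.charAt(3) : ' ';
            if (marker != (last ? ' ' : '-')) {
                throw new IllegalArgumentException("bad continuation marker: " + line);
            }
            Matcher m = MAILBOX.matcher(line);
            if (m.find()) {
                result.add(m.group(1)); // exactly one mailbox per line
            }
        }
        return result;
    }
}
```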

The character string arguments of the VRFY and EXPN
commands cannot be further restricted due to the variety of
implementations of the user name and mailbox list concepts. On
some systems it may be appropriate for the argument of the EXPN
command to be a file name for a file containing a mailing list, but
again there is a variety of file naming conventions in the Internet.

The VRFY and EXPN commands are not included in the minimum
implementation (Section 4.5.1), and are not required to work across
relays when they are implemented.
3.4. SENDING AND MAILING

The main purpose of SMTP is to deliver messages to users'
mailboxes. A very similar service provided by some hosts is to deliver
messages to users' terminals (provided the user is active on the
host). The delivery to the user's mailbox is called "mailing"; the
delivery to the user's terminal is called "sending". Because in many
hosts the implementation of sending is nearly identical to the
implementation of mailing, these two functions are combined in SMTP.
However, the sending commands are not included in the required
minimum implementation (Section 4.5.1). Users should have the
ability to control the writing of messages on their terminals. Most
hosts permit the users to accept or refuse such messages.

The following three commands are defined to support the
sending options. These are used in the mail transaction instead of the
MAIL command and inform the receiver-SMTP of the special semantics
of this transaction:

SEND <SP> FROM:<reverse-path> <CRLF>

The SEND command requires that the mail data be delivered to
the user's terminal. If the user is not active (or not accepting terminal
messages) on the host, a 450 reply may be returned to a RCPT
command. The mail transaction is successful if the message is
delivered to the terminal.

SOML <SP> FROM:<reverse-path> <CRLF>

The Send or Mail command requires that the mail data be
delivered to the user's terminal if the user is active (and accepting
terminal messages) on the host. If the user is not active (or not
accepting terminal messages), then the mail data is entered into the
user's mailbox. The mail transaction is successful if the message is
delivered either to the terminal or the mailbox.

SAML <SP> FROM:<reverse-path> <CRLF>

The Send and Mail command requires that the mail data be
delivered to the user's terminal if the user is active (and accepting
terminal messages) on the host. In any case the mail data is entered
into the user's mailbox. The mail transaction is successful if the
message is delivered to the mailbox. The same reply codes that are used
for the MAIL command are used for these commands.
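The delivery rules of the four transaction-starting commands fit in one decision table. The sketch below encodes them as stated above. The enum and class names are hypothetical, and "userActive" stands for "the user is logged in and accepting terminal messages".

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of where mail data goes for each transaction-starting command.
enum StartCommand { MAIL, SEND, SOML, SAML }

enum DeliveryTarget { TERMINAL, MAILBOX }

final class SendingRules {
    // Returns the delivery targets; an empty set means the transaction fails
    // (SEND to a user who is not active).
    static Set<DeliveryTarget> targets(StartCommand cmd, boolean userActive) {
        switch (cmd) {
            case MAIL:
                return EnumSet.of(DeliveryTarget.MAILBOX);
            case SEND:
                return userActive ? EnumSet.of(DeliveryTarget.TERMINAL)
                                  : EnumSet.noneOf(DeliveryTarget.class);
            case SOML: // Send Or Mail: terminal if possible, otherwise mailbox
                return userActive ? EnumSet.of(DeliveryTarget.TERMINAL)
                                  : EnumSet.of(DeliveryTarget.MAILBOX);
            case SAML: // Send And Mail: always the mailbox, plus the terminal if active
                return userActive ? EnumSet.of(DeliveryTarget.TERMINAL, DeliveryTarget.MAILBOX)
                                  : EnumSet.of(DeliveryTarget.MAILBOX);
            default:
                throw new IllegalArgumentException("unknown command: " + cmd);
        }
    }
}
```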
INTRODUCTION:

The JavaMail API provides a set of abstract classes defining objects
that comprise a mail system. The API defines classes like Message, Store
and Transport. The API can be extended and subclassed to
provide new protocols and to add functionality when necessary. In
addition, the API provides concrete subclasses of the abstract classes.
These subclasses, including MimeMessage and MimeBodyPart,
implement widely used Internet mail protocols and conform to
specifications RFC822 and RFC2045. They are ready to be used in
application development.

GOALS AND DESIGN PRINCIPLES:

The JavaMail API is designed to make adding electronic mail capability
to simple applications easy, while also supporting the creation of
sophisticated user interfaces. It includes appropriate convenience
classes which encapsulate common mail functions and protocols. It
fits with other packages for the Java platform in order to facilitate its
use with other Java APIs, and it uses familiar programming models.

The JavaMail API is therefore designed to satisfy the following
development and runtime requirements:
• Simple, straightforward class design is easy for a developer to
learn and implement.
• Use of familiar concepts and programming models supports code
development that interfaces well with other Java APIs.
• Uses familiar exception-handling and JDK 1.1 event-handling
programming models.
• Uses features from the JavaBeans Activation Framework (JAF)
to handle access to data based on data-type and to facilitate
the addition of data types and commands on those data types.
The JavaMail API provides convenience functions to simplify
these coding tasks.
• Lightweight classes and interfaces make it easy to add basic
mail-handling tasks to any application.
• Supports the development of robust mail-enabled applications,
that can handle a variety of complex mail message formats,
data types, and access and transport protocols.
The JavaMail API draws heavily from IMAP, MAPI, CMC, c-client and
other messaging system APIs: many of the concepts present in these
other systems are also present in the JavaMail API. It is simpler to
use because it uses features of the Java programming language not
available to these other APIs, and because it uses the Java
programming language’s object model to shelter applications from
implementation complexity.
The JavaMail API supports many different messaging system
implementations: different message stores, different message
formats, and different message transports. The JavaMail API provides
a set of base classes and interfaces that define the API for client
applications. Many simple applications will only need to interact with
the messaging system through these base classes and interfaces.
JavaMail subclasses can expose additional messaging system
features. For instance, the MimeMessage subclass exposes and
implements common characteristics of an Internet mail message, as
defined by RFC822 and MIME standards. Developers can subclass
JavaMail classes to provide the implementations of particular
messaging systems, such as IMAP4, POP3, and SMTP.

ARCHITECTURE

The JavaMail architectural components are layered as shown below:

1. The Abstract Layer declares classes, interfaces and abstract
methods intended to support mail handling functions that all mail
systems support. API elements comprising the Abstract Layer are
intended to be subclassed and extended as necessary in order to
support standard data types, and to interface with message access
and message transport protocols as necessary.

2. The Internet implementation layer implements part of the abstract
layer using Internet standards: RFC822 and MIME.

3. JavaMail uses the JavaBeans Activation Framework (JAF) in order to
encapsulate message data, and to handle commands intended to
interact with that data. Interaction with message data should take
place via JAF-aware JavaBeans, which are not provided by the
JavaMail API.
JavaMail clients use the JavaMail API and Service Providers implement
the JavaMail API. The layered design architecture allows clients to use
the same JavaMail API calls to send, receive and store a variety of
messages using different data types from different message stores
and using different message transport protocols.
JAVA MAIL CLASS HIERARCHY:
The figure below shows major classes and interfaces comprising the
JavaMail API.

THE JAVA MAIL FRAMEWORK:

The JavaMail API is intended to perform the following functions, which
comprise the standard mail handling process for a typical client
application:

MAJOR JAVA MAIL API COMPONENTS:

This section reviews major components comprising the JavaMail
architecture.

The Message Class

The Message class is an abstract class
that defines a set of attributes and a content for a mail message.
Attributes of the Message class specify addressing information and
define the structure of the content, including the content type. The
content is represented as a DataHandler object that wraps around the
actual data.
The Message class implements the Part interface. The Part interface
defines attributes that are required to define and format data content
carried by a Message object, and to interface successfully to a mail
system. The Message class adds From, To, Subject, Reply-To, and
other attributes necessary for message routing via a message
transport system. When contained in a folder, a Message object has a
set of flags associated with it. JavaMail provides Message subclasses
that support specific messaging implementations.

Message Storage and Retrieval

Messages are stored in Folder objects. A Folder object can contain
subfolders as well as messages, thus providing a tree-like folder
hierarchy. The Folder class declares methods that fetch, append, copy
and delete messages. A Folder object can also send events to
components registered as event listeners.

Message Composition and Transport

A client creates a new message by instantiating an appropriate
Message subclass. It sets attributes like the recipient addresses and
the subject, and inserts the content into the Message object. Finally,
it sends the Message by invoking the Transport.send method.

The Session Class

The Session class defines global and per-user mail-related properties
that define the interface between a mail-enabled client and the
network. JavaMail system components use the Session object to set
and get specific properties. The Session class also provides a default
authenticated session object that desktop applications can share. The
Session class is a final concrete class. It cannot be subclassed.

Using the JavaMail API

This section defines the syntax and lists the order in which a client
application calls some JavaMail methods in order to access and open a
message located in a folder:

1. A JavaMail client typically begins a mail handling task by obtaining
the default JavaMail Session object.
// props is a java.util.Properties, authenticator a javax.mail.Authenticator
Session session = Session.getDefaultInstance(props, authenticator);
2. The client uses the Session object’s getStore method to connect to
the default store. The getStore method returns a Store object
subclass that supports the access protocol defined in the user
properties object, which will typically contain per-user preferences.
Store store = session.getStore();
store.connect();
3. If the connection is successful, the client can list available folders in
the Store, and then fetch and view specific Message objects.
// get the INBOX folder
Folder inbox = store.getFolder("INBOX");
// open the INBOX folder
inbox.open(Folder.READ_WRITE);
Message m = inbox.getMessage(1); // get Message #1 (message numbers start at 1)
String subject = m.getSubject(); // get Subject
Object content = m.getContent(); // get content
// ...
4. Finally, the client closes all open folders, and then closes the store.
inbox.close(false); // close the INBOX; false = do not expunge deleted messages
store.close(); // close the Store
DESIGN PRINCIPLES & METHODOLOGY

Producing the design for a large module can be an extremely
complex task. Design principles provide an effective way of
handling the complexity of the design process. They do not reduce the
effort needed for design, but they can reduce the scope for introducing
errors during design.

To solve a large problem, the problem is divided into
smaller pieces using the time-tested principle of “divide and conquer”,
so that each piece can be conquered separately. For software design,
the goal is to divide the problem into manageable small pieces that can
be solved separately. The principle rests on the premise that the cost
of solving the entire problem at once is more than the sum of the costs
of solving all the pieces.

However, partitioning itself has a cost, and this cost grows as the
degree of partitioning increases. The designer must therefore judge
when to stop partitioning.

In design, the most important quality criteria are simplicity and
understandability: each part should be easily related to the
application, and each piece should be modifiable separately. Proper
partitioning makes the system easier to maintain because the designer
can understand the problem in parts. Partitioning also aids design
verification.

Abstraction is essential for problem partitioning and is used for
existing components as well as for components that are being designed.
Abstraction of existing components plays an important role in the
maintenance phase, and abstraction is also used during the design
process of the system.

In functional abstraction, each of the four main modules takes in
the details it needs and performs the computation required for further
actions. In data abstraction, a data entity is viewed in terms of the
services it provides.
The system is a collection of modules (components), and the
highest-level component corresponds to the total system. To design
this system, we first followed the top-down approach to divide the
problem into modules; top-down design methods often result in some
form of stepwise refinement. After the main modules were identified,
the bottom-up approach was used, starting from the very bottom: the
most basic or primitive components were designed first and then
combined into higher-level components.

The system is modular because it consists
of discrete components such that each component supports a
well-defined abstraction, and a change to one component has minimal
impact on the other components. The modules are highly cohesive and
coupling between them is kept low, because the relationships among
elements in different modules are minimized.

Design Objectives

These are some of the currently implemented features:

Complete portability: Apache James is a 100% pure Java
application based on the Java 2 platform and the JavaMail 1.3 API.

Protocol abstraction: Unlike other mail engines, Apache James treats
protocols only as "communication languages" governing communications
between clients and the server. Apache James is not tied to any
particular protocol but follows an abstracted server design (as
JavaMail did on the client side).

Complete solution: The mail system is able to handle both mail
transport and storage in a single server application. Apache James
works alone without the need for any other server or solution.

Mailet support: Apache James supports the Apache Mailet API.
A Mailet is a discrete piece of mail-processing logic which is
incorporated into a Mailet-compliant mail server's processing. This
easy-to-write, easy-to-use pattern allows developers to build
powerful customized mail systems. Examples of the services a Mailet
might provide include a mail-to-fax or mail-to-phone transformer, a
filter, a language translator, a mailing list manager, etc. Several
Mailets are included in the JAMES distribution.

Resource abstraction: Like protocols, resources are abstracted
and accessed through defined interfaces (JavaMail for transport,
JDBC for spool storage or user accounts in RDBMSs, the Apache Mailet
API). The server is highly modular and reuses solutions from other
projects.

Secure and multi-threaded design: Based on the technology
developed for the Apache JServ servlet engine, Apache James has a
careful, security-oriented, fully multi-threaded design, to allow
performance, scalability and mission-critical use.

System design is the process of applying various techniques and
principles for the purpose of defining a system in sufficient detail to
permit its physical realization.

Software design is the kernel of the software engineering process.
Once the software requirements have been analyzed and specified,
design is the first technical activity. The flow of information during
this process is as follows.
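[Figure: flow of information during software design. Information domain details, function specification, behavioral specification and other requirements feed into design (including procedural design), which produces program modules that pass to code and then test.]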

Software design is the process through which requirements are
translated into a representation of software.

• Preliminary design is concerned with the transformation of
requirements into data and software architecture.
• Detailed design focuses on refinements to the architectural
representation that lead to detailed data structures and algorithmic
representations for software.
In the present project report, preliminary design is given more emphasis.
System design is the bridge between system and requirements analysis
and system implementation. Some of the essential fundamental
concepts involved in the design of such applications are:
• Abstraction
• Modularity
• Verification
Abstraction is used to construct solutions to problems without having
to take account of the intricate details of the various component
subprograms. Abstraction allows the system designer to make stepwise
refinements, whereby at each stage of the design, unnecessary details
associated with representation or implementation may be hidden from
the surrounding environment.

Modularity is concerned with decomposing the main module into
well-defined, manageable units with well-defined interfaces among the
units. This enhances design clarity, which in turn eases
implementation, debugging, testing, documentation and maintenance of
the software product. Viewed in this sense, modularity is a vital tool in
the construction of large software projects.

Verification is a fundamental concept in software design. A design is
verifiable if it can be demonstrated that the design will result in an
implementation that satisfies the customer’s requirements.

Some of the important quality factors to be considered in
the design of the application are:

The software should behave strictly according to the
original specification, satisfying the customer’s requirements, and
should function smoothly under normal and possible abnormal
conditions. This product is designed to be reliable and to handle any
number of mails to filter.

The design of the system must be such that any new additions to
the information, functional and behavioral domains can be made easily
and the system can be adapted to new specifications. We have
provided this extensibility in this product: you can add any number of
filters to it in the future.
System design is the process of developing specifications for the
candidate system that meet the criteria established during the phase
of system analysis. A major step in design is the preparation of input
forms and the design of output reports in a form acceptable to the user.
These steps in turn lead to a successful implementation of the system.

In this project we focus on the Privacy-Aware Collaborative Spam
Filtering document, which is part of our James Server, and we configure
our logic there. The Privacy-Aware Collaborative Spam Filtering
document is the key to implementing our filters: it first considers our
filters and then, based on the logic in those filters, decides whether or
not to drop each message. The design document follows:

Privacy-Aware Collaborative Spam Filtering document: Privacy-Aware
Collaborative Spam Filtering document is an application to download
your email through protocols like POP3 and IMAP. It also allows you to
retrieve your news messages through NNTP. In addition to the simple
feature of downloading mail, Mail Fetch has the concept of mail filters.
A filter has the single job of deciding whether or not to download a
single message. The actual decision of whether to download a mail or
not is made through a sequence of filters. There can be a global set of
filters as well as a per-maildrop one. A maildrop represents your
mailbox from which you want to download your mail.
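The pipeline idea can be shown with a minimal sketch: a filter answers "continue or stop" for one message, the global filters run before the maildrop's local filters, and the first rejection ends the pipeline. The types Mail, MailFilter and FilterPipeline below are hypothetical illustrations. The project's actual filter interface and message class are not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message summary; only the fields the example filters need.
final class Mail {
    final String sender;
    final String subject;
    final long size;

    Mail(String sender, String subject, long size) {
        this.sender = sender;
        this.subject = subject;
        this.size = size;
    }
}

// Hypothetical filter contract: true = let the message continue down the pipeline.
interface MailFilter {
    boolean accept(Mail mail);
}

final class FilterPipeline {
    private final List<MailFilter> globalFilters;

    FilterPipeline(List<MailFilter> globalFilters) {
        this.globalFilters = globalFilters;
    }

    // Global filters run first, then the maildrop's local filters;
    // the first filter that rejects the message stops the download.
    boolean shouldDownload(Mail mail, List<MailFilter> localFilters) {
        List<MailFilter> all = new ArrayList<>(globalFilters);
        all.addAll(localFilters);
        for (MailFilter filter : all) {
            if (!filter.accept(mail)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the global list is shared, the same FilterPipeline can serve every maildrop, and each maildrop passes in only its own local filters.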

Privacy-Aware Collaborative Spam Filtering document is written
in the Java programming language and has an extensible XML-based
configuration. It is very easy to configure: all that has to be done is
to edit the plain-text configuration file. I have written a fair amount
of documentation, which should help.

Privacy-Aware Collaborative Spam Filtering document can
process multiple maildrops, each with its own filter mechanisms and
poll times.

Features:

The following features are provided by Privacy-Aware
Collaborative Spam Filtering document:
• POP3 and IMAP Protocol Support
• Can handle any number of Maildrops
• Polling mechanism to periodically check maildrops for new
messages
• Filtering system for downloading mail
• Standard filters provided like Size, Message-id, Sender
• Easy pluggability of user defined filters
• Runs on all platforms supported by Java2
• Configurable logging mechanism to keep track of mails
downloaded
• Multiple delivery options provided - like Mailbox and SMTP
Delivery
• Delivery options accessible at the filter level
• Experimental NNTP Support

Modules:
Core Module: This module interacts with the XML configuration and
reads the required information. After reading the information, it
interacts with the specified mailboxes as required and downloads the
mails. It also coordinates the other modules.

Filtering Module: This module deals with applying the filters to the
specified mailboxes to screen out unwanted mails. It applies the filters
in the sequence specified in the XML file. It also allows filters to be
applied globally and locally according to the employee's
customization.

Delivery Agents Module: This module deals with sending the
remaining mails in the specified mailboxes to a delivery agent. Each
delivery agent stores a backup copy of the mails at a targeted location.

Configuring and Extending Privacy-Aware Collaborative Spam
Filtering document:

Privacy-Aware Collaborative Spam Filtering document uses XML
for configuration. The configuration file is MailFetch.xml, which is
located in the conf directory.

The configuration file is accompanied by a detailed document
instructing one on how to configure Privacy-Aware Collaborative Spam
Filtering document. I would recommend referring to that document
whenever you have trouble following what I’m saying. This
document is called Configuration.txt.
Essentially, there are maildrops to download mail from; they
contain all the information about accessing a maildrop. There is a
global sequence of filters, which are checked for each maildrop before
the maildrop-local sequence of filters. Filters can be configured
through the configuration file. For example, a size-based filter needs
to know what size it should filter at and what action it should take
when a message is larger than that. Each filter may have some
additional configuration options of its own, and this additional
configuration is totally dependent on the filter itself. You could add
your own filter and want it to be configured from the
configuration file. I shall expand on that later in this section.
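As an illustration of filter-specific options, the sketch below builds a size-based filter from a map of option strings, as might be read from the filter's entry in the configuration file. The option names "maxSize" and "action", the default limit of 1048576 bytes and the REJECT/ACCEPT actions are all assumptions for this example. The real keys are defined in Configuration.txt.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical size-based filter configured from its own option map.
final class SizeFilterSketch {
    enum Action { REJECT, ACCEPT }

    private final long maxSize;
    private final Action action;

    // Assumed option names: "maxSize" (bytes) and "action" (what to do with larger messages).
    SizeFilterSketch(Map<String, String> options) {
        this.maxSize = Long.parseLong(options.getOrDefault("maxSize", "1048576"));
        this.action = Action.valueOf(
                options.getOrDefault("action", "REJECT").toUpperCase(Locale.ROOT));
    }

    // Returns true if the message may continue down the pipeline.
    boolean accept(long messageSize) {
        if (messageSize <= maxSize) {
            return true; // within the limit: always passes
        }
        return action == Action.ACCEPT; // oversized: apply the configured action
    }
}
```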

There is also the option of delivery agents. After a message
passes through all the filters and none of them has an objection to
it being downloaded, it is downloaded and sent to a Delivery Agent, which
is responsible for delivering it (to a mailbox, maildir, SMTP host, etc.).
Mail Fetch supports different kinds of delivery agents and you can
choose one of them for delivery of your mail. You can go so far as to
make each of your maildrops deliver messages to a different delivery
agent! A maildrop itself needs to specify the delivery agent to use when
all the filters let the message pass through. Each delivery agent has an
id, by which it can thereafter be referred to. Some filters support
delivering messages to a delivery agent specified in their configuration.
For example, all messages from execve@users.sourceforge.net would
go to the execve mailbox if the SenderMailFilter is configured to do so.
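The sketch below shows delivery agents registered by id and a sender-based filter that hands matching messages to a configured agent instead of letting them continue. All names (DeliveryAgent, InMemoryAgent, AgentRegistry, SenderRoutingFilter) are hypothetical stand-ins for the project's classes. The in-memory agent replaces a real mailbox or SMTP agent only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical delivery agent contract.
interface DeliveryAgent {
    void deliver(String sender, String body);
}

// Stand-in agent that just records what it was asked to deliver.
final class InMemoryAgent implements DeliveryAgent {
    final List<String> delivered = new ArrayList<>();

    @Override
    public void deliver(String sender, String body) {
        delivered.add(sender + ": " + body);
    }
}

// Agents are looked up by the id given in the configuration.
final class AgentRegistry {
    private final Map<String, DeliveryAgent> byId = new HashMap<>();

    void register(String id, DeliveryAgent agent) {
        byId.put(id, agent);
    }

    DeliveryAgent get(String id) {
        DeliveryAgent agent = byId.get(id);
        if (agent == null) throw new IllegalArgumentException("unknown agent id: " + id);
        return agent;
    }
}

// Sender-based filter that delivers matching mail through its own agent.
final class SenderRoutingFilter {
    private final String sender;
    private final DeliveryAgent target;

    SenderRoutingFilter(String sender, String agentId, AgentRegistry registry) {
        this.sender = sender;
        this.target = registry.get(agentId); // the filter must know the agent's id
    }

    // true = continue down the pipeline; false = this filter already delivered it.
    boolean process(String from, String body) {
        if (from.equalsIgnoreCase(sender)) {
            target.deliver(from, body);
            return false;
        }
        return true;
    }
}
```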

You can implement your own filters: by implementing certain
interfaces, a user can very easily add his/her own filter to the current
set of provided filters. Examples of filters are spam control, size
restrictions, etc. Mail Fetch downloads the email if it matches the
criteria and can then deliver it using one of its delivery options.
Currently, one can choose to deliver mail to a mailbox or to an SMTP
server.

You will need to specify the name of the class you have
implemented in the configuration, so that Privacy-Aware Collaborative
Spam Filtering document can initialize it as required. Note that the
class has to be in the system classpath. This can be easily achieved by
putting the class in a jar and putting the jar in the lib directory. The
script picks up all the jars from that directory and places them in the
classpath before invoking Privacy-Aware Collaborative Spam Filtering
document. All the delivery agents specified in the configuration are
available to the filters through a DeliveryManager object in the
delivery package of Privacy-Aware Collaborative Spam Filtering
document. This object allows access to these agents based on their
ids. NOTE that the id of the agent has to be known by the filter
requesting the agent. A Delivery Event is generated when a message
is delivered after passing through all the filters. NOTE that no event is
generated when a filter itself delivers a message through an agent.
The easiest way to get the hang of implementing the filter of your
choice is to get hold of the source and check out some of the
implemented filters (like NullMailFilter!!)
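Initializing a filter from a class name in the configuration is typically done with Java reflection. The sketch below loads each named class from the classpath, checks that it implements the filter interface, and creates it with its no-argument constructor. PluggableFilter and AcceptAllFilter are hypothetical names. AcceptAllFilter only mimics the "do nothing" spirit of NullMailFilter and is not its actual source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter interface that user classes implement.
interface PluggableFilter {
    boolean accept(String messageId);
}

// A do-nothing filter: it lets every message through.
class AcceptAllFilter implements PluggableFilter {
    public AcceptAllFilter() { }

    @Override
    public boolean accept(String messageId) {
        return true;
    }
}

final class FilterLoader {
    // Instantiates filters from fully qualified class names found on the classpath.
    static List<PluggableFilter> load(List<String> classNames) {
        List<PluggableFilter> filters = new ArrayList<>();
        for (String name : classNames) {
            try {
                // asSubclass throws ClassCastException if the class is not a PluggableFilter.
                Class<? extends PluggableFilter> cls =
                        Class.forName(name).asSubclass(PluggableFilter.class);
                filters.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load filter " + name, e);
            }
        }
        return filters;
    }
}
```

The no-argument constructor requirement is a design choice of this sketch; filter-specific options would then be passed in a separate configuration step.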

That’s about it in terms of Privacy-Aware Collaborative Spam
Filtering document configuration. Go on, open the conf/JFetch.xml file
in the Mail directory and play with it. Do let me know of any problems
you face; let me know even if you don’t.

Table of Contents
===========

1. Introduction
2. Some Definitions
2.1 Maildrops
2.2 Filters
2.3 DeliveryAgents
2.4 Events
3. Detailed Configuration
3.1 Maildrop
3.2 Mailfilters
3.2.1 Global filters
3.2.2 Local filters
3.2.3 All filters explained
3.3 Delivery Agents
3.4 Miscellaneous Configuration
4. Sample configuration file
5. Advanced usage

1. Introduction:

Privacy-Aware Collaborative Spam Filtering document is an
application to access your remote email. It supports popular mail
protocols like POP3 and IMAP. You can download your mail to your
local machine and use an email client to read it. Mail Fetch also comes
with a very powerful and flexible filtering system. In fact,
Privacy-Aware Collaborative Spam Filtering document comes with a
range of filters out of the box, so you can get started immediately.
These filters range from those which prevent you from getting the same
message twice to those which help in spam filtering.

Privacy-Aware Collaborative Spam Filtering document is written in
Java and so has the advantage of running on most platforms.
Configuration is text-based and is an XML file. This document tells you
how to configure Privacy-Aware Collaborative Spam Filtering
document. It details the various configuration options available
and also provides a sample configuration file.

2. Some Definitions:
2.1 Maildrops

A Maildrop is the mailbox from which you download your mail.
Characteristics of a maildrop are the protocol (POP3, IMAP, NNTP), the
username, password, hostname, port number, the default Delivery
Agent for that maildrop, any filters for that maildrop and finally any
protocol-specific configuration for the maildrop. For example, an NNTP
maildrop would contain newsgroup information, which is not used by a
POP3 or IMAP maildrop.
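The characteristics listed above map naturally onto a small data structure. The sketch below is a hypothetical representation of one maildrop entry: the field names are illustrative, not the project's actual configuration schema. The fallback ports are the standard well-known ports (POP3 110, IMAP 143, NNTP 119), used when no port is configured.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory form of one maildrop entry from the configuration.
final class MaildropConfig {
    enum Protocol {
        POP3(110), IMAP(143), NNTP(119); // standard well-known ports

        final int defaultPort;

        Protocol(int defaultPort) {
            this.defaultPort = defaultPort;
        }
    }

    final Protocol protocol;
    final String host;
    final int port;
    final String username;
    final String password;
    final String defaultAgentId;                // used when every filter lets the mail pass
    final List<String> localFilters;            // filter class names, in pipeline order
    final Map<String, String> protocolOptions;  // e.g. newsgroups for an NNTP maildrop

    MaildropConfig(Protocol protocol, String host, Integer port, String username,
                   String password, String defaultAgentId,
                   List<String> localFilters, Map<String, String> protocolOptions) {
        this.protocol = protocol;
        this.host = host;
        this.port = port != null ? port : protocol.defaultPort; // fall back to the well-known port
        this.username = username;
        this.password = password;
        this.defaultAgentId = defaultAgentId;
        this.localFilters = Collections.unmodifiableList(localFilters);
        this.protocolOptions = Collections.unmodifiableMap(protocolOptions);
    }
}
```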

2.2 Filters

A Filter is the core of the decision making in Privacy-Aware
Collaborative Spam Filtering document. A filter decides on a per-mail
basis whether the message should be downloaded or not. A pipeline of
filters is set up (yes, again in the configuration) and a message
which needs to be downloaded is passed through this pipeline. At any
point in the pipeline, a filter can indicate that the message should
not be processed through the pipeline anymore. For example, a SPAM
filter (sender based) could find a match in its list of spammers and
reject the message.

There are two kinds of filters: global and local. These are not
an attribute of a filter itself, but rather depend on how a filter is used.
Local filters are associated with a maildrop, whereas global filters are
applicable to all maildrops. For example, you might want a Message-ID
filter to be applicable to all maildrops, but keep a sender-based
filter only for the maildrop where you expect mail from that sender.

2.3 DeliveryAgents

A Delivery Agent has the responsibility of delivering mail. The
currently supported media are SMTP and mailbox. Delivery Agents are
identified by unique ids in the configuration. Each maildrop has a
default Delivery Agent, which is used if the message passes through
the Filter pipeline successfully. Some filters also accept a Delivery
Agent attribute in the configuration: if the message matches the
Filter's criteria, the Filter delivers the message using this Delivery
Agent. This also allows for simple filtering mechanisms. For example,
you might want all likely SPAM to be delivered to a special mailbox
where you can later check for any false positives.
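The routing rule described above (a matching filter with a Delivery Agent attribute delivers through that agent, otherwise the maildrop's default agent is used) can be sketched as follows. RoutingFilter and DeliveryRouter are hypothetical names, the filter criterion is reduced to a predicate on the sender address, and treating a match without an mda attribute as "drop the message" is an assumption made for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical: a filter criterion plus the optional mda attribute from its configuration.
class RoutingFilter {
    final Predicate<String> matchesSender;
    final Optional<String> mdaId;

    RoutingFilter(Predicate<String> matchesSender, Optional<String> mdaId) {
        this.matchesSender = matchesSender;
        this.mdaId = mdaId;
    }
}

class DeliveryRouter {
    // Returns the id of the delivery agent to use, or empty if the message is dropped.
    static Optional<String> chooseAgent(String sender, List<RoutingFilter> filters, String defaultMdaId) {
        for (RoutingFilter filter : filters) {
            if (filter.matchesSender.test(sender)) {
                // ASSUMPTION: a matching filter without an mda attribute drops the message.
                return filter.mdaId;
            }
        }
        // The message passed the whole pipeline: use the maildrop's default agent.
        return Optional.of(defaultMdaId);
    }
}
```

For example, with a filter that sends mail from a spam domain to the agent with id "junk", such a message is routed to "junk", while a message that no filter matches goes to the maildrop's default agent ("smtp" in the samples in this document).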

2.4 Events
Events are an internal concept of Privacy-Aware Collaborative Spam
Filtering document. If you are only going to use Privacy-Aware
Collaborative Spam Filtering document and the filters it provides out
of the box, you don't need to understand this concept. If you are
extending Privacy-Aware Collaborative Spam Filtering document by
developing your own Filters, you will need to understand it. Whether
you actually use it depends on the functionality of your Filter.

Events are a mechanism by which a Filter can be notified when
something interesting happens in Privacy-Aware Collaborative Spam
Filtering document. Currently, events are generated only for the
delivery of a message. Take the example of the MessageIDMailFilter.
This filter rejects messages whose message-ids have already been
downloaded. This avoids receiving duplicate messages, for example when
you are subscribed to two mailing lists and a message is cross-posted
to both. The filter maintains a list of message-ids that have already
been downloaded. The list is saved to disk after the download of every
message, so that if the session is interrupted for any reason, the
message is not downloaded again. So, the MessageIDMailFilter
implements a DeliveryListener and hence receives the delivery event.
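A sketch of the listener pattern described above, under stated assumptions. The DeliveryListener name comes from this section, but its single delivered(String messageId) method is an assumed signature. DeliveryEvents and DuplicateIdFilter are hypothetical classes, and the saves counter merely stands in for writing the Message-ID list to disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical shape of a delivery-event listener; the real method name may differ.
interface DeliveryListener {
    void delivered(String messageId);
}

// Publishes a delivery event to every registered listener.
class DeliveryEvents {
    private final List<DeliveryListener> listeners = new ArrayList<>();

    void register(DeliveryListener listener) {
        listeners.add(listener);
    }

    void fireDelivered(String messageId) {
        for (DeliveryListener listener : listeners) {
            listener.delivered(messageId);
        }
    }
}

// A duplicate filter that records a Message-ID only once delivery has succeeded.
class DuplicateIdFilter implements DeliveryListener {
    private final Set<String> seen = new LinkedHashSet<>();
    int saves = 0; // counts how often the list would be written to disk

    boolean isDuplicate(String messageId) {
        return seen.contains(messageId);
    }

    @Override
    public void delivered(String messageId) {
        seen.add(messageId);
        saves++; // the real filter would persist the list here, after every message
    }
}
```

In this sketch an ID is recorded only when the delivery event arrives, so a message whose download never completed is not marked as seen and remains eligible for download in the next session.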

3. Detailed Configuration:

3.1 Maildrop
You can have more than one maildrop for Privacy-Aware Collaborative
Spam Filtering document to download mail from. Privacy-Aware
Collaborative Spam Filtering document downloads mail from them in the
order in which they are configured.
Here is a sample maildrop configuration:

<maildrop protocol="pop3" mda="smtp">
  <host>mail.somepopserver.com</host>
  <port>110</port>
  <user>myusername</user>
  <password>mypass</password>
  <delete>true</delete>

  <!-- filters specific to this maildrop -->
  <filters>
  </filters>
</maildrop>

The protocol attribute can be one of pop3, imap or nntp
(EXPERIMENTAL).

The mda attribute specifies the default delivery agent used once the
message is ready to be downloaded. See Delivery Agents for more
information. This example requires a delivery agent with the id "smtp"
to be configured.

Host, port, user and password are attributes for the connection and
authentication. Setting delete to true causes messages to be deleted
from the maildrop once they have been downloaded.

The filters configured *inside* the Maildrop element are the local
maildrop filters and do not affect other maildrops. Some maildrops,
like NNTP, have extra configuration parameters, such as the newsgroups
to be downloaded. Please note that NNTP support is EXPERIMENTAL and is
not yet stable.
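One way to read the maildrop element shown above is with the JDK's built-in DOM parser (javax.xml.parsers). This is a sketch, not the application's actual configuration loader: MaildropConfig is a hypothetical class, only a few of the fields are read, and every child element it reads (host, port, delete) is assumed to be present.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical holder for a few maildrop settings read from the XML configuration.
class MaildropConfig {
    final String protocol;
    final String defaultMda;
    final String host;
    final int port;
    final boolean deleteAfterDownload;

    private MaildropConfig(Element maildrop) {
        protocol = maildrop.getAttribute("protocol");
        defaultMda = maildrop.getAttribute("mda");
        host = childText(maildrop, "host");
        port = Integer.parseInt(childText(maildrop, "port"));
        deleteAfterDownload = Boolean.parseBoolean(childText(maildrop, "delete"));
    }

    // Trimmed text of the first element with the given tag (assumed to exist).
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Parses a single <maildrop> element given as an XML string.
    static MaildropConfig parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new MaildropConfig(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid maildrop configuration", e);
        }
    }
}
```

Applied to the sample maildrop above, this yields protocol "pop3", default agent "smtp", host mail.somepopserver.com, port 110 and delete-after-download set to true.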

3.2 Mail filters

3.2.1 Global filters

All filters which are configured outside the maildrop elements are
called global filters. These filters affect all the maildrops. The
configuration is the same for both global and local filters.

Here is a sample global filters configuration:

<filters>

  <!-- size filter -->
  <filter class="MailFetch.filters.SizeMailFilter" max-size="1548576"
          delete="false">
  </filter>

  <!-- sender mail filter -->
  <filter class="MailFetch.filters.SenderMailFilter" delete="true"
          blocklist="/home/gautam/MailFetch/spool/blocklist"
          mda="junk">
  </filter>

  <!-- msgid filter -->
  <filter class="MailFetch.filters.MessageIDMailFilter" delete="true">
    <storage name="msgid.cache" limit="8192"
             destination="spool/msgid.cache"/>
  </filter>

  <!-- subject mail filter -->
  <filter class="MailFetch.filters.SubjectMailFilter" delete="true"
          blocklist="/home/gautam/MailFetch/spool/subject.blocklist"
          mda="junk">
  </filter>

</filters>

See "All filters explained" for a detailed explanation of all the
provided filters.

3.2.2 Local filters

All filters which are configured inside the maildrop elements are
called local filters. These filters affect only the maildrop
associated with them. The configuration is the same for both global
and local filters.

Here is a sample local filters configuration:

<maildrop protocol="pop3" mda="smtp">
  <!-- .... standard maildrop config goes here .... -->

  <!-- filters specific to this maildrop -->
  <filters>
    <!-- sender mail filter -->
    <filter class="MailFetch.filters.SenderMailFilter" delete="true"
            blocklist="/home/gautam/MailFetch/spool/linuxlist"
            mda="linux">
    </filter>
  </filters>
</maildrop>

3.2.3 All filters explained

FILTER NAME : HeaderMailFilter
DESCRIPTION : Matches a header in the message. This requires the name
of the header and the value of the header.
CLASS NAME  : MailFetch.filters.HeaderMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.HeaderMailFilter" delete="true"
        name="X-Spam-Rating" value="SPAM" mda="spambox">
</filter>

EXPLANATION : This filter allows you to filter messages based on the
value of a particular header. The mda attribute is optional and
directs the message to the specified delivery agent if the message
matches the criteria.
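The header comparison performed by HeaderMailFilter might look like the sketch below. Headers are modelled as a plain map from name to values rather than a JavaMail message, HeaderMatch is a hypothetical class, and the exact, case-sensitive value comparison is an assumption. Header field names are compared case-insensitively, as RFC 5322 specifies.

```java
import java.util.List;
import java.util.Map;

class HeaderMatch {
    // True if any value of the named header equals the configured value.
    // Names compare case-insensitively; the exact value comparison is an assumption.
    static boolean matches(Map<String, List<String>> headers, String name, String value) {
        for (Map.Entry<String, List<String>> header : headers.entrySet()) {
            if (header.getKey().equalsIgnoreCase(name) && header.getValue().contains(value)) {
                return true;
            }
        }
        return false;
    }
}
```

With the sample configuration, a message carrying "X-Spam-Rating: SPAM" matches (whatever the capitalisation of the header name) and would be delivered to the "spambox" agent, while a rating of any other value does not match.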

FILTER NAME : MessageIDMailFilter
DESCRIPTION : Filters messages if they contain a duplicate Message-ID.
This filter stores the list of downloaded message-ids in the specified
file.
CLASS NAME  : MailFetch.filters.MessageIDMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.MessageIDMailFilter" delete="true">
  <storage name="msgid.cache" limit="8192"
           destination="spool/msgid.cache"/>
</filter>

EXPLANATION : The name attribute of the storage element is a friendly
name for the repository. The limit attribute specifies the maximum
number of entries allowed in the list. The destination attribute is
the actual file in which the list is stored.
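The limit attribute implies that the Message-ID list is bounded. A common way to keep a bounded, insertion-ordered set in Java is a LinkedHashMap with removeEldestEntry overridden, as sketched below. Evicting the oldest ID first is an assumption about the real filter's policy, MessageIdStore is a hypothetical name, and writing the list to the destination file is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded Message-ID cache: keeps at most `limit` IDs, dropping the oldest first.
class MessageIdStore {
    private final Map<String, Boolean> ids;

    MessageIdStore(int limit) {
        // Insertion-ordered map whose eldest entry is evicted once the limit is exceeded.
        this.ids = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > limit;
            }
        };
    }

    // True if the ID is new (download the message); false if it is a duplicate.
    boolean recordIfNew(String messageId) {
        if (ids.containsKey(messageId)) {
            return false;
        }
        ids.put(messageId, Boolean.TRUE);
        return true;
    }

    int count() {
        return ids.size();
    }
}
```

With limit="8192" as in the sample, the store would remember the 8192 most recently recorded IDs; a duplicate older than that would no longer be detected, which is the trade-off a bounded list accepts.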

FILTER NAME : NullMailFilter
DESCRIPTION : This filter consumes all messages. It also marks them
for deletion.
CLASS NAME  : MailFetch.filters.NullMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.NullMailFilter" />

EXPLANATION : This is a special filter; it could be used, for example,
to clean up the maildrop. It is also a DANGEROUS filter: you have been
warned.

FILTER NAME : RecipientMailFilter
DESCRIPTION : This filter matches the recipients of the message
against those provided in a list.
CLASS NAME  : MailFetch.filters.RecipientMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.RecipientMailFilter" delete="true"
        blocklist="/home/gautam/MailFetch/spool/pers"
        mda="personal">
</filter>

EXPLANATION : This filter checks whether the recipients of the message
(TO and CC) exist in the defined list. The mda attribute is optional.
blocklist is the file containing the recipient addresses (one per
line).
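A sketch of how a blocklist file with one address per line could be loaded and checked against the TO and CC recipients. AddressList is a hypothetical class; ignoring case and blank lines are assumptions, not documented behaviour of RecipientMailFilter. The same loading logic would serve SenderMailFilter, with the sender address checked instead of the recipients.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical address list built from a blocklist file (one address per line).
class AddressList {
    private final Set<String> addresses = new HashSet<>();

    AddressList(List<String> lines) {
        for (String line : lines) {
            String address = normalize(line);
            if (!address.isEmpty()) { // skip blank lines
                addresses.add(address);
            }
        }
    }

    // Reads the file named by a blocklist attribute.
    static AddressList load(Path file) {
        try {
            return new AddressList(Files.readAllLines(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if any TO or CC recipient appears in the list.
    boolean containsAnyRecipient(List<String> to, List<String> cc) {
        return to.stream().anyMatch(this::contains) || cc.stream().anyMatch(this::contains);
    }

    boolean contains(String address) {
        return addresses.contains(normalize(address));
    }

    // ASSUMPTION: addresses are compared case-insensitively.
    private static String normalize(String address) {
        return address.trim().toLowerCase(Locale.ROOT);
    }
}
```

For the sample configuration, AddressList.load(Path.of("/home/gautam/MailFetch/spool/pers")) would read the list, and a message whose CC line contains a listed address would match and be routed to the "personal" agent.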

FILTER NAME : SenderMailFilter
DESCRIPTION : This filter matches the sender of the message against
those provided in a list.
CLASS NAME  : MailFetch.filters.SenderMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.SenderMailFilter" delete="true"
        blocklist="/home/gautam/MailFetch/spool/block"
        mda="junk">
</filter>

EXPLANATION : This filter checks whether the sender of the message
exists in the defined list. The mda attribute is optional. blocklist
is the file containing the sender addresses (one per line).

FILTER NAME : SizeMailFilter
DESCRIPTION : Filters messages based on their size.
CLASS NAME  : MailFetch.filters.SizeMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.SizeMailFilter" max-size="1548576"
        delete="false">
</filter>

EXPLANATION : max-size is the maximum size, in bytes, of a message that
is permitted to be downloaded. A max-size of 0 lifts the size
restriction.
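The size rule, including the special meaning of 0, reduces to a one-line check. SizeLimit is a hypothetical name, and treating a message exactly max-size bytes long as permitted follows from max-size being described as the maximum permitted size.

```java
class SizeLimit {
    // True if a message of the given size may be downloaded.
    // maxSizeBytes == 0 lifts the restriction, as described for max-size.
    static boolean permitted(long messageSizeBytes, long maxSizeBytes) {
        return maxSizeBytes == 0 || messageSizeBytes <= maxSizeBytes;
    }
}
```

With the sample max-size of 1548576, a message of exactly 1548576 bytes is permitted, one byte more is not, and setting max-size to 0 permits every message.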

FILTER NAME : SubjectMailFilter
DESCRIPTION : Filters messages based on their subject, using a list.
CLASS NAME  : MailFetch.filters.SubjectMailFilter
SAMPLE CONFIGURATION:

<filter class="MailFetch.filters.SubjectMailFilter" delete="true"
        blocklist="spool/virus_list" mda="possible.virus">
</filter>

EXPLANATION : This filter is similar to the sender and recipient
filters, except that it filters on the subject of the message. The mda
attribute is optional.
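A sketch of subject matching against a blocklist such as spool/virus_list. Whether the real SubjectMailFilter matches whole subjects, words or substrings is not stated here; this sketch assumes a case-insensitive substring match, and SubjectBlocklist is a hypothetical class.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class SubjectBlocklist {
    private final List<String> phrases;

    // One phrase per line; blank lines are ignored.
    SubjectBlocklist(List<String> lines) {
        phrases = lines.stream()
                .map(line -> line.trim().toLowerCase(Locale.ROOT))
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // ASSUMPTION: a subject matches if it contains any listed phrase, ignoring case.
    boolean matches(String subject) {
        if (subject == null) {
            return false; // messages without a subject never match
        }
        String lowered = subject.toLowerCase(Locale.ROOT);
        return phrases.stream().anyMatch(lowered::contains);
    }
}
```

A substring match is the simplest choice that also catches prefixed subjects such as "Re: ..." or "Fwd: ...", at the cost of occasionally matching a phrase embedded in a longer, harmless word.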

3.3 Delivery Agents

After passing through the filter pipeline, mail is delivered using a
DeliveryAgent. Currently two main delivery mechanisms are provided:
mbox and smtp. SMTP is the more reliable mechanism, although it
requires that you have an MTA configured for delivery.

DELIVERY AGENT NAME : Mailbox
DESCRIPTION : Delivers a message to the specified mbox.
CLASS NAME  : MailFetch.delivery.MailboxDeliveryAgent
SAMPLE CONFIGURATION:

<mda class="MailFetch.delivery.MailboxDeliveryAgent" id="junk">
  <destination>/home/gautam/Mail/junkmail</destination>
</mda>

EXPLANATION : The destination element identifies the location of the
mbox to which delivery is made. Basic dot-locking is provided by the
mbox provider to avoid concurrent access to the mbox.
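Dot-locking, mentioned above, means atomically creating a "<mbox>.lock" file before writing and removing it afterwards. The sketch below appends one message in the traditional mbox layout (a "From " separator line, then the message, then a blank line) and quotes body lines that begin with "From ". MboxAppender is a hypothetical class, not the provider's actual code, and it neither retries nor detects stale locks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MboxAppender {
    // asctime-style date used on mbox "From " separator lines (day padded to two columns).
    private static final DateTimeFormatter FROM_DATE =
            DateTimeFormatter.ofPattern("EEE MMM ppd HH:mm:ss yyyy", Locale.US);

    // Appends one message; returns false if another writer holds the dot-lock.
    static boolean append(Path mbox, String sender, LocalDateTime received, String message) {
        Path lock = mbox.resolveSibling(mbox.getFileName() + ".lock");
        try {
            Files.createFile(lock); // atomic create: this file is the dot-lock
        } catch (FileAlreadyExistsException busy) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            StringBuilder entry = new StringBuilder();
            entry.append("From ").append(sender).append(' ')
                 .append(FROM_DATE.format(received)).append('\n');
            for (String line : message.split("\n")) {
                // Quote body lines that would otherwise look like a message separator.
                if (line.startsWith("From ")) {
                    entry.append('>');
                }
                entry.append(line).append('\n');
            }
            entry.append('\n'); // an empty line ends each message
            Files.write(mbox, entry.toString().getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try {
                Files.deleteIfExists(lock); // release the lock even if writing failed
            } catch (IOException ignored) {
                // a leftover lock file would need manual cleanup
            }
        }
    }
}
```

Because Files.createFile fails if the lock file already exists, two writers cannot both hold the lock; the loser gets false back and can try again later.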

DELIVERY AGENT NAME : SMTP
DESCRIPTION : Delivers the message to a configured SMTP host.
CLASS NAME  : MailFetch.delivery.SMTPDeliveryAgent
SAMPLE CONFIGURATION:

<mda class="MailFetch.delivery.SMTPDeliveryAgent" id="smtp">
  <host>localhost.localdomain</host>
  <port>25</port>
  <localuser>gautam</localuser>
  <domain>localhost</domain>
  <user></user>
  <password></password>
</mda>

EXPLANATION : The localuser element defines who the email is directed
to, and domain is the domain of the local user. In this case, the
email is dispatched to gautam@localhost. user and password are used
if your server requires SMTP authentication.

DELIVERY AGENT NAME : NULL
DESCRIPTION : A Null Delivery Agent does nothing, so it is basically
equivalent to dumping messages into /dev/null.
CLASS NAME  : MailFetch.delivery.NullDeliveryAgent
SAMPLE CONFIGURATION:

<mda class="MailFetch.delivery.NullDeliveryAgent" id="null">
</mda>

EXPLANATION : There are no configuration parameters for this Delivery
Agent. Please use it with care, as you could very easily lose all your
mail through a misconfiguration.

3.4 Miscellaneous Configuration

Polling: The polling time is the time Mail Fetch waits between mail
download sessions. For example,

<poll>120</poll>

specifies a polling time of 120 seconds (2 minutes). A non-positive
polling time indicates that Mail Fetch should run through the
maildrop list and download messages only once.
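The polling behaviour can be sketched as a loop. PollLoop is a hypothetical class; the download session, the keep-running check and the sleep are passed in so that the loop can be exercised without real mail servers or real waiting.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

class PollLoop {
    // Runs download sessions separated by pollSeconds; returns the number of sessions run.
    // A non-positive pollSeconds means a single pass over the maildrops.
    static int run(long pollSeconds, Runnable downloadSession,
                   BooleanSupplier keepRunning, LongConsumer sleepMillis) {
        int sessions = 0;
        while (true) {
            downloadSession.run();
            sessions++;
            if (pollSeconds <= 0) {
                return sessions; // run once and stop
            }
            sleepMillis.accept(pollSeconds * 1000L);
            if (!keepRunning.getAsBoolean()) {
                return sessions;
            }
        }
    }

    // Real sleeper: waits with Thread.sleep, restoring the interrupt flag if interrupted.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real run, the last two arguments could be () -> !Thread.currentThread().isInterrupted() and PollLoop::sleep, so that <poll>120</poll> produces a session every two minutes until the thread is interrupted.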

Logging: I would recommend turning logging on, as it gives you a very
good idea of what is happening in the system. All exceptions are
logged, so nothing escapes your eye. Mail Fetch does light-to-medium
logging at the DEBUG priority.

<log target="logs/MailFetch.log" priority="DEBUG" enabled="true" />

The target attribute specifies the file to which Privacy-Aware
Collaborative Spam Filtering document should write its log. The
priority attribute specifies the logging priority. The available
priorities are DEBUG, INFO, WARN, ERROR and FATAL_ERROR. The enabled
attribute is optional and is treated as true by default.
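Assuming the priorities listed above are ordered from least to most severe, as in common Java logging libraries, a threshold check can be sketched with an enum. Priority and LogSettings are hypothetical names. The handling of a missing enabled attribute follows the default described above: an absent attribute, which DOM's getAttribute reports as an empty string, counts as true.

```java
// ASSUMPTION: the listed priorities are ordered from least to most severe.
enum Priority {
    DEBUG, INFO, WARN, ERROR, FATAL_ERROR;

    // True if a record at recordPriority should be written when this is the configured threshold.
    boolean allows(Priority recordPriority) {
        return recordPriority.compareTo(this) >= 0;
    }
}

class LogSettings {
    // enabled is optional and treated as true by default.
    static boolean isEnabled(String enabledAttribute) {
        return enabledAttribute == null
                || enabledAttribute.isEmpty()
                || Boolean.parseBoolean(enabledAttribute);
    }
}
```

Under this assumption, priority="DEBUG" writes every record, while priority="WARN" would keep only WARN, ERROR and FATAL_ERROR records.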

4. Sample configuration file

A sample configuration file is included with the Privacy-Aware
Collaborative Spam Filtering document distribution. You will need to
customize it according to your needs and requirements; refer to this
document when doing so. Below is a small configuration file to give
you an idea of how to go about modifying the configuration.

<MailFetch>
  <!-- Poll for new mail every ten minutes -->
  <poll>600</poll>

  <!-- Enable logging -->
  <log target="logs/MailFetch.log" priority="DEBUG" enabled="true" />

  <!-- First the delivery agents -->

  <!-- I do my delivery of mail through SMTP -->
  <mda class="MailFetch.delivery.SMTPDeliveryAgent" id="smtp">
    <host>localhost</host>
    <port>25</port>
    <localuser>gautam</localuser>
    <domain>localhost</domain>
    <user></user>
    <password></password>
  </mda>

  <!-- Mailbox delivery for suspect SPAM -->
  <mda class="MailFetch.delivery.MailboxDeliveryAgent" id="spam">
    <destination>/home/gautam/Mail/spam</destination>
  </mda>

  <!-- Get a lot of personal mail -->
  <mda class="MailFetch.delivery.MailboxDeliveryAgent" id="pers">
    <destination>/home/gautam/Mail/personal</destination>
  </mda>

  <!-- Global filters begin; apply to all maildrops -->
  <filters>

    <!-- Size filter comes first, I am on a dialup :( -->
    <filter class="MailFetch.filters.SizeMailFilter"
            max-size="102400" delete="true">
    </filter>

    <!-- SPAM blocking filter is next -->
    <filter class="MailFetch.filters.SenderMailFilter" delete="true"
            blocklist="/home/gautam/MailFetch/conf/blocklist"
            mda="spam">
    </filter>
  </filters>

  <!-- My maildrops go next -->
  <maildrop protocol="pop3" mda="smtp">
    <host>mail.somepopserver.com</host>
    <port>110</port>
    <user>myusername</user>
    <password>mypass</password>
    <delete>true</delete>

    <!-- filters specific to this maildrop -->
    <filters>
      <filter class="MailFetch.filters.SenderMailFilter" delete="true"
              blocklist="/home/gautam/MailFetch/conf/friendlist"
              mda="pers">
      </filter>
    </filters>
  </maildrop>

</MailFetch>

The above section is just a sample configuration file. You will need
to customize your configuration depending on what kind of filtering
meets your requirements.

5. Advanced usage

In case you need some customized filtering, you may want to write your
own Filters. The easiest way to understand how to do this is to look
at the filters which are available in the Privacy-Aware Collaborative
Spam Filtering document distribution. Good filters to start with are
NullMailFilter, SizeMailFilter, SubjectMailFilter and
MessageIDMailFilter. That should cover most common uses.
Once you have written your Filter, you need to include it in the
filter configuration. In addition, Privacy-Aware Collaborative Spam
Filtering document requires it to be on the system classpath to be
able to load it. This can be achieved simply by putting the relevant
classes in a jar and putting the jar in the lib directory. The run
scripts load all the jars into the classpath.
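Loading a filter named in a class attribute typically relies on Java reflection, which is also why the jar must be on the classpath. In the sketch below, CustomFilter is a hypothetical stand-in for the application's real filter interface (which this document does not name), NonBlankSubjectFilter is an example user filter, and FilterLoader is a hypothetical loader, not the application's actual one.

```java
// Hypothetical stand-in for the application's filter interface.
interface CustomFilter {
    boolean accept(String subject);
}

// Example user-written filter: rejects messages with an empty or blank subject.
class NonBlankSubjectFilter implements CustomFilter {
    public NonBlankSubjectFilter() {
    }

    @Override
    public boolean accept(String subject) {
        return subject != null && !subject.isBlank();
    }
}

class FilterLoader {
    // Instantiates the class named in a filter's class attribute via its no-argument constructor.
    static CustomFilter load(String className) {
        try {
            Class<? extends CustomFilter> type =
                    Class.forName(className).asSubclass(CustomFilter.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load filter " + className, e);
        }
    }
}
```

Class.forName only finds classes visible on the classpath, so a filter jar missing from the lib directory surfaces here as a ClassNotFoundException, which the sketch reports as an IllegalArgumentException naming the class.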

Privacy-Aware Collaborative Spam Filtering document is an application
to download your email through protocols like POP3 and IMAP. The
decision of whether to download a mail or not is made through a
sequence of filters. By implementing certain interfaces, a user can
very easily add his/her own filter to the current set of provided
filters. Examples of filters are spam control, size restrictions, etc.
Privacy-Aware Collaborative Spam Filtering document downloads the
email if it matches the criteria and can then deliver it using one of
its delivery options: to a mailbox or to an SMTP server.

Privacy-Aware Collaborative Spam Filtering document can process
multiple maildrops, each with its own filters and delivery options. It
is written in the Java programming language and has an extensible
XML-based configuration.

Configuration

To set up Privacy-Aware Collaborative Spam Filtering document, follow
these steps:

* If you have the source, compile it using the build.bat batch file.

* Enter the dist directory and edit the conf/JFetch.xml file. This is
the configuration file for Privacy-Aware Collaborative Spam Filtering
document. Refer to the Configuration.txt file in the docs directory
for a detailed description of the configuration file.

* Run Privacy-Aware Collaborative Spam Filtering document by executing
the run.bat file in the dist directory.

Input design is the process of converting user-originated information
into a computer-based format. The goal of input design is to make data
entry as easy and error-free as possible. An input format should be
easy to understand.

In this product, the inputs are simply messages, i.e. mails. Every
mail has properties such as sender, subject, body, message-id and so
on. These properties are taken automatically from the messages in the
mailbox and used to decide whether or not to drop each message. The
output depends on the input, so input design needs special attention.

Output reflects the image of the organization. Output design involves
designing form layouts, making lists, producing well-designed reports,
etc.; reports are the main outputs of the proposed system. Here the
outputs are LOG FILES, which record everything handled by the server
that is relevant to this project, including error messages.

Database design deals with databases and database management systems,
and explores how to use relationships in a pool of data when
developing methods for data storage and retrieval. Databases allow
data to be shared among different applications.

A database is not used in this product. We simply record, in log
files, the details of how each transaction is handled by the server.
These log files are stored on permanent disk at a specified location.

UML Diagrams
Screens
Testing

Testing is one of the most important phases in software development.
In the software development life cycle (SDLC), the main aim of the
testing process is quality: the developed software is tested to
confirm that it attains the required functionality and performance.

During the testing process, the software is run with particular test
cases, and the outputs of those test cases are analyzed to determine
whether the software is working according to expectations.

The success of the testing process in detecting errors mostly depends
on the test case criteria. To test any software, we need a description
of the expected behaviour of the system and a method of determining
whether the observed behaviour conforms to the expected behaviour.

Since errors can be introduced into the software at any stage, testing
has to be carried out at different levels during development. The
basic levels of testing are Unit, Integration, System and Acceptance
Testing.

Unit Testing is carried out on the code: the individual modules are
tested against the specifications produced for them during design. In
Integration Testing, the tested modules are combined into subsystems,
which are then tested. In System Testing, the full software is tested.
At the final level, Acceptance Testing, the system is tested against
the user requirements document prepared during the SRS phase.

There are two basic approaches to testing: functional and structural.

In Functional Testing, test cases are decided solely on the basis of
the requirements of the program or module, and the internals of the
program or module are not considered when selecting test cases. This
is also called Black Box Testing.

In Structural Testing, test cases are generated from the actual code
of the program or module to be tested. This is also called White Box
Testing.

A number of activities must be performed when testing software.
Testing starts with a test plan, which identifies all the
testing-related activities that need to be performed, along with the
schedule and guidelines for testing. The plan also specifies the
levels of testing that need to be done by identifying the different
testing units. For each unit specified in the plan, test cases are
produced first, followed by test reports, which are then analyzed.

The test plan is a general document for the entire project, which
defines the scope, the approach to be taken and the personnel
responsible for the different testing activities. The inputs for
forming the test plan are:
* Project plan
* Requirements document
* System design

Although there is one test plan for the entire project, test cases
have to be specified separately for each unit to be tested. The test
case specification gives, for each item to be tested, all the test
cases and the outputs expected for those test cases.

The steps to be performed for executing the test cases are specified
in a separate document called the test procedure specification. This
document specifies any special requirements for setting up the test
environment and describes the methods and formats for reporting the
results of testing.

Unit testing focused first on the smallest, lowest-level modules,
proceeding one at a time. Bottom-up testing was performed on each
module. Instead of developing a separate driver program to test the
modules, the modules themselves were used as stubs during testing,
printing verification of the actions performed. After the lower-level
modules were tested, the modules at the next higher level, which make
use of the lower-level modules, were tested.

Each module was tested against its required functionality, and test
cases were developed to test the boundary values.
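Boundary-value testing exercises a limit at its value and just either side of it. The helper below is a hypothetical sketch, not part of the project's test suite; BoundaryValues is an illustrative name, and the usage applies it to the size rule with the max-size of 102400 bytes from the sample configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongPredicate;

class BoundaryValues {
    // For a limit L, the classic boundary values are L - 1, L and L + 1.
    static long[] around(long limit) {
        return new long[] {limit - 1, limit, limit + 1};
    }

    // Runs the rule under test on each boundary value and records the outcome in order.
    static Map<Long, Boolean> run(long limit, LongPredicate ruleUnderTest) {
        Map<Long, Boolean> results = new LinkedHashMap<>();
        for (long value : around(limit)) {
            results.put(value, ruleUnderTest.test(value));
        }
        return results;
    }
}
```

For a max-size of 102400 and the rule "size <= max-size", sizes 102399 and 102400 should be permitted and 102401 rejected; an off-by-one mistake such as using "<" instead of "<=" shows up immediately at the middle value.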

Integration testing is a systematic technique for constructing the
program structure while at the same time conducting tests to uncover
errors associated with interfacing. As the system consists of a number
of modules, the interfaces tested were those between pairs of
connected modules. The software was tested using an incremental
bottom-up approach.

The bottom-up integration strategy was implemented with the following
steps:
* Low-level modules were combined into clusters that perform specific
software subfunctions.
* The clusters were then tested.

System testing is a series of different tests whose primary purpose is
to fully exercise the computer-based system. It also aims to find
discrepancies between the system and its original objectives and
current specifications.

Privacy-Aware Collaborative Spam Filtering document


System Test Cases & System Test Report

The system test cases below are expected to work and give the expected
behaviour provided that the file explorer is configured to run jar
files as described in the project folder, the necessary library files
and standard jar files are in the appropriate project directories, and
the PATH and CLASSPATH environment variables are set appropriately.

Status legend: P = Passed, F = Failed.

Test Case 1
Input: Send a mail whose size is less than the max-size specified in
the .xml file, with the size filter applied.
Expected behaviour: The mail should reach the destination without any
hurdles.
Observed behaviour: As expected.
Status: P

Test Case 2
Input: Send a mail whose size is more than the max-size specified in
the .xml file, with the size filter applied.
Expected behaviour: The mail should not reach the destination; because
of its size, the size mail filter has to delete it.
Observed behaviour: As expected.
Status: P

Test Case 3
Input: Check the log file for the above two mails.
Expected behaviour: It should contain information about the mail sizes
and which mail was deleted.
Observed behaviour: As expected.
Status: P

Test Case 4
Input: Add one more maildrop to the .xml file by adding one more
maildrop tag.
Expected behaviour: The application should interact with the specified
mailboxes and download all the mails from them.
Observed behaviour: As expected.
Status: P

Test Case 5
Input: Add a subject filter to the .xml file by adding one more filter
tag inside the filters tag, i.e. the global filters area.
Expected behaviour: The application should filter out the mails whose
subjects contain the words specified in the subject blocklist file.
Observed behaviour: As expected.
Status: P

Test Case 6
Input: Add a sender filter to the .xml file by adding one more filter
tag inside the filters tag, i.e. the global filters area.
Expected behaviour: The application should filter out the mails from
the senders specified in the sender blocklist file.
Observed behaviour: As expected.
Status: P

Test Case 7
Input: Add a null filter to the .xml file by adding one more filter
tag inside the filters tag, i.e. the global filters area.
Expected behaviour: The application should delete all the mails,
irrespective of the criteria.
Observed behaviour: As expected.
Status: P

Test Case 8
Input: Add a header filter to the .xml file by adding one more filter
tag inside the filters tag, i.e. the global filters area.
Expected behaviour: The application should filter out the mails in
which the named header has the value specified in the .xml file.
Observed behaviour: As expected.
Status: P

Test Case 9
Input: Add an SMTP delivery agent to the .xml file and give its id in
the maildrop tag.
Expected behaviour: For every non-deleted mail, a copy should be sent
to the user specified in the SMTP delivery agent.
Observed behaviour: As expected.
Status: P

Test Case 10
Input: Add a MailBox delivery agent to the .xml file and give its id
in the maildrop tag.
Expected behaviour: For every non-deleted mail, a copy should be
delivered to the destination specified in the MailBox delivery agent.
Observed behaviour: As expected.
Status: P

Configuring Filters

It is the duty of the administrator to configure the filters. For this
purpose, first place the JFetch directory on the required mail server.
Inside it you will find an XML file in a subdirectory named "conf".
The file is easy to read, so the administrator can change the
corresponding values to configure it for the chosen mail server. The
main part of that file is shown below:

<maildrop protocol="pop3" mda="ld">
  <host>localhost</host>
  <port>110</port>
  <user>stud2</user>
  <password>pass2</password>
  <delete>false</delete>

  <!-- filters specific to this maildrop -->
  <filters>
  </filters>
</maildrop>

Here the maildrop is configured for a James mail server running the
POP3 protocol on the local system at port 110. Filters placed in this
filters element are applied only to the stud2 maildrop (mailbox).

After the configuration is complete, the administrator has to create
mailboxes for company personnel on the mail server using the Telnet
tool, and then configure those mailboxes in the local mail client.
Open the mail client you use and follow its instructions to configure
the mailboxes created earlier on the mail server. When it asks for the
incoming and outgoing mail servers, specify the IP address of the
server on which you configured your filters. In the case of MS Outlook
Express, this is done in its account setup screen.

After that, your local mail client creates new accounts for the
specified mailboxes. You can then access those mailboxes from your
local mail client and organize them as you like. Apart from this, the
installed filters work on all the mailboxes specified in the
configuration file described above, here named conf.xml.

Privacy-Aware Collaborative Spam Filtering document is a tool into
which a lot of effort was put to make it filter accurately and
efficiently. The developed system has been tested with real data, and
the users are satisfied with the performance of the system and its
reports.

This project is developed using the JavaMail API, one of the J2EE
technologies, together with XML. Using this tool, unwanted mails or
messages can be dropped automatically by specifying restrictions in
the corresponding files. This reduces the administrator's workload,
and a copy of each deleted message can be directed to a specified
location for verification. This tool is very useful for the
administration department of our company. It is also extensible: you
can add your own filters in future very simply, without disturbing the
existing code. The tool reduces manual work and saves both time and
manpower. The time for processing and producing reports is
considerably reduced. All the features have been implemented and
developed as per the requirements.

Bruce Eckel, Thinking in Java (basic Java concepts)
Java Mail API, Wrox Publications, Volumes I and II
Pankaj Jalote, An Integrated Approach to Software Engineering
I.T. Hawryszkiewycz, Introduction to System Analysis and Design
UML in 24 Hours (for UML diagrams)
Preferred websites: www.bruceeckel.com, www.sun.com/j2ee/mailapi,
www.sun.com/j2se
