ABSTRACT
A web search engine is designed to search for information on the World Wide
Web and FTP servers. The search results are generally presented in a list of
results and are often called hits. The information may consist of web pages,
images, information and other types of files. Some search engines also mine
data available in databases or open directories. Unlike web directories, which
are maintained by human editors, search engines operate algorithmically or are
a mixture of algorithmic and human input.
The web creates new challenges for information retrieval. The amount
of information on the web is growing rapidly, as is the number of new
users inexperienced in the art of web research. People are likely to surf the
web using its link graph.
CHAPTER – 1
INTRODUCTION
Web Search Engine 1
M.U.I.E.T, Aligarh
A web search engine is designed to search for information on the World Wide Web and
FTP servers. The search results are generally presented in a list of results and are often
called hits. The information may consist of web pages, images, information and other types
of files. Some search engines also mine data available in databases or open directories.
Unlike web directories, which are maintained by human editors, search engines operate
algorithmically or are a mixture of algorithmic and human input.
Search engines are the key to finding specific information on the vast expanse of the World
Wide Web. Without sophisticated search engines, it would be virtually impossible to locate
anything on the Web without knowing a specific URL.
A search engine is a software program that searches for sites based on the words that you
designate as search terms.
Search engines look through their own databases of information to find what you are
looking for.
“Search engine” is the popular term for an Information Retrieval (IR) system.
As the web evolves towards providing more and more information, locating the desired
information efficiently becomes a very important issue. Web search engines are very useful
tools for searching for information on the Internet.
The system should offer powerful features and a content delivery system that combines
various services into an integrated unit. An unlimited number of users can access the
system anytime and anywhere with an internet connection.
The administrator can upload files, delete uploaded files on the server, and search text
and files. A user is only allowed to search text and files.
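The access rules described above can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not the project's actual code: the names Action, Role, and AccessPolicyDemo are hypothetical and were chosen for this example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names for illustration; none of them appear in the project source.
final class AccessPolicyDemo {
    public static void main(String[] args) {
        System.out.println(Role.ADMINISTRATOR.can(Action.DELETE)); // prints "true"
        System.out.println(Role.USER.can(Action.UPLOAD));          // prints "false"
    }
}

// The operations the report mentions: upload, delete, and search.
enum Action { UPLOAD, DELETE, SEARCH }

enum Role {
    // The administrator may upload, delete, and search.
    ADMINISTRATOR(EnumSet.of(Action.UPLOAD, Action.DELETE, Action.SEARCH)),
    // An ordinary user may only search.
    USER(EnumSet.of(Action.SEARCH));

    private final Set<Action> allowed;

    Role(Set<Action> allowed) {
        this.allowed = allowed;
    }

    // Returns true when this role is permitted to perform the action.
    boolean can(Action action) {
        return allowed.contains(action);
    }
}
```

Keeping the permissions in one place like this would let a servlet check Role.can(...) before it runs an upload or delete request.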
The technology used to develop the web search engine should ensure that searching is fast
and accurate and that operation is hassle-free with easy maintenance. The software can be
easily upgraded in the future to add more features to the existing system.
In the future, this system will provide security across the network.
In the future, users will be able to search images, video, and other types of content.
CHAPTER – 2
SYSTEM ANALYSIS
System analysis is a software engineering task that bridges the gap between system level
requirements engineering and software design. Requirements engineering activities result in
the specification of software’s operational characteristics (function, data and behaviour),
indicate software’s interface with other system elements, and establish constraints that
software must meet. The most commonly used requirements technique is to conduct a
meeting or interview. The first meeting between a software engineer (the analyst) and the
customer can be likened to the awkwardness of a first date between two adolescents. Data
collection is done by taking the copies of the documents involved in its working from the
organization.
We were supposed to develop software that can provide secure transmission. RSA is a
cryptographic algorithm designed to help users communicate safely and provide secure
transmission. Working in a team requires common guidelines and standards that all team
members follow. To make the best use of practical time, every session must be planned.
Planning of this project includes the following:
Topic understanding.
Modular break-up of the system.
Programming of each module.
A Gantt chart is a project scheduling technique. Progress can be represented easily in a
Gantt chart by colouring each milestone when it is completed. The project starts in
February and ends at the beginning of May.
A feasibility study is conducted to select the best system that meets the performance requirements.
This entails the identification, description, and evaluation of candidate systems, and the selection of
the best system for the job. Many feasibility studies are disillusioning for both user and analyst. First,
the study often presupposes that when the feasibility document is being prepared, the analyst is
in a position to evaluate solutions. Second, most studies tend to overlook the confusion inherent in
system development … the constraints and the assumed attitudes. If the feasibility study is to
serve as a decision document, it must answer three key questions:
• Is there a new and better way to do the job that will benefit the user?
• What is recommended?
The most successful system projects are not necessarily the most visible in a business but
rather those that truly meet user expectations. More projects fail because of inflated
expectations than for any other reason.
The features embedded in the system are up to date and match the client's needs. The back
end is built with the latest Java tools and NetBeans. Any required upgrade can be made
easily in the source code, which avoids the headache of changing every piece of code. If a
newer version of the Java tools and NetBeans is installed in the near future, the source code
that handles the connection can easily be modified. The code adapts easily to such changes,
as a newer version does not affect the core code.
Here we determine what changes the system will bring, what new skills are required, and
other human, organizational, and political aspects. Each user can easily use our algorithm.
However, it is desirable that the user has basic knowledge of computers.
The proposed system can easily be adopted without any changes to the rules and
regulations of the existing system.
Our project does not infringe any known acts, statutes, or pending legislation. Hence it is
legally feasible.
CHAPTER – 3
SYSTEM DESIGN
The conceptual design tells what the system will do. The system is described in terms of its
boundary, entities, attributes, and relationships. In the conceptual design phase we have
considered the following questions:
Moreover, the system is described in language that the customer can understand,
rather than in computer jargon and technical terms. For example, the customers of the
system have been told that a menu on display screen will give users access to the system
functions.
The system description may even list acceptable user responses and the actions that
may result. However, the customer is not told how the data are manipulated in the system or
what kinds of techniques are used for data manipulation. At the time of conceptual design,
we wrote in the client's language, which does not contain technicalities. The description
covers the functions of the system and incorporates all requirements in adequate detail.
Login
Upload
Field        Type           Collation         Null  Privileges
file_name    varchar(1024)  latin_swedish_ci  NO    select,insert,update
upload_date  datetime       (Null)            NO    select,insert,update
file_size    int(11)        (Null)            NO    select,insert,update
file_type    varchar(10)    latin_swedish_ci  NO    select,insert,update
Upload html
View_search
The technical design explains the system to those hardware and software experts
who will implement it. The design describes the hardware configuration, the software
needs, the communication interfaces, the input and output of the system and anything else
that translates the requirements into a solution to the customer’s problem. The design
description is a technical picture of the system specification. Thus we include the following
items in the technical design:
The System Architecture: A description of the major hardware components and their
functions.
The System Software Structure: The hierarchy and function of the software
components.
The data structure and flow through the system.
CHAPTER – 4
MODELING
Admin module: The administrator can log in and upload files (such as .txt, .pdf,
.doc, .docx, .html, etc.).
A search and matching function: Searches for the content that the user wants to find.
Summarizing and presenting documents: Shows the final result to the user.
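How the three modules fit together can be illustrated with a small in-memory sketch. It is not the project's implementation: the class ModuleSketch and its methods are hypothetical, files are held in a map instead of on the server, and matching is a plain case-sensitive substring test.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the three modules; not the project's code.
final class ModuleSketch {
    // Stands in for the uploaded files: file name -> text content.
    private final Map<String, String> files = new LinkedHashMap<>();

    // Admin module: register an uploaded file.
    void upload(String fileName, String text) {
        files.put(fileName, text);
    }

    // Search and matching: names of the files whose text contains the term.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : files.entrySet()) {
            if (entry.getValue().contains(term)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Summarizing and presenting: one line per hit with a short excerpt around the first match.
    String present(String term) {
        StringBuilder out = new StringBuilder();
        for (String name : search(term)) {
            String text = files.get(name);
            int at = text.indexOf(term);
            int from = Math.max(0, at - 20);
            int to = Math.min(text.length(), at + term.length() + 20);
            out.append(name).append(": ").append(text, from, to).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ModuleSketch engine = new ModuleSketch();
        engine.upload("a.txt", "web search engines index pages");
        engine.upload("b.txt", "relational databases store tables");
        System.out.print(engine.present("search"));
    }
}
```

In the real system the map would be replaced by the uploaded files on the server and the upload table in the database.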
CHAPTER – 5
CODING
• Pentium 4 processor
• 256 MB RAM
• 20 GB Hard drive
JAVA
The original and reference implementation Java compilers, virtual machines, and class
libraries were developed by Sun from 1995. As of May 2007, in compliance with the
specifications of the Java Community Process, Sun relicensed most of its Java technologies
under the GNU General Public License. Others have also developed alternative
implementations of these Sun technologies, such as the GNU Compiler for Java, GNU
Classpath, and Dalvik.
* Stable release: Java Standard Edition 6 (1.6.0_25) (April 21, 2011)
* Influenced by: Ada 83, C++, C#, Delphi Object Pascal, Eiffel, Generic Java, Mesa,
Modula-3, Objective-C, UCSD Pascal, Smalltalk
* Influenced: Ada 2005, BeanShell, C#, Clojure, D, ECMAScript, Groovy, J#, JavaScript,
PHP, Python, Scala
JSP - JavaServer Pages (JSP) is a server-side Java technology that allows software
developers to create dynamically generated web pages with HTML, XML, or other
document types. JSPs are compiled into servlets by a JSP compiler.
SERVLET - Servlets are Java programming language objects that dynamically process
requests and construct responses. The Servlet APIs are contained in the javax.servlet and
javax.servlet.http packages. Servlets can be generated automatically by a JavaServer Pages
(JSP) compiler.
JavaScript - JavaScript is a programming language that is used to make web pages
interactive. It runs on the visitor's computer and so does not require constant downloads
from the web site.
merged together into a seamless whole. When your customer clicks on something on an
Ajax driven application, there is very little lag time.
CSS - Cascading Style Sheets (CSS) is a style sheet language used to describe the
presentation semantics (the look and formatting) of a document written in a markup
language. Its most common application is to style web pages written in HTML and
XHTML, but the language can also be applied to any kind of XML document, including
SVG and XUL.
MySQL - MySQL is a relational database management system (RDBMS). It provides a set
of programs that we use as tools to build structures and perform tasks. In MySQL, data is
stored and displayed in tables. A table is a data structure that holds data in a relational
database. A table consists of rows and columns, and tables can also show relationships
between entities. The formal name of a table is a relation, hence the name Relational
Database Management System.
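FreeText_Search_or.java
package searchEngine;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Free-text "or" search: a file matches when it contains at least one of the search terms.
// Parts marked "Assumption" fill gaps in the printed listing and may differ from the project source.
public class FreeText_Search_or extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/xml");
        PrintWriter out = response.getWriter();
        String textstring = request.getParameter("searchText");
        if (textstring == null) {
            textstring = "";
        }
        // Assumption: the search text is split on whitespace into individual terms.
        String[] textarry = textstring.trim().split("\\s+");
        int arLen = textarry.length;
        // Local variables instead of servlet fields, so concurrent requests do not share state.
        StringBuilder result = new StringBuilder();
        StringBuilder logfileName = new StringBuilder();
        ServletContext context = getServletContext();
        Connection con = null;
        try {
            con = Connect.Connect_S_Engine.makeCon();
            // Assumption: uploaded file names are read from the file_name column of the upload table.
            PreparedStatement pst = con.prepareStatement("select file_name from upload");
            ResultSet rst = pst.executeQuery();
            while (rst.next()) {
                String fileName = rst.getString(1);
                String filename = "/upload/" + fileName;
                String text = readFile(context, filename);
                boolean lgfile = false;
                for (int i = 0; i < arLen; i++) {
                    if (textarry[i].isEmpty() || !text.contains(textarry[i])) {
                        continue;
                    }
                    // Record each matching file once for the search log.
                    if (!lgfile) {
                        logfileName.append(fileName).append("#");
                        lgfile = true;
                    }
                    // Quote the term so that split() treats it as literal text, not as a regex.
                    String[] resultarry = text.split(Pattern.quote(textarry[i]));
                    // Record layout: fileName #~# piece #!!# piece #!!# ... #~# term #~!#
                    result.append(fileName).append("#~#");
                    for (int j = 0; j < resultarry.length; j++) {
                        result.append(resultarry[j]).append("#!!#");
                    }
                    result.append("#~#").append(textarry[i]).append("#~!#");
                }
            }
            out.write(result.toString());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            closeQuietly(con);
        }
        savelog(textstring, logfileName.toString());
    }

    // Assumption: uploaded files are plain text stored under /upload in the web application.
    private String readFile(ServletContext context, String path) throws IOException {
        InputStream in = context.getResourceAsStream(path);
        if (in == null) {
            return "";
        }
        StringBuilder text = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    // Stores the search text and the names of the matching files in the view_search table.
    public void savelog(String textstring, String logfileName) {
        Date curntDate = new Date();
        DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        Connection con = null;
        try {
            con = Connect.Connect_S_Engine.makeCon();
            PreparedStatement pst = con.prepareStatement(
                    "insert into view_search(search_text,file_names,search_date,search_type,boolean_search_type)"
                    + " values(?,?,?,?,?)");
            pst.setString(1, textstring);
            pst.setString(2, logfileName);
            pst.setString(3, dateFormat.format(curntDate));
            pst.setString(4, "free text search");
            pst.setString(5, "or");
            pst.executeUpdate();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            closeQuietly(con);
        }
    }

    // Closes the connection even when a query fails, so that connections are not leaked.
    private static void closeQuietly(Connection con) {
        if (con != null) {
            try {
                con.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}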
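The servlet returns its matches as one delimited string, which the page then has to take apart. The sketch below isolates that format so it can be tried without a server or database. The class ResultFormatSketch and the method encode are hypothetical names for this example; only the delimiters "#~#", "#!!#", and "#~!#" come from the listing. As a precaution, the sketch quotes the search term with Pattern.quote so that it is matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper that reproduces the record layout used in the servlet listing.
final class ResultFormatSketch {
    // Builds one record: fileName #~# piece #!!# piece #!!# ... #~# term #~!#
    // The pieces are the parts of the text that lie between occurrences of the term.
    static String encode(String fileName, String text, String term) {
        StringBuilder record = new StringBuilder();
        record.append(fileName).append("#~#");
        for (String piece : text.split(Pattern.quote(term))) {
            record.append(piece).append("#!!#");
        }
        return record.append("#~#").append(term).append("#~!#").toString();
    }

    public static void main(String[] args) {
        // prints: a.txt#~#big #!!# fox#!!##~#red#~!#
        System.out.println(encode("a.txt", "big red fox", "red"));
    }
}
```

One limitation of this format is that a file whose text already contains one of the delimiter sequences would confuse the receiving side; a structured format such as XML elements would avoid that.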
CHAPTER – 6
TESTING
The following rules serve well as testing objectives:
A good test case is one that has a high probability of finding an as-yet-undiscovered error.
Black box testing means testing against the specification of a system or component. Study
it by examining its inputs and related outputs. The key is to devise inputs that have a higher
likelihood of causing outputs that reveal the presence of defects. Use experience and
knowledge of the domain to identify such test cases. Failing this, a systematic approach
may be necessary. Equivalence partitioning is where the input to a program falls into a
number of classes, e.g. positive numbers vs. negative numbers. Programs normally behave
the same way for each member of a class. Partitions exist for both input and output.
Partitions may be discrete or overlap. Invalid data (i.e. data outside the normal partitions)
forms one or more partitions that should also be tested. Test cases are chosen to exercise
each partition. Also test boundary cases (atypical, extreme, zero) since
these frequently show up defects. For completeness, test all combinations of partitions.
Black box testing is rarely exhaustive (because one doesn't test every value in an
equivalence partition) and sometimes fails to reveal corruption defects caused by "weird"
combinations of inputs. Black box testing should not be used to try to reveal corruption
defects caused, for example, by assigning a pointer to point to an object of the wrong type.
Static inspection (or using a better programming language!) is preferable for this.
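Equivalence partitioning and boundary testing can be shown with a short Java example. The function classify is a hypothetical unit under test invented for this illustration. Its input falls into three partitions (positive, negative, zero), and the test cases take one typical value from each partition plus the boundary and extreme values.

```java
// Hypothetical example of equivalence partitioning with boundary cases.
final class PartitionTestSketch {
    // Unit under test: reports which partition an integer belongs to.
    static String classify(int n) {
        if (n > 0) {
            return "positive";
        }
        if (n < 0) {
            return "negative";
        }
        return "zero";
    }

    // One black box check: compares the actual output with the expected output.
    static boolean check(int input, String expected) {
        return classify(input).equals(expected);
    }

    public static void main(String[] args) {
        boolean ok =
                check(42, "positive")                    // typical member of the positive partition
                && check(-42, "negative")                // typical member of the negative partition
                && check(0, "zero")                      // boundary between the two partitions
                && check(1, "positive")                  // just above the boundary
                && check(-1, "negative")                 // just below the boundary
                && check(Integer.MAX_VALUE, "positive")  // extreme value
                && check(Integer.MIN_VALUE, "negative"); // extreme value
        System.out.println(ok ? "all partition tests passed" : "a partition test failed");
    }
}
```

Seven cases cover every partition and boundary here, whereas testing every int value would be impractical, which is the point of choosing representatives.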
DEBUGGING:
(possibly including many reruns of it) will usually take place to confirm the hypothesis. If
the hypothesis is demonstrated to be incorrect, a new hypothesis must be formed.
Debugging tools that show the state of the program are useful for this, but inserting print
statements is often the only approach. Experienced debuggers use their knowledge of
common and/or obscure bugs to facilitate the hypothesis testing process. After fixing a bug,
the system must be retested to ensure that the fix has worked and that no other bugs have
been introduced. In principle, all tests should be performed again, but this is often too
expensive to do.
Testing needs to be planned to be cost and time effective. Planning means setting out
standards for tests. Test plans set the context in which individual engineers can place their
own work. A typical test plan contains:
Overview of Testing Process.
Recording procedures so that tests can be audited.
Hardware and Software Requirements.
Constraints.
A strategy for software testing integrates test case design methods into a well-
planned series of steps that result in the successful construction of software. It provides a
road map for the software developer, the quality assurance organization and the customer- a
road map that describes the steps to be conducted as part of testing, when these steps are
planned and then undertaken, and how much effort, time and resources will be required.
Therefore, any testing strategy must incorporate test planning, test case design, test
execution, and resultant data collection and evaluation.
Large systems are usually tested using a mixture of strategies. Different strategies
may be needed for different parts of the system or at different stages of the process.
CHAPTER – 7
Software cost:
• Java tool kit (freeware on internet)
Manpower cost
• Profit 24000
7.1 LIMITATION
• This system cannot store information about the user for user preferences.
• This system does not provide security for the user.
The software can be easily upgraded in the future to include many more features for the
existing system.
CONCLUSION
Web Search Engine is an interface that lets users search files and text over
the network.
• A search engine plays an important role in accessing content over the internet; it
fetches the pages requested by the user.
• It has made the internet and the information on it just a click away.
• Search engine sites are among the most popular websites.
Fourth: go to http://localhost:8080/S_Engine
Username: admin
Password: admin
Delete Page