Faculty of Engineering
Computer Science and Automatic Control Department
B.Sc. Graduation Project (2004/2005)
Supervised by:
Presented by:
Nothing can be said but "She is the most kind and caring person we have ever
seen".
Dr Amr El Masry
who:
Was the main reason behind our love of and passion for algorithms
Helped us analyze and understand the problem well
Never underestimated or offended any of our thoughts
Eng. Hassan Kamel Tosson (president of the Central Department of Information and
Computer Systems Egyptian Railways)
who:
along with his staff, answered every question about the existing system and
even took us on a walk through the whole system.
Summary
This project aims at designing and implementing Web-enabled services for railway
networks.
The main feature our system offers is a query facility, which gives all the possible
journeys from a departure station to an arrival one. These journeys include not only the
direct train trips but also indirect journeys that consist of more than one direct train trip,
which may suit the user better than the direct alternatives. The journeys proposed by the
system are optimized for time under multiple constraints such as the departure date and
time, the classes and the maximum number of train exchanges.
Implementing such a feature involves a multi-constrained graph search. It may seem a
simple graph search problem, but the complexity arises from the multi-edged railway
network, which is constrained by traveling schedules.
The system provides a reservation facility that increases the utilization of each seat in a
train by choosing the seat to reserve using heuristics rather than randomly. The system
also provides online database administration tools that facilitate the data entry process for
the employees.
Developing and deploying this project requires considering the existing railway system,
making minimal assumptions and achieving high performance and reliability. We have
proposed a four-tier system architecture: a database server tier, an application server tier
that isolates the algorithm complexity and allows algorithm reuse, a web server tier that
stores the Web pages and enables multi-client access to the system and a thin client tier
that is composed only of a Web browser.
Contents
Summary
Acknowledgement
1. Introduction
1.1. Motivations
1.2. Past and Present systems
1.3. Objectives of the project
1.4. Organization of the report
2. Background
2.1. Different System Architectures
2.2. Java Server Pages Technology
2.3. Overview of some searching strategies
2.4. Railway network terminologies
6. Reservation
6.1. Reservation Problem
6.2. Available Techniques
6.3. UML Class Diagram
6.4. Pseudo Code
7.8. Data Recovery
7.9. Administration Tools
References
Chapter 1
Introduction
1.1 Motivations
Choosing a train journey from Alexandria to Cairo is not difficult, because there are many
direct trains. Going from Sohag to Fayed, however, is far more troublesome: many trains pass
through each city, but there is no direct connection between them. The user may choose to
take a direct train to an intermediate main city, like Cairo, and then ask for the next train to
Fayed. Unfortunately, the user may not find a free seat on that train and may have to wait a
long time to catch another one. This project aims at designing and implementing a
Web-enabled system that solves this kind of problem.
Reserving a seat in a train is an easy task if all the passengers reserve their seats for the whole
trip, but it is not always the case. Some passengers reserve seats for a certain segment in the
middle of the trip, leaving these seats free in the other segments. Choosing the seat to reserve
using some heuristics, not randomly, will increase the utilization of each seat in a train. The
proposed system offers a reservation facility to solve this problem.
Finally, data entry may be a tiring and exhausting process for railway employees unless it is
well organized. Providing employees with database administration tools that are user friendly
and well documented will facilitate the process.
1.2 Past and Present systems
Past railway systems in Egypt were not computerized. The only way to get information about
trains and their schedules was to ask an employee, and it is difficult for an employee to
provide all the possible detailed train journeys that connect two stations, especially when
there are no direct connections between them.
Although present railway systems are computerized, like the Virtual Machine Environment
(VME) system that is used internally in the National Organization for Egyptian Railways, they
are not Web enabled. Some Web sites do provide railway information, but only static
information. For example, the www.touregypt.com Web site contains only static tables of train
schedules. With static tables alone, users may miss many better journeys between cities that
have no direct railway connection. An interactive system that finds such journeys has not yet
been implemented for Egypt’s railways.
1.3 Objectives of the project
The project aims at designing and implementing an interactive Web-enabled system that
supports querying and reserving train journeys between two cities. Online payment is outside
the scope of the project. The proposed system finds all feasible railway connections with a
reasonable cost between two cities, ranked by arrival time. Although the project targets
Egyptian Railways, it can be used to serve other railway networks as well as similar
transportation networks.
The proposed system is developed to meet the requirements of passengers who need a fast and
efficient facility for querying and reserving a railway journey between two cities that have
no direct train trips between them. Egyptian Railways can use this system to improve its
service and to decrease the load on information-desk clerks at stations.
1.4 Organization of the report
Chapter 1. Introduction: introduces the project motivations, past and present systems, the
objectives of the project and the report organization.
Chapter 2. Background: presents the different system architectures, Java Server Pages (JSPs)
technology, some graph search techniques and some terminologies.
Chapter 3. Statement of the problem: defines the railway network, shows the problems of the
routing algorithm, and presents our system architecture and implementation environment.
Chapter 4. Suggested Routing Algorithms and their drawbacks: discusses the different
approaches investigated before settling on the final solution.
Chapter 5. Proposed Routing Algorithm: discusses the implementation details of the routing
algorithm.
Chapter 6. Database Analysis and Design: discusses the details of designing and implementing
the underlying database system in our project.
Chapter 7. Reservation: discusses the available strategies for solving this problem, presents the
selected strategy and shows the design and the implementation of this strategy.
Chapter 8. Conclusion and future work: presents our conclusion and the suggested future work.
Chapter 2
Background
This chapter introduces some topics that are related to the project, like the different
system architectures, Java Server Pages (JSPs) technology, some graph search
techniques and some terminologies.
2.1 Different System Architectures
The minimal configuration of a Web application is the so-called two-tier
architecture, shown in figure 2.1, which closely resembles the traditional client-server
model. The only difference from the client-server model is that in the two-tier solution
clients are thin (browsers only), i.e., they are lightweight applications responsible
only for presentation. Web pages, application logic and data are on the server side.
Embedding the Web pages in the application, and binding the application to the data,
are poor design decisions. The last thing a data manager needs to see is the application
logic, and the last thing the application developer cares about is the Web page. Clearly
some separation is needed.
A more advanced configuration, shown in figure 2.2, separates the application logic
from the data, introducing the so-called three-tier architecture. A further improvement
is still possible.
The even more advanced four-tier architecture (figure 2.3) separates the Web pages
from the application and places them on a separate Web server.
Figure 2.1: Two tiers architecture
2.2 Java Server Pages Technology
Java Server Pages are made operable by having their contents (HTML tags, JSP tags and
scripts) translated into a Servlet by the application server. This process translates both
the dynamic and the static elements declared within the JSP file into Java Servlet code,
which delivers the translated contents through the Web server output stream to the
browser.
Because JSPs are server-side technology, the processing of both the static and dynamic
elements of the page occurs in the server. The architecture of a JSP/Servlet-enabled Web
site is often referred to as thin-client because most of the business logic is executed on the
server.
The following process outlines the tasks performed on a JSP file on the first invocation of
the file or when the underlying JSP file is changed by the developer:
• The Web browser makes a request to the JSP page.
• The JSP engine parses the contents of the JSP file.
• The JSP engine creates temporary Servlet source code based on the contents of the
JSP. The generated Servlet is responsible for rendering the static elements of the JSP
specified at design time in addition to creating the dynamic elements of the page.
• The Servlet source code is compiled by the Java compiler into a Servlet class file.
• The Servlet is instantiated. The init and service methods of the Servlet are called,
and the Servlet logic is executed.
• The static HTML and graphics, combined with the dynamic elements specified in the
original JSP page definition, are sent to the Web browser through the output stream
of the Servlet's response object.
Subsequent invocations of the JSP file will simply invoke the service method of the
Servlet created by the above process to serve the content to the Web browser. The Servlet
produced as a result of the above process remains in service until the application server is
stopped, the Servlet is manually unloaded, or a change is made to the underlying file,
causing recompilation.
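The translate-once behaviour described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class `JspLifecycleModel`, its methods and the page name are hypothetical and do not belong to any real JSP engine; the model only records when a page would be translated and compiled again and when the existing Servlet would be reused.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not a real JSP engine): a page is translated and
// compiled on its first request or after its file changes, and reused otherwise.
final class JspLifecycleModel {
    private final Map<String, Long> compiledVersion = new HashMap<>();
    private int compilations = 0;

    /** Handles one request; lastModified stands for the timestamp of the JSP file. */
    String request(String page, long lastModified) {
        Long compiled = compiledVersion.get(page);
        if (compiled == null || compiled.longValue() != lastModified) {
            // First invocation or changed file: parse, generate Servlet source, compile.
            compiledVersion.put(page, lastModified);
            compilations++;
            return "translated, compiled and served";
        }
        // Subsequent invocation: only the service method of the existing Servlet runs.
        return "served by existing Servlet";
    }

    int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        JspLifecycleModel engine = new JspLifecycleModel();
        System.out.println(engine.request("query.jsp", 1L));
        System.out.println(engine.request("query.jsp", 1L));
        System.out.println(engine.request("query.jsp", 2L)); // the file was edited
    }
}
```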
2.3 Overview of some searching strategies
Depth First Search (DFS)
DFS extracts the front element from the OpenList and inserts its successors at the front of
the OpenList. It is similar to pre-order traversal.
Uniform Cost Search
Uniform Cost search is more deliberate about how it chooses nodes for exploration when
searching for solutions.
It extracts the least-cost element from the OpenList and inserts its successors, with their
associated costs, in the OpenList.
Informed search strategies
Informed search strategies use a heuristic function h(n) that estimates the cost of the
cheapest path from node n to the goal.
Greedy
Greedy search extracts the node with the least expected cost h(n) from the OpenList, and
inserts its successors, with their associated expected costs, in the OpenList.
A* (A Star)
It combines two costs:
• f(n) = g(n) + h(n)
– g(n) = cost to get to n from the start
– h(n) = estimated cost to get from n to the goal
Like the others, it extracts the node with the least combined cost f(n) from the
OpenList, and inserts its successors, with their associated combined costs, in the
OpenList.
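The three cost-based strategies differ only in the key by which the OpenList is ordered. The following is a minimal sketch of that idea, assuming a priority queue as the OpenList; the class `OpenListOrdering`, the node names and the g and h values are hypothetical and chosen only for illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class OpenListOrdering {
    /** A search node: g is the cost so far, h the estimated remaining cost (made-up minutes). */
    static final class Node {
        final String name;
        final int g;
        final int h;

        Node(String name, int g, int h) {
            this.name = name;
            this.g = g;
            this.h = h;
        }

        int f() {
            return g + h;
        }
    }

    // Each strategy is just a different ordering of the same OpenList.
    static final Comparator<Node> UNIFORM_COST = Comparator.comparingInt(n -> n.g);
    static final Comparator<Node> GREEDY = Comparator.comparingInt(n -> n.h);
    static final Comparator<Node> A_STAR = Comparator.comparingInt(Node::f);

    /** Returns the name of the node that the given ordering extracts first. */
    static String extractedFirst(Comparator<Node> order, Node... nodes) {
        PriorityQueue<Node> openList = new PriorityQueue<>(order);
        for (Node n : nodes) {
            openList.add(n);
        }
        return openList.poll().name;
    }

    public static void main(String[] args) {
        Node x = new Node("X", 10, 50); // f = 60
        Node y = new Node("Y", 30, 20); // f = 50
        Node z = new Node("Z", 45, 10); // f = 55
        System.out.println(extractedFirst(UNIFORM_COST, x, y, z)); // X: smallest g
        System.out.println(extractedFirst(GREEDY, x, y, z));       // Z: smallest h
        System.out.println(extractedFirst(A_STAR, x, y, z));       // Y: smallest g + h
    }
}
```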
Chapter 3
Statement of the problem
Although the railway lines form a simple static graph, the stations, train trips
and their schedules do not. At a certain time or date some trains are available and others
are not. Thus the corresponding graph is dynamic (time variant).
To represent the railway network as a single-edged graph, each station is represented
by many hubs. A hub is a train at a certain station at a certain time. This approach
eliminates the multi-edged property and the self loops, but the time constraint remains:
not all edges are valid at every time. Also, these edges cannot each be assigned a fixed
cost, because some of the constraints depend not only on the edge between two stations
but also on the previously chosen edges, as explained in the next section.
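A hub can be sketched as a small value class. The example below is an illustration under assumptions, not the project's code: the class `HubGraph`, its field names and the sample timetable are hypothetical. It also assumes that a hub has at most two successors, the next stop of the same train and the next train at the same station, which is how the report later describes hub generation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HubGraph {
    /** A hub: one train at one station at one time (minutes after midnight). */
    static final class Hub {
        final int train;
        final String station;
        final int minute;

        Hub(int train, String station, int minute) {
            this.train = train;
            this.station = station;
            this.minute = minute;
        }

        @Override
        public String toString() {
            return station + train;
        }
    }

    /** Returns at most two successors: the next stop of h's train, then the next train at h's station. */
    static List<Hub> successors(Hub h, List<Hub> all) {
        Hub nextStop = null;
        Hub nextTrain = null;
        for (Hub c : all) {
            if (c.minute <= h.minute) {
                continue; // only strictly later hubs can follow h
            }
            if (c.train == h.train) {
                if (nextStop == null || c.minute < nextStop.minute) {
                    nextStop = c;
                }
            } else if (c.station.equals(h.station)) {
                if (nextTrain == null || c.minute < nextTrain.minute) {
                    nextTrain = c;
                }
            }
        }
        List<Hub> out = new ArrayList<>();
        if (nextStop != null) {
            out.add(nextStop);
        }
        if (nextTrain != null) {
            out.add(nextTrain);
        }
        return out;
    }

    /** Illustrative timetable: train 1 runs A, B, C and train 4 runs B, D. */
    static List<Hub> sampleHubs() {
        return Arrays.asList(
                new Hub(1, "A", 300), new Hub(1, "B", 360), new Hub(1, "C", 420),
                new Hub(4, "B", 390), new Hub(4, "D", 420));
    }

    public static void main(String[] args) {
        List<Hub> hubs = sampleHubs();
        System.out.println(successors(hubs.get(1), hubs)); // successors of hub B1
    }
}
```

Because every successor is strictly later in time than its parent, a graph built this way has no loops and no parallel edges between two hubs.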
Some of the metrics that characterize a journey are: the duration of the journey, the cost,
the departure and arrival times, the number of stops, the number of train exchanges, the
types of the trains, and many more. Using these metrics to construct a weighted edge
between each pair of hubs is difficult, because some of them depend not only on the edge
between two stations but also on the previous choices of trains. The number of train
exchanges, for example, depends on the trains chosen so far, so it cannot be assigned to a
particular edge of the graph: it is not a property of a pair of hubs.
Some of these metrics, like the cost, do not follow the transitivity rule, i.e. they are not
additive along a journey. Imagine a trip that passes three hubs a, b and c in that order. The
cost between the hubs (a,c) is sometimes less than the sum of the costs between (a,b) and
(b,c). For this reason it is difficult to optimize the journey with respect to the cost.
For these reasons, the arrival time metric is chosen as the objective of our
optimization problem, and the other metrics are constraints on the journeys. The system
cannot ask the passenger to set limits on every constraint. So, in order to decrease the
number of solutions that the passenger has to choose from, the passenger is asked a few
questions: the preferred departure date and time, the preferred classes and the maximum
number of train exchanges. The answers to these questions discard some of the solutions,
and the remaining solutions are displayed sorted by their arrival time. As the number of
solutions can be exponential, the solutions are displayed in batches. The user can change
the batch size as required.
Each tier consists of some modules as shown in figure 3.2.
[Figure 3.2: modules of each tier. Recoverable labels: Reservation pages, Reservation
Algorithm, Employee Administration page, Database Data manipulation bean]
Client tier:
It consists only of a browser (thin client), which is used by the user and the
employee.
3.4 Implementation Environment
In this section, the components of the implementation environment are presented. These
include the database management system (DBMS), the application server and the Web
application technology. The selected tools are:
• DBMS: Oracle 8i
• Web application server: Apache Tomcat Version 4.1.12
• Web application technology: Java Server Pages (JSP)
• Java IDE: Oracle JDeveloper
We had to choose whether to use a DBMS or to rely on the traditional file processing
approach. We chose the former (using a DBMS); the following drawbacks of the file
processing approach explain the reasons behind our choice.
• Inconsistency problems: over time, it is possible that the files may contain
two different values for the same data item in two different places, because some
update operation did not catch all of the places that need to be changed.
• Data isolation problems: it is not easy in such a system to pull together a report
containing all the information stored on one particular entity, since it is scattered
over many files.
• Security: In a file processing system, security must be done on a file by file basis:
any user having access to a file has access to all the fields in it.
• Integrity Constraints: often, the values of certain items in a database are logically
constrained to only certain possibilities. It is desirable for software that modifies
such an item to ensure that the new value obeys the appropriate constraints. This
is difficult since each program that accesses the data must know and apply the
constraints.
A database management system approach breaks the tight coupling between application
programs and data, by putting a software layer in between:
Users
Application Programs
DBMS
Actual data files
Application programs that need data do not get it directly from the files where it is stored,
but rather from the DBMS, which in turn gets it from the file. Application programs are
not allowed to access the data directly.
In addition, the database contains metadata (data about the data) in the form of a data
dictionary. For each data item, the dictionary holds the standard name that application
programs use to access it, the file in which the data item is stored, and its security
and integrity constraints.
The DBMS is responsible for the concurrency control; it can ensure atomicity of
transactions. The DBMS can allow multi-user access to the data by managing accesses in
such a way to prevent inconsistency.
Java Server Pages simplify the delivery of dynamic web content. They enable Web
application programmers to create dynamic content by reusing predefined components
and by interacting with components using server-side scripting. JSP technology can run
on many Web servers and application servers, including the Sun ONE Application
Server, Microsoft’s Internet Information Services (IIS), Apache HTTP Server and IBM’s
WebSphere application server.
Chapter 4
Suggested Routing Algorithms and their drawbacks
4.1 Flooding
The flooding approach follows no criterion in selecting the next journey to extend, but
it is helpful for finding all possible solutions and comparing them with those of our
proposed algorithm. It has exponential asymptotic complexity in both time and space.
We also implemented this approach in our prototype.
Main methods
Function name:
runPrototype
Input:
String from
String to
Date depDate
Time depTime
int classType
int maxExchanges
Output:
Array of DetailedJourney
Pseudo code:
• Connect to the database.
• Get the serial number of the source and destination stations.
• Run the flooding algorithm by calling the function runAlgorithm.
• Convert the output of the function runAlgorithm into an array of
DetailedJourney.
• Return the array of DetailedJourney.
Function name:
runAlgorithm
Input:
int sourceStation
int destinationStation
Time departureTime
Output:
Array of Journeys
Pseudo code:
• Connect to the database.
• Get all the hubs of the source station whose departure time is after the
  departureTime required by the user.
• For each hub
    • Construct a Journey with a FROM_WAIT state that begins with this hub.
    • Enqueue this Journey to a queue.
• While the queue is not empty
    • Dequeue a Journey.
    • If the Journey’s state is FROM_MOVE
        o S ← the station of the last hub in the Journey.
        o Get all the hubs of S whose departure time is after the
          currentTime of the Journey.
        o For each hub
            - Construct a new Journey with a FROM_WAIT state that
              contains all the hubs of the Journey and this hub.
            - Enqueue this new Journey to the queue.
    • Get the hub that follows the last hub in the Journey.
    • If such a next hub exists
        o Add this hub to the Journey.
        o Make the state of the Journey FROM_MOVE.
        o If this hub is one of the hubs of the destination station
            Add this Journey to the output array of Journeys.
        o Else
            Enqueue the Journey in the queue.
• Return the output array of Journeys.
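The pseudo code above can be sketched in runnable form as follows. This is an illustration under assumptions rather than the prototype itself: it works on an in-memory list of hubs instead of a database, it reads "the hub that follows the last hub" as the next stop of the same train, and the names `Flooding`, `Journey.moved` and `timetable()` are hypothetical. The timetable is the four-train example used in the journey-wise approach section (times in minutes after midnight).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class Flooding {
    static final class Hub {
        final int train;
        final String station;
        final int minute; // minutes after midnight

        Hub(int train, String station, int minute) {
            this.train = train;
            this.station = station;
            this.minute = minute;
        }

        @Override
        public String toString() {
            return station + train;
        }
    }

    /** A partial journey; moved is true in the FROM_MOVE state and false in FROM_WAIT. */
    static final class Journey {
        final List<Hub> hubs;
        boolean moved;

        Journey(List<Hub> hubs, boolean moved) {
            this.hubs = hubs;
            this.moved = moved;
        }
    }

    /** Next stop of the same train, or null at the end of the line. */
    static Hub nextStop(Hub h, List<Hub> all) {
        Hub best = null;
        for (Hub c : all) {
            if (c.train == h.train && c.minute > h.minute && (best == null || c.minute < best.minute)) {
                best = c;
            }
        }
        return best;
    }

    static List<List<Hub>> flood(List<Hub> all, String from, String to, int notBefore) {
        List<List<Hub>> found = new ArrayList<>();
        Queue<Journey> queue = new ArrayDeque<>();
        for (Hub h : all) {
            if (h.station.equals(from) && h.minute >= notBefore) {
                queue.add(new Journey(new ArrayList<>(Arrays.asList(h)), false));
            }
        }
        while (!queue.isEmpty()) {
            Journey j = queue.remove();
            Hub last = j.hubs.get(j.hubs.size() - 1);
            if (j.moved) {
                // Branch: wait at this station for every later hub of another train.
                for (Hub h : all) {
                    if (h.station.equals(last.station) && h.train != last.train && h.minute >= last.minute) {
                        List<Hub> copy = new ArrayList<>(j.hubs);
                        copy.add(h);
                        queue.add(new Journey(copy, false));
                    }
                }
            }
            Hub next = nextStop(last, all);
            if (next != null) {
                j.hubs.add(next);
                j.moved = true;
                if (next.station.equals(to)) {
                    found.add(j.hubs);
                } else {
                    queue.add(j);
                }
            }
        }
        return found;
    }

    static List<Hub> timetable() {
        return Arrays.asList(
                new Hub(1, "A", 300), new Hub(1, "B", 360), new Hub(1, "C", 420), new Hub(1, "D", 480),
                new Hub(2, "A", 240), new Hub(2, "C", 360), new Hub(2, "E", 480),
                new Hub(3, "A", 120), new Hub(3, "E", 540),
                new Hub(4, "B", 390), new Hub(4, "D", 420), new Hub(4, "E", 450));
    }

    public static void main(String[] args) {
        for (List<Hub> journey : flood(timetable(), "A", "E", 0)) {
            System.out.println(journey);
        }
    }
}
```

Every partial journey stays in the queue until it dead-ends or reaches the destination, which is why the number of queued journeys can grow exponentially on larger networks.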
4.2 Journey-wise approach
The nature of our problem imposes multiple constraints on the route that the passenger
can take from source to destination. We noticed that the main characteristic of a
journey is its exchanges (i.e. leaving a train and taking another one at some station).
If we can specify the exchanges in a journey, we have solved the problem.
This approach takes this point into consideration.
The main feature of this approach is that it views the network in terms of train
journeys, so we call it the journey-wise approach.
The term “trainJourney” here represents a record carrying the stops, prices and time
schedule (e.g. departure and arrival times, the days of the week on which it runs, and
the off days in the year) for a specific train journey.
Each station should have a bit stream, where each bit in the stream represents a
specific trainJourney.
For example: Banha station has a bit stream 0000101101.
This means: trainJourney 5, 7, 8, 10 pass by Banha station while the rest don’t.
Now, if we AND the bit streams of two stations, we find which trainJourneys
pass by both stations. We call this procedure (i.e. the ANDing procedure) the
connection.
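The connection can be sketched with `java.util.BitSet`. This is only an illustration: the class `Connection` and its method names are hypothetical, and bit i is taken to stand for trainJourney i, counted from the left of the bit stream as in the Banha example.

```java
import java.util.BitSet;

final class Connection {
    /** Parses a bit stream such as "0000101101"; the leftmost character is trainJourney 1. */
    static BitSet parse(String bits) {
        BitSet set = new BitSet(bits.length() + 1);
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                set.set(i + 1);
            }
        }
        return set;
    }

    /** ANDs the bit streams of two stations without modifying either of them. */
    static BitSet connection(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        return both;
    }

    public static void main(String[] args) {
        System.out.println(parse("0000101101"));                           // {5, 7, 8, 10}
        System.out.println(connection(parse("1110"), parse("0111")));      // {2, 3}
    }
}
```

The second call uses two four-bit streams, 1110 and 0111; the result {2, 3} means that trainJourneys 2 and 3 pass by both stations.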
Thus if we begin with Banha station as a start point at a certain time, we can find its
connections with all stations, and construct a table of the trainJourneys we may take
from the start station to each station and their corresponding times. We call this a
step (i.e. finding the connections from the source station to all stations under
certain schedule constraints).
If we develop a dynamic algorithm that traverses the stations by steps, the role of
the algorithm is to determine the order of the steps to make.
The main advantage here is that, before each step, we can always compare the
connections that lead to it and discard the useless and redundant ones. It is a means
of clustering.
One proposed search strategy (here a search strategy means the selection order of
steps) is Dijkstra's algorithm. The problem with Dijkstra's algorithm is that it
provides only one solution, whereas our output should contain more than one
alternative. Dijkstra's algorithm also needs costs on the edges to pick the station
with the minimum cost, from which the next step is made.
The problem now is choosing the station from which to make a step, as the order of
the stations is the only guarantee of finding a minimal solution.
Other hints
1- The number of solutions kept at any station at any time should be less than some
polynomial function of the number of stations, in order to guarantee a certain
complexity of the algorithm.
2- Discard the sink nodes, which will never be part of our solution.
A proposed method:
Reasoning about the problem, we see that the next step should be made from the
station that the trains reach first.
So we construct an openList, which initially contains the source station at the given
departure time.
The algorithm repeats the following process: extract the station with the minimal
departure time from the openList, make a step from it, and add all the resulting
connections to the openList. It keeps looping until the extracted station is the
destination itself; this is the first solution. If we continue extracting and making
steps, we find the following solutions.
Let us assume the passenger clones himself and takes all the trains leaving from the
source (i.e. makes a step from the source), so that he reaches every possible
destination at a certain time. The first clone that reaches any destination clones
himself again and makes a step, and so on. This is simply Dijkstra's algorithm with
time as the metric, but without discarding any solution.
In other words, it is a sweep line over time. We always extract the station with the
minimal time, so at any given time we are sure that we have already extracted all the
connections reachable so far. If the destination can be reached at two feasible times
T1 and T2 with T1 less than T2, then T1 and all the stations leading to it are
extracted before T2, which means T1 is found first.
That is why this algorithm is sure to reach the destination in the minimal time, and
it can be terminated when the next step is from the destination itself.
At this point we have found the minimal solution.
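The sweep line over time can be sketched as an earliest-arrival search. The example is a simplification under stated assumptions: it uses a plain list of train legs between consecutive stops instead of bit streams, it makes at most one step per station, and it returns only the minimal arrival time rather than the journeys themselves. The names `TimeSweep`, `Leg` and `Arrival` are hypothetical; the legs encode the four-train example of this section in minutes after midnight.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

final class TimeSweep {
    /** One train leg between two consecutive stops. */
    static final class Leg {
        final String from;
        final int dep;
        final String to;
        final int arr;

        Leg(String from, int dep, String to, int arr) {
            this.from = from;
            this.dep = dep;
            this.to = to;
            this.arr = arr;
        }
    }

    /** An openList entry: a station and the time at which it is reached. */
    static final class Arrival {
        final String station;
        final int minute;

        Arrival(String station, int minute) {
            this.station = station;
            this.minute = minute;
        }
    }

    /** Returns the earliest arrival time at dest, or -1 if dest cannot be reached. */
    static int earliestArrival(List<Leg> legs, String source, String dest, int start) {
        Comparator<Arrival> byTime = Comparator.comparingInt(a -> a.minute);
        PriorityQueue<Arrival> openList = new PriorityQueue<>(byTime);
        Set<String> stepped = new HashSet<>();
        openList.add(new Arrival(source, start));
        while (!openList.isEmpty()) {
            Arrival a = openList.poll(); // the sweep line moves to the minimal time
            if (a.station.equals(dest)) {
                return a.minute;
            }
            if (!stepped.add(a.station)) {
                continue; // a step was already made from this station at an earlier time
            }
            for (Leg leg : legs) {
                if (leg.from.equals(a.station) && leg.dep >= a.minute) {
                    openList.add(new Arrival(leg.to, leg.arr));
                }
            }
        }
        return -1;
    }

    static List<Leg> sampleLegs() {
        return Arrays.asList(
                new Leg("A", 300, "B", 360), new Leg("B", 360, "C", 420), new Leg("C", 420, "D", 480),
                new Leg("A", 240, "C", 360), new Leg("C", 360, "E", 480),
                new Leg("A", 120, "E", 540),
                new Leg("B", 390, "D", 420), new Leg("D", 420, "E", 450));
    }

    public static void main(String[] args) {
        System.out.println(earliestArrival(sampleLegs(), "A", "E", 0)); // minutes after midnight
    }
}
```

Skipping stations that were already stepped from is safe for the minimal time only, because waiting is free: any leg that can be caught after a later arrival can also be caught after the earliest one.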
To find more solutions (which may be cheaper or have fewer connections) we can follow
one of these approaches:
1- Continue the algorithm until the sweep line reaches a multiple of the
minimal duration (e.g. if the minimal duration is 1.5 hours, we find all
possible solutions up to twice this duration, i.e. 3 hours).
2- Run a similar algorithm with a sweep line over cost (money) and find the
solution with the minimum financial cost. We then have two solutions:
A(minimal_time, A_cost) and B(B_time, minimal_cost).
If we run the time sweep line until B_time and the cost sweep line until A_cost,
we get all the solutions that are not worse than A and B; the rest of the
solutions must be worse than both of them.
Example timetable (stations A, B, C, D and E):
Train 1: A 5:00, B 6:00, C 7:00, D 8:00
Train 2: A 4:00, C 6:00, E 8:00
Train 3: A 2:00, E 9:00
Train 4: B 6:30, D 7:00, E 7:30
Train# 1 2 3 4
A 1 1 1 0
B 1 0 0 1
C 1 1 0 0
D 1 0 0 1
E 0 1 1 1
Table 4.1: bit stream representation of each station in the example
Trace:
For going from A to E, we begin by
Step 1:
ANDing the string of A with each one:
B { (1,”6:00”) }
C { (2,”6:00”) , (1,”7:00”) }
D { (1,”8:00”) }
E { (2,”8:00”) , (3,”9:00”) }
A{}
B{}
C{(2, ”6:00” ) , (1, ”7:00” ) }
D{(1,4, ”7:00” ,wait=0:30) , (1,”8:00”)}
E {(1,4, ”7:30”, wait= 0:30 ) , (2,”8:00”) , (3,”9:00”) }
A{}
B{}
C{(1, ”7:00” ) }
D{(1,4, ”7:00” ,wait=0:30) , (1,”8:00”)}
E {(1,4, ”7:30”, wait= 0:30 ) , (2,”8:00”) , (3,”9:00”) }
Step from C(1, “7:00”) cancel it as last step was from C too (delete (1, ”7:00” ))
Step from D(1,4, ”7:00” ,wait=0:30)
A{}
B{}
C{}
D{ (1,”8:00”)}
E {(1,4, ”7:30”, wait= 0:30 ) , (2,”8:00”) , (3,”9:00”) }
The destination now has the minimal time (“7:30”), so the algorithm ends.
Solution =
Source   Destination   Intermediate   Trains   Wait
A        E             A,B,D,E        1,4      0:30
A        E             A,C,E          2        0:00
A        E             A,E            3        0:00
Complexity analysis:
N : number of stations
M: number of trains
In time domain:
Since we don’t repeat steps → we have at most N steps (steps < M*N)
Each step involves N “anding” operations → N^2
Each “anding” operation involves M bits → O(M*N^2)
So the total order is O(M^2 * N^3)
In space domain:
We guarantee that for every station “S”, we store only one entry for each
distinct arrival time; if a tie happens (two ways reach “S” in same time) we
choose only one of them based on some other criteria like (number of train
exchanges or waiting time or cost….etc), so for every station there will be at
most M entries, so total space is O(M^2 * N^2)
The previous assumption is not acceptable, consider this example:
[Figure: example network with hubs A1 (0:00), A2 (0:30), B1 (1:00), B2 (1:30) and C (2:00)]
We cannot discard either of the two solutions; both should be provided to the user.
That is why this approach cannot be implemented: either it discards solutions to keep
the space complexity polynomial in the number of hubs, or it keeps all the solutions
and the space becomes exponential.
This problem is solved in our implemented approach using a two-phase mechanism: the
first phase builds a graph in which the solutions are kept implicitly, and the second
phase traverses it to get all the solutions.
Chapter 5
Proposed Routing Algorithm
The following diagram illustrates this idea:
[Figure 5.1: Difference between replicating nodes in the open list and only adding parent pointers]
Every generated hub has one or at most two pointers to its parent(s); the graph is
constructed in this way. Using this graph we can trace all the possible routes from a hub
back to the source.
Why do we need only two parent pointers?
Because a hub can be generated either as the next stop of its parent's train or as the
next train at its parent's station (i.e. it has two parents at most).
Why do we use backward rather than forward pointers?
Because the traversal should be done backwards (i.e. from the goal hub to the source hub),
in order not to waste time on misleading paths (those that reach no goal), which may happen
if forward pointers are used instead. In other words, backward pointers always lead
eventually to the source hub, whereas forward pointers may reach no goal.
[Figure 5.2: Difference between forward and backward pointers]
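The idea of keeping at most two parent pointers per hub and walking them backwards can be sketched as follows. This is a simplified illustration, not the report's implementation: it uses recursion instead of the explicit stack, it treats any node without parents as the source, and the class `BackPointers` and the node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BackPointers {
    static final class Node {
        final String name;
        Node parent1; // parent on the same train (this node is its next stop)
        Node parent2; // parent at the same station (this node is the next train)

        Node(String name) {
            this.name = name;
        }
    }

    /** Lists every path that ends at goal, written from the source to the goal. */
    static List<String> pathsToSource(Node goal) {
        List<String> out = new ArrayList<>();
        walk(goal, new ArrayDeque<String>(), out);
        return out;
    }

    private static void walk(Node n, Deque<String> path, List<String> out) {
        path.addFirst(n.name); // walking backwards, so earlier nodes go in front
        if (n.parent1 == null && n.parent2 == null) {
            out.add(String.join(" ", path)); // reached the source
        } else {
            if (n.parent1 != null) {
                walk(n.parent1, path, out);
            }
            if (n.parent2 != null) {
                walk(n.parent2, path, out);
            }
        }
        path.removeFirst();
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        Node c = new Node("C");
        Node d = new Node("D");
        Node g = new Node("G");
        b.parent1 = a;
        c.parent1 = a;
        d.parent1 = b; // D was generated twice, so it keeps two parent pointers
        d.parent2 = c;
        g.parent1 = d;
        System.out.println(pathsToSource(g)); // [A B D G, A C D G]
    }
}
```

Node D is stored once although two paths pass through it; the paths are only expanded when a goal is traversed, which is what keeps the open list small.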
UML and pseudo code of routing algorithm
class GAlgorithm {
    constructor(Buff b) {
        S ← new Stack()    // to be used in the traversal method
        this.b ← b
    }
    boolean checkEnd() {
        if (OL is empty) {
            return true
        }
        return false
    }
The algorithm terminates when the open list is empty.
    int validateAndAddIfValid()
    {
        ArrayList temp ← new ArrayList()    // to be added in the buffer
        temp.add(S.elementAt(0))
        for (i = 1; i < S.size()-1; i++)
        {
            // keep hub i unless it lies strictly inside a waiting period at one station
            if (station of hub i differs from that of hub i+1 or hub i-1)
                temp.add(S.elementAt(i))
        }
        temp.add(S.elementAt(S.size()-1))
        countedExchanges ← 0
        i ← 0
        while (countedExchanges is less than or equal to permitted exchanges and
               i is less than temp.size()-1)
        {
            if (station of hub i in temp is the same as that of hub i+1)
                increment countedExchanges
            increment i
        }
        if (countedExchanges is greater than permitted exchanges) {
            discard this solution and return 0
        }
        insert this solution in the buffer
        return 1
    }
The validateAndAddIfValid method checks the validity of the solution in the sense that it
does not contain train exchanges more than the permitted and compresses the waiting
period in the same station. Finally it adds the solution in the buffer if it is valid.
Method checkLoop checks that the new hub to be added does not introduce a loop (i.e. the
path does not traverse the same station twice). A solution with such a loop always has a
counterpart that takes the same time and is more economical: simply waiting at the station
that the loop visits twice.
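The compression and exchange counting done by validateAndAddIfValid can be sketched on a plain list of station names, one per hub of a path. This is an illustration under assumptions: the class `ExchangeCheck` and its methods are hypothetical, the buffer is left out, and two consecutive hubs at the same station in the compressed path are counted as one train exchange, as in the pseudo code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExchangeCheck {
    /** Drops every hub that lies strictly inside a waiting period at one station. */
    static List<String> compress(List<String> stations) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < stations.size(); i++) {
            boolean endpoint = i == 0 || i == stations.size() - 1;
            String s = stations.get(i);
            if (endpoint || !s.equals(stations.get(i - 1)) || !s.equals(stations.get(i + 1))) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** Counts adjacent pairs at the same station, i.e. train exchanges. */
    static int countExchanges(List<String> compressed) {
        int exchanges = 0;
        for (int i = 0; i + 1 < compressed.size(); i++) {
            if (compressed.get(i).equals(compressed.get(i + 1))) {
                exchanges++;
            }
        }
        return exchanges;
    }

    static boolean isValid(List<String> stations, int permittedExchanges) {
        return countExchanges(compress(stations)) <= permittedExchanges;
    }

    public static void main(String[] args) {
        // Arrive at B, let one train pass, take the following one: three hubs at B.
        List<String> path = Arrays.asList("A", "B", "B", "B", "D");
        System.out.println(compress(path));                 // [A, B, B, D]
        System.out.println(countExchanges(compress(path))); // 1
        System.out.println(isValid(path, 0));               // false
    }
}
```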
    int traverse()
    {
        noOfPaths ← 0
        while (the stack is not empty) {
            h ← S.pop()
            if (h.station() is the source station)
            {
                S.push(h)
                noOfPaths += validateAndAddIfValid()
                S.pop()
            }
            else if (h.goLeft is true and checkLoop(h, h.parent1)) {
                S.push(h)
                S.push(h.parent1)
                h.goLeft ← false
            }
            else if (h.goRight is true and h.parent2 is not null and checkLoop(h, h.parent2)) {
                S.push(h)
                S.push(h.parent2)
                h.goRight ← false
            }
            else {
                h.goLeft ← true
                h.goRight ← true
            }
        }
        return noOfPaths
    }
The stack initially contains the goal. The method traverse is used to get all the paths from
this goal to the source through the constructed graph. It is an iterative implementation of
depth first search; it is implemented iteratively because we need to keep track of the path
itself (i.e. the hubs in the stack).
goLeft and goRight are flags that indicate whether traversing the parent1 and parent2 links,
respectively, is allowed.
case 4:
runAlgorithm(new Heap(), algType) in A* the open list is a heap
}
}
The method respondToNewRequest is used to decide the type of the open list: either a simple
queue or a priority queue (heap). The blind search techniques like BFS and DFS use a simple
queue, while the techniques that order nodes by cost, like A* and Uniform Cost, use a
priority queue.
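One way to make the open list interchangeable is a small common interface with one implementation per strategy family. The sketch below is an assumption about how such a switch could look, not the project's class design; `OpenLists`, `OpenList`, `fifo`, `lifo` and `heap` are hypothetical names, and only `extractFront` mirrors a call that appears in the pseudo code.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.PriorityQueue;

final class OpenLists {
    /** The only operations the general algorithm needs from an open list. */
    interface OpenList<T> {
        void insert(T item);

        T extractFront();

        boolean isEmpty();
    }

    /** BFS: successors are inserted at the back of the list. */
    static <T> OpenList<T> fifo() {
        final Deque<T> queue = new ArrayDeque<>();
        return new OpenList<T>() {
            public void insert(T item) { queue.addLast(item); }
            public T extractFront() { return queue.pollFirst(); }
            public boolean isEmpty() { return queue.isEmpty(); }
        };
    }

    /** DFS: successors are inserted at the front of the list. */
    static <T> OpenList<T> lifo() {
        final Deque<T> queue = new ArrayDeque<>();
        return new OpenList<T>() {
            public void insert(T item) { queue.addFirst(item); }
            public T extractFront() { return queue.pollFirst(); }
            public boolean isEmpty() { return queue.isEmpty(); }
        };
    }

    /** Uniform Cost, Greedy and A*: the front is the element with the least key. */
    static <T> OpenList<T> heap(Comparator<T> order) {
        final PriorityQueue<T> heap = new PriorityQueue<>(order);
        return new OpenList<T>() {
            public void insert(T item) { heap.add(item); }
            public T extractFront() { return heap.poll(); }
            public boolean isEmpty() { return heap.isEmpty(); }
        };
    }

    public static void main(String[] args) {
        OpenList<Integer> open = OpenLists.<Integer>heap(Comparator.<Integer>naturalOrder());
        open.insert(3);
        open.insert(1);
        open.insert(2);
        System.out.println(open.extractFront()); // 1: the least element comes out first
    }
}
```

With this shape the main loop never needs to know which strategy is running; it only calls insert and extractFront.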
    while (!checkEnd()) {
        h ← OL.extractFront()
        remove h from the open list's hash table
        if A*, put it in the closed list hash table
        if (h.station() is the destination) {
            // a solution is found
            S.push(h)
            countAddedSolutions += traverse()    // traverses the stack containing the real goal h
            continue
        }
        h1 = connector.getNextHubInTheSameTrain(h)
5.2 Comparison between search strategies
The general algorithm is implemented in a way that enables changing the search strategy
(the order in which the nodes are expanded). The search strategy affects the overall space
and time complexity. The different strategies we discuss and implement are as follows:
Blind techniques: breadth first search (BFS) and depth first search (DFS)
Best first techniques: Greedy and A* (pronounced "A star") (informed)
Uniform Cost
Note: the terms cost, time and time cost are used interchangeably because our cost metric is
the time (the optimal path is the one that reaches the destination earliest).
The blind techniques are so called because they do not prefer some nodes over others
in any informed manner. BFS expands the nodes level by level, i.e. it expands the
shallowest unexpanded node. DFS expands the nodes in preorder, i.e. it expands the
deepest unexpanded node. Neither technique is appealing because neither is optimal
(i.e. they do not always find the shortest path to the goal). The Greedy technique is a
special case of the best first techniques; it expands the node that is expected to be
closest to the goal, where the time to reach the goal is estimated by an estimation
function H(n). In graph structures with loops the Greedy technique may get stuck, i.e. it
is not complete, but in our model there are no loops (each expanded node generates two
nodes with a later time attribute, so no node can reinsert its parent), so the Greedy
technique is complete (i.e. it finds a solution if any exists). Like BFS and DFS, the
Greedy technique is not optimal; it can reach the goal through a path worse than the
optimal one. A* is another special case of the best first techniques. Its evaluation
function F(n), which measures the desirability of a node, adds the time cost so far G(n)
(i.e. from the source to the current node) to the estimate H(n) of the time to the
destination, in order to avoid paths that are already expensive. It is complete, but its
optimality depends on the estimation function: the estimation function needs to be
admissible (i.e. it never overestimates the true time cost) for A* to be optimal. The
Uniform Cost technique expands the least-cost unexpanded node; it uses no estimates, only
the time cost so far. It behaves like the time sweep line discussed in the journey-wise
approach section. It is complete and optimal.
As we can see, all the techniques are complete. This is because the nature of the
GAlgorithm does not allow child nodes to regenerate parent nodes, as the children come
after the parents in time.
Concerning optimality, the admissible A* and the Uniform Cost are the only optimal
techniques, so our comparison will concentrate on them.
As we said before, a hub is a train at a certain station at a certain time, and every node
represents a hub. When a hub is expanded (removed from the open list), the two children it
generates are the next hub (stop) of the same train and the next train from the same station.
Thus every hub can have at most two parents.
As we explained before with figure 5.1, the proposed solution avoids adding the same node
to the open list more than once by adding only a link to its new parent (the one that would
cause its insertion again). In this scheme, the only condition for not replicating a hub is
that, when it is about to be inserted for the second time, it has not been expanded yet (i.e. it
is still in the open list).
Although it might seem that A* expands fewer nodes to reach the goal through the optimal
path, it suffers from another problem that may reduce its appeal: there is no guarantee that
a node that has been extracted from the open list (expanded) will not be inserted into it
(regenerated) again.
Consider the following case:
Hub   G       H       F
A1    0:00    10:00   10:00
A2    0:30    10:00   10:30
B1    1:00    2:00    3:00
B2    1:30    2:00    3:30
C     20:00   ---     ---
From A1, the next station in the same train (trip) is B1, while the next hub in the same
station is A2 (i.e. the next train that departs from station A after 0:00).
When the GAlgorithm runs, the first hub to be inserted in the open list will be A1
(OL={A1} CL={}), which when extracted generates A2 and B1 with time costs 10:30 and
3:00 respectively (OL={A2,B1} CL={A1}). The next hub to be extracted is B1
(OL={A2,B2} CL={A1,B1}). Then B2 is extracted (OL={A2,C} CL={A1,B1,B2}). Then A2
is extracted (OL={B2,C} CL={A1,B1,B2}). Here is the problem: do we add B2 to the OL
again? If we do so, the time complexity can grow to exponential order in the number of
hubs. An important thing to note, however, is that all the nodes on all the paths that lead to
the optimal goal are eventually expanded before the goal is reached; this is due to the
admissibility of the A* estimation function. So all paths to a goal will be established
before the goal is reached and the graph is traversed back to find those solutions. This
means that it will never be too late to add a parent pointer from B2 to A2, as it will never
be needed in traversal before it is established. So the solution to the problem is to search
for a hub identical to the one to be inserted, not only in the open list but also in the closed
list, and whenever it is found, to just add a link from it to its new parent. This search is not
expensive if a suitable data structure is used; the most efficient structure for such a
situation is a hash table.
It is important to mention that searching in the closed list is needed in all search strategies
except the Uniform Cost, as the previous situation cannot happen with the Uniform Cost
search strategy. That is because it operates like a time sweep line (i.e. the parents are
always inserted before their children).
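The "link instead of reinsert" rule can be sketched in Java as follows. This is a minimal illustration, not the project's code: the nested Hub class is a reduced, hypothetical version of the real one, and a single HashMap stands in for the separate open list and closed list hash tables, since the lookup only needs to know whether the hub has been seen at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class DuplicateCheckSketch {
    // Reduced, hypothetical Hub: an identifier, an evaluation value and parent links.
    static class Hub {
        final long number;
        final long value;
        final List<Hub> parents = new ArrayList<>();
        Hub(long number, long value) { this.number = number; this.value = value; }
    }

    // Open list ordered by evaluation value (smallest first).
    final PriorityQueue<Hub> openList =
            new PriorityQueue<Hub>(11, (a, b) -> Long.compare(a.value, b.value));

    // One hash table covering every hub that is in the open list or the closed list.
    final Map<Long, Hub> seen = new HashMap<>();

    // Returns true if the hub was newly inserted in the open list,
    // false if an identical hub already existed and only a parent link was added.
    boolean insertOrLink(Hub candidate, Hub parent) {
        Hub existing = seen.get(candidate.number);
        if (existing != null) {
            if (parent != null) existing.parents.add(parent);
            return false;
        }
        if (parent != null) candidate.parents.add(parent);
        seen.put(candidate.number, candidate);
        openList.add(candidate);
        return true;
    }
}
```

With this rule, reaching B2 a second time through A2 costs one expected constant time lookup and one pointer assignment, and the open list never holds more entries than there are distinct hubs.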
5.3 Time and space complexity analysis
BFS and DFS
For the blind techniques, there is no guarantee that all paths to a goal will be established
before the goal is reached and the graph is traversed back to find those solutions. We
traverse the graph and produce solutions whenever a goal is reached, as we need to be able
to produce some solutions before the graph is completely established (see Producer-Consumer
paradigm). In the previous example, by the time A2 is extracted and B2 is to be inserted in
the open list, a goal might have already been reached and the paths to it already obtained
(i.e. we will not traverse the graph back from this goal again), so adding a link from B2 to
A2 will not do, and the solutions that pass through this link will not be discovered. So, to
be sure that all possible solutions are discovered, B2 must be added to the open list again,
not just linked to A2. The fact that a node (hub) may be inserted in the open list more than
once causes the time and space complexity to be of exponential order in the number of
hubs, as the graph may be transformed into a tree with repeated nodes. See figure …
Note: in general DFS needs linear space for its open list, but in our case the nodes that are
removed from the open list are not removed from memory; they remain resident in the
constructed graph. That is why the space complexity is still exponential in the number of
hubs, like that of BFS.
Greedy technique
Besides not being guaranteed to find the optimal path, it also suffers from the same
problems as BFS and DFS, so its time and space complexity are the same. In practice,
however, it may behave better than BFS and DFS as it is more informed (not blind).
A* and Uniform Cost
For these two techniques, all paths to a goal are established before the goal is reached and
the graph is traversed back to find those solutions (A* uses an admissible heuristic and
Uniform Cost is a sweep line on time). So there is no need to reinsert nodes in the open
list; we can just add a link in the constructed graph, which means that the graph size can
never exceed the number of hubs in the database. So the space complexity is linear in the
number of hubs.
Concerning time complexity, on inserting each hub we need to search for it in the open list
(for Uniform Cost) or in both the open list and the closed list (for A*). Using a good
hashing technique for the search, the total time complexity is the number of hubs
multiplied by the sum of two terms: the order of searching a hash table whose size equals
the number of hubs, and the order of insertion in the open list. The upside of Uniform Cost
over A* is that A* needs more data from the database to evaluate the estimated time cost
from the current hub to the destination, and A* needs to search in the closed list too. The
upside of A* over Uniform Cost is that for large graphs the A* run time will be better, as it
does not waste time on misleading routes because it uses a heuristic (an estimate of the
time cost to the destination).
5.4 Implementation of the comparator
To compare the different search strategies, the algorithm takes two parameters: the open
list and the algType that specifies the evaluation function. The open list can be a priority
queue (heap), a queue with insertions at its head (QueForDFS), i.e. a stack, or a queue with
insertions at its tail (QueForBFS), i.e. a FIFO queue. The heap is used in the Greedy,
Uniform Cost and A* strategies, the stack is used in DFS, and the FIFO queue is used in
BFS. Extracting the front of the heap (which is the minimum), as well as inserting into it,
is logarithmic in the number of elements in the heap (hubs). Extracting the front of, or
inserting into, the stack or the queue takes constant time.
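The effect of swapping the open list can be sketched in Java as follows. This is an illustration only: the interface and factory names are hypothetical, plain integers stand in for hubs and their evaluation values, and the standard library's ArrayDeque and PriorityQueue replace the project's own GQueue and Heap classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

class OpenListSketch {
    // Common contract of every open list: insert an element, take the front one.
    interface OpenList {
        void enqueue(int value);
        Integer extractFront();   // returns null when the list is empty
    }

    // FIFO queue (insert at the tail): gives breadth first order.
    static OpenList fifo() {
        final Deque<Integer> q = new ArrayDeque<>();
        return new OpenList() {
            public void enqueue(int v) { q.addLast(v); }
            public Integer extractFront() { return q.pollFirst(); }
        };
    }

    // Stack (insert at the head): gives depth first order.
    static OpenList lifo() {
        final Deque<Integer> q = new ArrayDeque<>();
        return new OpenList() {
            public void enqueue(int v) { q.addFirst(v); }
            public Integer extractFront() { return q.pollFirst(); }
        };
    }

    // Binary heap: the front is always the minimum value.
    static OpenList heap() {
        final PriorityQueue<Integer> pq = new PriorityQueue<>();
        return new OpenList() {
            public void enqueue(int v) { pq.add(v); }
            public Integer extractFront() { return pq.poll(); }
        };
    }

    // Inserts all values, then extracts until empty and returns the extraction order.
    static List<Integer> drain(OpenList ol, int... values) {
        for (int v : values) ol.enqueue(v);
        List<Integer> out = new ArrayList<>();
        Integer x;
        while ((x = ol.extractFront()) != null) out.add(x);
        return out;
    }
}
```

Inserting 3, 1, 2 and draining gives 3, 1, 2 for the FIFO queue, 2, 1, 3 for the stack and 1, 2, 3 for the heap, so the same search loop visits hubs in a different order depending only on the open list it is given.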
Conclusion:
Search strategy   Open list               Evaluation function   Space complexity   Time complexity
BFS               Queue                   0                     2^n                2^n
DFS               Stack                   0                     2^n                2^n
Greedy            Priority queue (heap)   H(x)                  2^n                2^n * lg(2^n) = n * 2^n
Uniform Cost      Priority queue (heap)   G(x)                  n                  n * lg(n)
A*                Priority queue (heap)   F(x) = G(x) + H(x)    n                  n * lg(n)
n: number of hubs
x: the current hub
G(x): true time to reach station of x from the source
H(x): estimate of time to reach the goal from station of x
Note: lg(n) is the order of insertion into, or extraction of the front (minimum) from, a heap
(priority queue).
UML and pseudo code of comparator
The following UML diagram shows that QueForBFS, QueForDFS and Heap are all OpenList
implementations; this is done through inheritance as shown. QueForBFS and QueForDFS
share a common implementation of the method extractFront, which is why it is
implemented in GQueue, which they both extend.
• OpenList Class
The abstract class OpenList acts as a base class for the Heap and GQueue
classes. It has an ArrayList, harr (which stands for hubs’ array), to hold the hubs. It
requires its derived classes to implement the methods enqueue(Hub) and
extractFront(). It has a default implementation of the method enqueue(h, parent of h),
which uses the abstract method enqueue(Hub) and the addParent(Hub) method of
class Hub.
addParent(index, parent) {
harr.get(index).addParent(parent)
}
• Heap Class
The class Heap extends the abstract class OpenList. It represents a priority queue in
which the element at the front is always the minimum value element. The time
complexity of enqueuing a new element, or of adjusting the heap after extracting the
element at its front, is O(log n), where n is the number of elements in the
heap. Besides the adjustHeap, enqueue and extractFront methods implemented in the
heap, the method searchfor is inherited from class OpenList; this method indicates
whether an element already exists in the heap before its addition, so that replicated
elements are not allowed.
Hub extractFront() {
if (harr is not empty) {
temp ← harr.get(0)
harr.set(0, harr.get(harr.size()-1)) copy the last element at the root
harr.remove(harr.size()-1) remove the last element
adjustheap(0)
return temp
}
else
return null
}
}
• GQueue, QueueForBFS and QueueForDFS Classes
The abstract class GQueue extends OpenList to implement the method extractFront,
which is identical for both of its derived classes, QueueForBFS and QueueForDFS.
The difference between those two classes is that the first implements the method
enqueue(Hub) to insert elements at the tail of the queue (harr), while the second
inserts elements at the head of the queue.
5.5 Producer-Consumer paradigm
In spite of all the attempts we have made to reduce the complexity of the algorithm, the
fact that the number of solutions can be exponential cannot be avoided. Consequently, if
the implementation of the algorithm were designed to produce all the possible solutions in
bulk before responding to the user, the user might have to wait for a time of exponential
order (in the number of solutions). So our implementation had to be dynamic, in the sense
that it can respond to the user with a subset of the solutions of convenient size while still
producing the rest of the solutions.
One of the well known approaches to such problems is the Producer-Consumer paradigm,
in which the producer and the consumer share a buffer. The producer can place items in
the buffer whenever there are new items and the buffer permits insertion, and the
consumer can consume the next available item from the buffer whenever one exists.
The buffer has two pointers. One of them locates the next item to be consumed; we call
this pointer "readFrom". The other locates the nearest empty cell, where the next new item
can be placed. In our implementation we keep all produced solutions in the buffer (i.e. no
solutions are overwritten by newer ones) to enable the consumer to re-consume them upon
the user's request. Obviously the second pointer's value is always equal to the buffer's size,
so there is no need for a separate pointer. The buffer permits insertion whenever the
difference between the insertion location (the end pointer of the buffer) and the
"readFrom" pointer is less than or equal to a specific size (we call it the "window"). The
buffer permits consumption whenever that difference is greater than zero.
Solution1
Solution2
Solution3
Solution4 ← readFrom
Solution5
Solution6
Figure 5.6: The buffer's view after consuming the first "window" using window size of three
The method finish of class Buffer is called by the producer to indicate that there will be no
more solutions to insert. It notifies the consumer thread so that it does not wait for a
complete batch to be ready in the buffer; instead it consumes whatever is there and returns.
As mentioned before in section …, the GAlgorithm loops while there is a possibility of
finding more solutions. On finding a new solution it inserts it in the buffer, or the
producer waits if the buffer does not permit insertion at that moment.
The Consumer's methods getFirstBatch and getNextBatch are called by the JSP pages.
getFirstBatch is invoked when the user issues a new query and presses the "Submit" button.
The results window is then displayed in the output page, which contains a "Next" button
that invokes the getNextBatch method when pressed, and so on.
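The buffer rules described in this section can be sketched in Java with intrinsic locks. This is a minimal illustration under stated assumptions, not the project's Buff class: the class name and the insert method name are hypothetical, and interruption is turned into an unchecked exception to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

class WindowBufferSketch {
    private final List<Object> buf = new ArrayList<>();
    private final int window;
    private int readFrom = 0;                  // index of the next item to consume
    private boolean producerNotFinished = true;

    WindowBufferSketch(int window) { this.window = window; }

    // Producer side: waits while it is more than one window ahead of the consumer.
    synchronized void insert(Object o) {
        while (buf.size() - readFrom > window) await();
        buf.add(o);                            // items are never overwritten
        notifyAll();
    }

    // Consumer side: waits until an item exists or the producer has finished.
    // Returns null when there is nothing left and nothing more will come.
    synchronized Object consumeNext() {
        while (readFrom == buf.size() && producerNotFinished) await();
        if (readFrom == buf.size()) return null;
        Object o = buf.get(readFrom++);
        notifyAll();                           // the producer may now be allowed to insert
        return o;
    }

    // Called by the producer when no more solutions will be inserted.
    synchronized void finish() {
        producerNotFinished = false;
        notifyAll();
    }

    // Waits on this buffer's monitor; the caller already holds the lock.
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

In a real deployment the producer (the search) and the consumer (the page request) would run on different threads; a consumer that asks for a batch simply calls consumeNext up to batch size times and stops early on null.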
UML and Pseudo Code of Producer-Consumer implementation
run()
{
Instantiate an instance of the GAlgorithm Class
Call method respondToNewRequest (fields)
Call the buffer.finish() method that assigns false to the producerNotFinished
flag.
}
}
Figure 5.8: Consumer Class UML
Window getNextBatch(batchSize) {
For (i=0 to batchSize){
Object o ← buf.consumeNext()
if (o is not null) add o Object to Window
}
return Window
}
Figure 5.9: Buff Class UML
Buff {
constructor(batch size) {
producerNotFinished ← true
this.batchSize ← batchSize
buf ← new ArrayList()
}
add the object o in the buf
notify the consumer thread
}
finish(){
producerNotFinished ← false
notify the consumer thread that there will be no more solutions
}
}
5.6 Auxiliary Classes
5.6.1 Hub Class
The Hub class represents a node in the heap or the graph; it represents a certain trip (i.e. a
train at a certain time) at a certain station. It has the following instance variables (fields):
number: an identifier of the hub, composed of the trip number concatenated with the
station number.
value: holds the time cost (i.e. the desirability of the hub).
arrivalTime: the time at which the train reaches the station.
hubDate: the date on which the train reaches the station.
parent1: a pointer to the first hub that generated this hub.
parent2: a pointer to the second hub that generated this hub.
move: a flag that indicates whether this hub was last reached by an exchange of trains or
by a move.
exchanges: a counter that holds the minimum number of exchanges needed to reach this
hub.
goLeft: a flag that indicates whether the parent1 link should be used (again) in traversing
the graph of hubs.
goRight: a flag that indicates whether the parent2 link should be used (again) in traversing
the graph of hubs.
UML and Pseudo Code of Hub Class
class Hub {
number
intSize
value
arrivalTime
hubDate
routeCode
parent1
parent2
move
exchanges
goLeft
goRight
constructor(trip,station,hubDate,action,arrivalTime,routeCode)
{
goLeft ← false
goRight ← false
intSize ← 65536
this.routeCode ← routeCode
move ← action
number ← station+trip*intSize
this.hubDate ← hubDate
this.arrivalTime ← arrivalTime
}
long trip()
{
return number/intSize
}
long station()
{
return number % intSize
}
setValue(value)
{
this.value ← value
}
addParent(Hub parent)
{
if (parent1 is null)
{
parent1 ← parent
goLeft ← true
if (parent.station() is not this.station())
{
exchanges ← parent.exchanges
}
else
{
if (parent hub was not reached from a move)
{
exchanges ← parent.exchanges
}
else
{
exchanges ← parent.exchanges+1
}
}
}
else
{
if (parent1.exchanges is greater than parent.exchanges)
{
if (this.station() is the same as parent.station())
{
if (parent was not reached from a move)
{
exchanges ← parent.exchanges
}
else (i.e. parent.move equals 1)
{
exchanges ← parent.exchanges+1
}
move ← 0
}
else
{
exchanges ← parent.exchanges
move ← 1
}
}
else if (parent1.exchanges is less than parent.exchanges)
{
if (this.station() is the same as parent1.station())
{
if (parent1 was not reached from a move)
{
exchanges ← parent1.exchanges
}
else
{
exchanges ← parent1.exchanges+1
}
move ← 0
}
else
{
exchanges ← parent1.exchanges
move ← 1
}
}
else (i.e. parent1.exchanges equals parent.exchanges)
{
exchanges ← parent.exchanges
move ← 1
}
parent2 ← parent
goRight ← true
}
}
}
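The way the hub number packs the trip and the station into one value can be sketched in Java as follows. The arithmetic is taken from the pseudo code above; the class and method names are hypothetical, and the sketch assumes that station codes are non-negative and smaller than 65536, otherwise the two parts would overlap.

```java
class HubNumberSketch {
    // Same constant as intSize in the Hub pseudo code.
    static final long INT_SIZE = 65536;

    // Packs trip and station into one identifier: the trip goes in the high part.
    static long encode(long trip, long station) {
        return station + trip * INT_SIZE;
    }

    // Recovers the trip number from the identifier.
    static long trip(long number) {
        return number / INT_SIZE;
    }

    // Recovers the station number from the identifier.
    static long station(long number) {
        return number % INT_SIZE;
    }
}
```

For example, trip 12 at station 345 is encoded as 345 + 12 * 65536 = 786777, and integer division and remainder by 65536 give back 12 and 345. Because the identifier is a single long, it can be used directly as a hash table key.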
5.6.2 DBconnector Class
Since we use a four-tier architecture, we need an interface between the application tier
and the underlying database tier. This interface is the DBconnector class, which
encapsulates all the interactions between the database and application tiers. The
modularity of the design makes it possible to change the database entirely while only the
DBconnector class needs to be modified. The details of the underlying database design
and implementation are discussed in chapter 7.
DBconnector {
constructor(request parameters, algType,connectionString) {
conn ← create a new connection(connectionString)
maxVelocity ← getMaxTrainVelocity()
setParameters(request parameters, algType)
}
The constructor creates a connection to the database, gets the maximum train speed
and keeps it in the instance parameter maxVelocity, then calls setParameters with the
request parameters and the algorithm type as parameters.
ArrayList getFares(trip,distance)
{
rs ← executeQuery
"select TRN_TYPE_CODE_FK from TRIP where trip_code_fk = trip"
trainType ← get train type from rs
rs ← executeQuery
"select CLASS_CODE, TKT_FARE from FARE where DSTNC = distance
and TRN_TYPE_CODE_FK = trainType and FCC_CODE = FCC and
TKT_TYPE_CODE = TKTType"
The getFares method is used to get the fares of all accepted classes (by the user) for a
specific trip and distance.
long getStationCode(stationStr){
rs ← executeQuery
" select STN_CODE from station where name = stationStr"
return(station code from rs)
}
The getStationCode method takes a string representing a station's name and returns
the corresponding station code (number).
String getStationName(stationCode){
rs ← executeQuery
" select name from station where STN_CODE = stationCode"
return(name from rs)
}
The getStationName method takes a long representing a station's code and returns the
corresponding station name (string).
Time getTimeCost(h)
{
switch(algType){
case 0:
case 1:{ blind technique BFS or DFS
t ← (0, 0, 0)
break
}
case 2:{ Uniform Cost: g(n)
t ← (h.arrivalTime+totalSeconds(h.hubDate))
break
}
case 3:{ h(n) for greedy (best first)
t ← getEstimateToDestination(h)
break
}
case 4:{ A*
t ← (h.arrivalTime + totalSeconds(h.hubDate)+ (getEstimateToDestination(h)))
break
}
}
return t
}
The getTimeCost method evaluates the desirability of the hub according to the algType
field. For the blind search techniques (BFS and DFS) all hubs have equal desirability, so
the returned value is zero. For the Uniform Cost search strategy the returned value is the
time cost to reach this hub from the source hub (i.e. the arrival time). For the greedy
search (Best First) the returned value is the estimated time cost of reaching the goal from
this hub. For the A* search strategy the returned value is the sum of the time cost to
reach this hub and the estimate of reaching the goal from it.
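The selection of the evaluation function can be sketched as a small runnable Java method. This is an illustration under assumptions: times are plain seconds rather than the Time type, the constant names are hypothetical, and the assignment of 0 to BFS and 1 to DFS is a guess, since the pseudo code only says that cases 0 and 1 are the blind techniques.

```java
class EvaluationSketch {
    // Algorithm type codes; 2, 3 and 4 follow the pseudo code, 0 and 1 are assumed.
    static final int BFS = 0;
    static final int DFS = 1;
    static final int UNIFORM_COST = 2;
    static final int GREEDY = 3;
    static final int A_STAR = 4;

    // g: seconds needed to reach this hub from the source.
    // h: estimated seconds from this hub to the destination.
    static long timeCost(int algType, long g, long h) {
        switch (algType) {
            case BFS:
            case DFS:
                return 0;          // blind: every hub is equally desirable
            case UNIFORM_COST:
                return g;          // cost so far only
            case GREEDY:
                return h;          // estimate only
            case A_STAR:
                return g + h;      // cost so far plus estimate
            default:
                throw new IllegalArgumentException("unknown algType " + algType);
        }
    }
}
```

For a hub reached after one hour (3600 seconds) with a two hour estimate (7200 seconds), the value is 3600 under Uniform Cost, 7200 under Greedy and 10800 under A*.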
Connection connectToDB(string connectionString) {
DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
conn ← DriverManager.getConnection(connectionString)
return conn
}
newTrip ← get trip code from rs
offset ← get arrival day offset from rs
arvlTime ← get arrival time from rs
route ← get route code from rs
if (class is accepted by user)
{
if (checkDate(newTrip,offset,hubDate)==1)
{
Hub temp ← new Hub(newTrip,station,hubDate,0,arvlTime,route)
temp.setValue(getTimeCost(algType,temp))
return temp
}
}
}
}
Time getDepartureTimeOfSegment(Hub h)
{
rs ← executeQuery " select DPTR_TM from trip_segment where trip_code_fk
= h.trip() and stn_from_fk = h.station()"
departure ← get departure time from rs
return (departure)
}
int checkClasses(long newTrip)
{
rs ← executeQuery " select distinct (CLASS_CODE) from TRIP_VEHICLES
where trip_code_fk = newTrip"
foreach record in the result set{
class ← get class code from rs
if (class is accepted by user){
return 1
}
}
return 0
}
The method checkClasses returns 1, if there exists at least one class accepted by the
user in the given trip.
int getTrueDistance(from, to, route)
{
rs ← executeQuery
"select DSTNC from SEGMENT where STN_FROM = from and STN_TO
= to and VIA_ROUTE =route"
return ( distance from rs)
}
int checkDate(newTrip, offset, hubDate)
{
dt ← new GregorianCalendar(hubDate.year, hubDate.month, hubDate.day_of_month - offset)
rs ← executeQuery " select trip_code_fk from trip_instance where trip_code_fk =
newTrip and trip_date = dt"
int found ← get trip code from rs
if (found==newTrip)return 1
else return 0
}
The method checkDate is used to check if there is an instance of the given trip in the
given date.
Hub getNextTrain(Hub h) {
temp ← getNextTrainFromSameStation(h.trip(), h.station(), h.hubDate, hubTime)
if (temp equals null) temp ← getNextTrainFromSameStation(h.trip(), h.station(),
new GregorianCalendar(h.hubDate.year, h.hubDate.month, h.hubDate.day_of_month+1),
new Time(0))
return temp
}
The getNextTrain method gets the next train from the station of the last hub. First it
tries to find a departing train on the given date; if none exists, it tries to find the first
departing train on the next day. Waiting for more than two days is not permitted.
Hub getFirstHub() {
temp ← getNextTrainFromSameStation(-1, from, ODate, OTime)
if (temp equals null) temp ← getNextTrainFromSameStation(-1, from,
new GregorianCalendar(ODate.year, ODate.month, ODate.day_of_month+1), new Time(0))
return temp
}
The getFirstHub method tries to get the first train to depart from the source station
that satisfies the request parameters. It first tries to find a train on the day submitted
by the user after the given time; if it fails, it gets the first departing train on the next
day. Waiting for more than two days is not permitted.
Coordinates getCoordinates(long station){
rs ← stmt.executeQuery
" select X_DSTNC, Y_DSTNC from station where STN_code = station"
x ← get X coordinate from rs
y ← get Y coordinate from rs
return( new Coordinates(x,y))
}
long getMaxTrainVelocity(){
rs ← executeQuery
" select max(velocity) from M_TRN_TYPE "
return( velocity from rs)
}
Time getEstimateToDestination(h) {
Coordinates source ← getCoordinates(h.station())
distance ← source.distanceTo(destinationCoord)
return Time(distance/maxVelocity)
}
The getEstimateToDestination method estimates the time cost to reach the destination
station from the given hub's station. The time is evaluated by dividing the straight line
distance by the maximum train velocity (i.e. the velocity of the fastest train), which
ensures that the estimate is admissible (i.e. less than or equal to the true time needed).
The admissibility constraint is required to ensure the optimality of the A* search
strategy.
setParameters(request parameters) {
Stores the request parameters in instance parameters to ease their reusability.
}
}
5.6.3 DetailedJourney
The class DetailedJourney is used to store and manipulate the solution that will be
presented to the user. The instance variables (fields) of the class are:
stations: an ArrayList that holds the names of the stations where the trip stops (hubs)
times: an ArrayList that holds the arrival and departure times of each hub, except the first
and the last, which have only a departure time and an arrival time respectively; the hubs of
a train exchange also have only one time attribute.
dates: an ArrayList that holds the arrival and departure dates of each hub, except the first
and the last, which have only a departure date and an arrival date respectively.
numberOfTrains: an integer that holds the total number of trains involved in the trip
trains: an ArrayList of the trains involved in the trip
costs: an ArrayList that holds the costs of each class acceptable to the user, for each train
involved in the trip.
waitingPeriods: an ArrayList that holds the waiting periods at every train exchange in the
trip.
UML and Pseudo Code of DetailedJourney Class
class DetailedJourney {
stations
times
dates
NumberOfTrains
trains
costs
waitingPeriods
if (currentHub.station() is the same as journey.get(i+1).station()) (i.e. a train exchange)
{
add journey.get(i+1).trip() to trains (i.e. add the previous train)
add getFares(journey.get(i+1).trip(), tripDistance) to costs
add (currentHub.arrivalTime - journey.get(i+1).arrivalTime) to waitingPeriods
tripDistance ← 0
}
else (i.e. still in the same train)
{
if (i is not 0) (because the hub at journey[0] has no following hub)
{
if (journey.get(i).station() is not the same as journey.get(i-1).station())
{
add the departure time of the hub at journey[i] to times
if (the last added time in times is after its preceding one) {
(that means it is in the same day)
add the last date in dates to dates again
}
else
{
(it must be in a new day)
add the last date in dates, incremented by one, to dates
}
}
}
add the true distance between the station of the current hub (i) and the previous hub
(i+1), via their route, to the counter tripDistance
}
add the last train to trains
add getFares(currentHub.trip(),tripDistance) to costs
}
}
5.6.4 Coordinates Class
This class contains the coordinates of a station's location to be used in estimating the
straight line distance between two stations, which is used in the estimation of the time cost.
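The estimate that these coordinates feed can be sketched in Java as follows. This is a minimal illustration: the class and method names are hypothetical, and the sketch assumes that coordinates are in kilometres on a flat plane and that the maximum velocity is in kilometres per hour, which the text does not state.

```java
class EstimateSketch {
    // Straight line (Euclidean) distance between two stations.
    static double distance(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    // Lower bound, in hours, on the travel time between the two stations:
    // no train can cover less than the straight line distance,
    // and no train is faster than maxVelocity.
    static double estimateHours(double x1, double y1, double x2, double y2,
                                double maxVelocity) {
        if (maxVelocity <= 0) {
            throw new IllegalArgumentException("maxVelocity must be positive");
        }
        return distance(x1, y1, x2, y2) / maxVelocity;
    }
}
```

Two stations 300 km apart on one axis and 400 km apart on the other are 500 km apart in a straight line; with a fastest train of 125 km/h the estimate is 4 hours, which can never exceed the real travel time, so the estimate stays admissible.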
Chapter 6
Reservation
In this chapter we define the reservation problem, discuss the available strategies for
solving it, present the selected strategy and show the design and the implementation of
this strategy.
After the passenger chooses the most suitable trip, a seat is to be reserved on that trip.
The passenger should choose the required class, ticket type and type of discount. The
passenger may choose a particular seat, or may not care which seat is reserved. If the
passenger has not chosen a particular seat, the system chooses the seat. The system can
increase the utilization of each seat in the train by choosing the seat to reserve using some
heuristics, not randomly.
In a long trip that passes through many stations, not all the passengers want to go from the
first station of the trip to the last one. Some passengers reserve seats for a certain segment
in the middle of the trip, leaving these seats free in the other gaps. The goal is to make the
best use of these gaps and to decrease them as much as possible.
6.2 Available Techniques
Here are some heuristics for choosing the seat to be reserved:
• First-fit strategy: the system reserves the first available seat that is free at the
stations the passenger wants to pass through. This strategy incurs the least overhead.
• Best-fit strategy: the system reserves the seat that fits most tightly and leaves the
smallest gap. It seems the most intuitive strategy. It incurs the overhead of searching
all the seats for the best-fit seat. However, this overhead is theoretically of the same
order as that of the First-fit strategy: both are linear in the number of seats, i.e. O(n),
where n is the number of seats.
• Worst-fit strategy: the system reserves the seat that fits worst. The intuitive appeal
is simple: after reserving the seat, the seat remains free for a relatively large number
of stations, which enables the system to reserve it for another passenger who wishes
to continue with this trip after the first passenger leaves the seat. This strategy has the
same overhead as the Best-fit strategy, but in practice, in the case of seat reservation,
it does not increase the utilization as much as the Best-fit strategy does. The number
of stations that a trip passes through is not very large to begin with, so relying on
leaving large gaps that can be reserved by another passenger is not a strong reason
for choosing the Worst-fit strategy.
6.3 UML Class Diagram
Here we are going to show the design of the reservation facility, using the Best-Fit
strategy. The design is shown by the UML Class Diagram shown below.
Reservation
Fields: trip_code: int, date: Date, stn_from: int, stn_to: int, class_code: int, fcc: int,
tktType: int, dstnc: double, trnType: int, fare: double, refundFare: double,
bestSeat: Seat, stmt: Statement, conn: Connection
Methods: Reservation(), getBestSeat(): Seat, insertTicket(), getAllStations(): ArrayList,
getAllTrainClass(): ArrayList, getAllTktTypes(): ArrayList
Seat
Fields: trip_code: int, seat_num: int, vehicle_num: int, segments: ArrayList, gab: double,
factor: const double, stmt: Statement
Methods: + Seat(), - initializeSegments(), + addSegment(stn_from: int, stn_to: int): int,
+ checkGab(stn_from: int, stn_to: int): double, - index(station: int)
Seat_Segment
Fields: station: int, reserved: boolean
Methods: Seat_Segment(station: int)
Ticket
Fields: trip_code: int, date: Date, stn_from: int, stn_to: int, class_code: int, seat_num,
vehicle num
Methods: Ticket()
Associations:
Reservation (1) to Seat (1): best seat
Reservation (1) to Seat (*): all partially reserved seats
Seat (1) to Seat_Segment (*): segments
Reservation to Ticket: reserved
6.4 Pseudo Code
The implementation of the reservation facility, using the Best-Fit strategy, is shown by the
pseudo code of the main methods:
Class: Reservation
Method: getBestSeat()
• Given the trip code and the departure and arrival stations, calculate the distance
by adding the distance of each segment between the departure and
arrival stations.
• Given the trip code, bring the train type from the database.
• Given the FCC (type of discount), ticket type, required class, calculated
distance and train type, bring the fare and refund fare from the
database.
• For each reserved ticket in this trip and required class
For each partially reserved seat, calculate the gap caused by the
ticket required to be reserved.
Return the seat with the minimum gap, if one exists.
Else return a totally free seat, if one exists.
Else return null.
Class: Seat
Method: checkGab(stn_from:int,stn_to:int):double
• The array ‘segments’ has all the stations that this seat passes through,
and a boolean for each station that represents whether the seat is
reserved at this station or not.
• Call addSegment(stn_from,stn_to), which sets the booleans of the
stations between ‘stn_from’ and ‘stn_to’ to true. It returns -1 if one of the
booleans is already true, which means that it is invalid to reserve
this seat.
• After adding this segment to the seat, calculate the gap formed before
and after the segment.
• If there are two gaps (before and after the required segment), multiply
the sum of the two gaps by a factor, in order to prefer gaps
on one side only.
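The gap calculation and the best-fit choice can be sketched in Java as follows. This is an illustration under assumptions that the text does not fix: a seat is a boolean array with one entry per leg between consecutive stations, a gap is measured as a count of free legs rather than a distance, the factor value 2.0 is hypothetical, and the method does not modify the seat (unlike addSegment).

```java
class SeatGapSketch {
    // Hypothetical penalty applied when free gaps remain on both sides.
    static final double FACTOR = 2.0;

    // reserved[i] is true when the leg from station i to station i+1 is taken.
    // Returns -1 if the seat is already reserved somewhere in [from, to),
    // otherwise the size of the free gaps adjacent to the new segment.
    static double checkGap(boolean[] reserved, int from, int to) {
        for (int i = from; i < to; i++) {
            if (reserved[i]) return -1;
        }
        int before = 0;
        for (int i = from - 1; i >= 0 && !reserved[i]; i--) before++;
        int after = 0;
        for (int i = to; i < reserved.length && !reserved[i]; i++) after++;
        double gap = before + after;
        return (before > 0 && after > 0) ? gap * FACTOR : gap;
    }

    static boolean isTotallyFree(boolean[] reserved) {
        for (boolean r : reserved) if (r) return false;
        return true;
    }

    // Best fit: index of the partially reserved seat with the smallest gap,
    // else the first totally free seat, else -1.
    static int bestSeat(boolean[][] seats, int from, int to) {
        int best = -1;
        double bestGap = Double.MAX_VALUE;
        int freeSeat = -1;
        for (int s = 0; s < seats.length; s++) {
            if (isTotallyFree(seats[s])) {
                if (freeSeat < 0) freeSeat = s;
                continue;
            }
            double gap = checkGap(seats[s], from, to);
            if (gap >= 0 && gap < bestGap) {
                bestGap = gap;
                best = s;
            }
        }
        return best >= 0 ? best : freeSeat;
    }
}
```

On a trip with five legs, a request for legs 2 and 3 prefers a seat already reserved on legs 0 and 1 (one free leg left over) to a seat reserved only on leg 4 (two free legs left over), and uses a totally free seat only when no partially reserved seat can take the request.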
Chapter 7
This chapter presents the phases of developing the underlying database system for our
project. It begins with the functional analysis of the three main functions of the system:
the query facility, the reservation facility and the administration facility. The chapter then
presents the data modeling, which analyzes the data that must be stored and maintained:
the physical railway network (the stations and the railway connections between them),
the trips and their schedules, and the data required for reservation (e.g. the fares of each
kind of ticket and the already reserved tickets). After the data modeling comes the logical
design, which transforms the conceptual data model into a relational data model and
specifies the data integrity constraints. Finally comes the phase of physical design and tuning.
The database recovery methods provided by the database management system used
(Oracle 8i) are shown in the database recovery section.
The last section of this chapter discusses the database administration tool that our system
provides.
The three main functions of the system are the query facility, the reservation facility and
the administration facility. The query facility is responsible for running the routing algorithm
to find the feasible journeys that satisfy the user's constraints, ranked by their arrival time.
The reservation facility is responsible for reserving a seat for the user in a way that increases
the utilization of each seat in a train. Finally, the administration facility is the tool that we
provide for the employees to facilitate the process of data entry.
We use the Data Flow Diagram (DFD) to analyze these functions. The
purpose of the DFD is to show where the data comes from, where the data goes when
it leaves the system, where the data is stored, what processes transform it, and the
interactions between data stores and processes.
DFD of the query facility
[Figure: data flow diagram of the query facility. External entity: User. Processes:
1 Routing Algorithm, 2 Construct Detailed Journeys. Data stores: 4 Trip Instance,
5 Train types, 6 Trip, 7 Station, 8 Fare. Data flow labels: departure and arrival stations;
user constraints; trip code and date; velocity; trip code and train type; name,
X_Coordinate and Y_Coordinate; sets of hubs; feasible journeys; ticket fares for each class.]
In order to find the set of feasible journeys, the routing algorithm reads the departure
and arrival stations with all the constraints of the user, and then connects to the
database to read all the data related to the trip schedules and the user constraints.
Finally, the ‘Construct Detailed Journeys’ process prepares a detailed description of
the algorithm's solution and sends it to the user.
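DFD of the reservation facility
[Figure: data flow diagram of the reservation facility. External entity: User. Process:
1 Reservation. Data stores: 4 Trip Instance, 5 Fare, 7 Reserved tickets. Data flow labels:
selected trip; ticket type and class; trip code and date; ticket fare and refund fare;
reserved seat and vehicle number, fare and refund fare of the ticket.]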
The reservation facility needs to know the trip selected by the user and the
already reserved tickets in order to decide the suitable seat to reserve. During this
process, all the vehicles of this trip with the requested class are needed to choose from.
The segments that this trip passes through are also needed, to check whether the
seats are available during these segments. Finally, the fare and the refund fare are
needed to be shown to the user.
DFD of adding a new Trip:
[Figure: data flow diagram of adding a new trip. External entity: Employee. Processes
(numbered 1 to 4) include Verify/connect, Insert new trip and Insert trip vehicles.
Data stores: 2 Trip, 3 Train types, 4 Station, 5 Train classes, 6 Trip_Vehicles, 7 Segments.
Data flow labels: user name and password; user names and passwords; Trip_code, TRN_type,
weekdays, STN_departure, operating and expiring date; all train types; all stations;
Trip_code and list of all vehicles with their classes and number of seats; all classes;
list of all trip vehicles; Trip_code and list of all segments with their arrival and
departure time; all segments.]
The main page in the administration tool is the page for inserting a new trip, as it
facilitates the steps of adding a new trip to the database, which involves dealing with
many tables. In order to add a new trip, the employee must first be authorized to add
a trip. Authentication is verified by a username and password, which represent the
employee's role in the database. The employee gives the TRN_type, weekdays, STN_departure,
and operating and expiring dates of the trip. The system suggests a trip code equal to
Max(trip_code) + 1, which the employee can accept or change. This trip is inserted
in the Trip table. Then, the employee supplies the list of vehicles and the class
of each one. The system provides the employee with the names of all classes in order to
select one of them for each vehicle. Finally, the list of segments that represent the
rails that the trip passes through is inserted; during this operation the system provides
the employee with all the segments in the network to choose the list from.
7.2 Data Modeling:
Data modeling is a very important task in building a database system, since it has an
impact on efficient database design. We applied the Entity Relationship approach in
defining the conceptual abstract view of the database system. The database system of any
railway network is divided into three main subjects: the physical network, the trip
schedules and the reservation information. We use the Entity Relationship
Diagram (ERD) to represent these subjects.
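Physical Network
[Figure: ERD of the physical network. Entities: Station (Stn_code, name, X_Coordinate,
Y_Coordinate), M_Route (Route_code, name), Segment (Dstnc). Segment is related to
Station by the relationships From and To, and to M_Route by the relationship Via.]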
The physical network consists of all the stations, each with its horizontal and vertical
coordinates. There are also the segments (rails) from one station to another, each with the
length of the rail between them. The via-route of a segment is used to
differentiate between the rails that connect the same pair of stations.
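ERD of the Trip Schedules:
[Figure: ERD of the trip schedules. Entities and attributes shown: M_TRN_Type, Station
(as departure station), Trip (Trip_code, weekdays, Trip_operating_date,
Trip_expiring_date), Vehicle_code. Relationship shown: Schedule.]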
A schedule of the railway system is represented by the trips that run on the segments. A
trip is characterized by the train type, the departure station, the days of the week on which
the trip runs, the trip starting date (operating date) and the expiring date. For each trip there
is a list of segments together with their arrival and departure times. Also, for each trip there
is a list of train vehicles that run on this trip, each with its class and number of seats.
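ERD of the Reservation:
[Figure: ERD of the reservation. Entities and attributes shown: M_TRN_type, Fare (Dstnc,
Class_code, FCC_code, TKT_type_code, TKT_fare, TKT_Rfnd_fare), Trip, Trip_Instance
(Trip_date), Ticket, Station (as departure and arrival station). Relationship shown:
Reservation.]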
The fare of a ticket is determined by the requested class, train type, ticket type, discount
type (FCC) and the distance travelled. For each fare there is a refund fare that is
returned to the customer if the reservation is cancelled.
The trip instance is instantiated from the trip a week before the date of the trip, to enable
passengers to reserve tickets on that trip a week before its date. Tickets cannot be
reserved until the instance is instantiated.
7.3 The Entity Type Specifications (ETS)
The table Station contains the station code and station name, together with the horizontal
and vertical coordinates of the station measured from some origin; these coordinates are used
to derive the straight-line distance between any two stations. The x and y coordinates
are in kilometers and can be positive or negative. The straight-line distance can be used with
the maximum train velocity to estimate the real distance between two stations, which helps us
estimate the real cost or time.
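As a rough illustration of this estimate, the sketch below computes the straight-line distance from the station coordinates and divides it by the maximum train velocity to get an optimistic travel time, the kind of value an informed search can use as a heuristic. The function names, the `(x, y)` tuple layout and the units (kilometers and kilometers per hour) are assumptions made for the example.

```python
import math


def straight_line_km(station_a, station_b):
    """Euclidean distance between two (x, y) station coordinates in km."""
    return math.hypot(station_b[0] - station_a[0], station_b[1] - station_a[1])


def min_travel_hours(station_a, station_b, max_velocity_kmh):
    """Optimistic travel time: no train can cover the distance faster."""
    return straight_line_km(station_a, station_b) / max_velocity_kmh
```

Because the rails are never shorter than the straight line, the result is a lower bound on the real travel time rather than an exact figure.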
The master table of routes contains all the possible routes between stations. As any two
stations may have more than one possible route connecting them, each pair of stations
together with one of their connecting routes is called a segment.
The Segment table contains all the segments with their real distances, not the straight-line
distances. Each train trip consists of many segments.
Project: EgyTrains Subject: Master Information Page 1/1
The master information about train types contains the train type code with its name and
velocity. The velocity helps in estimating the real distances when there is no direct
railway connection between two stations as described before. Examples of train types are:
Spanish, turbine or French.
Trip is the master information about all valid trips. It contains the train type, the departure
station, the days of the week on which the trip runs, the trip starting date (operating date) and
the expiring date. Any expired trip is deleted from the Trip table and archived.
Trip_Segment contains all the segments of a certain trip: the departure station, the
departure time, the arrival station, the arrival time, and the day offset between this segment
and the first segment in this trip.
Project: EgyTrains Subject: schedule Page 1/1
The Trip_vehicles table contains a record for each vehicle in a certain trip, with its class
code and the number of seats in this vehicle.
Each day, an instance of each trip running in the next week should be created and given a date.
A ticket cannot be reserved on a trip until this instance is created. Old instances should be
deleted or archived.
Each instance has a system-maintained field containing the number of available (totally
free) seats so far. If it reaches zero, this does not mean that no ticket can be reserved, as a
passenger can reserve a ticket for a part of the trip and sit in a seat that is not totally free,
provided that it is free throughout the required part of the trip. However, it is a good estimate
of how easy it will be to find a seat on this trip, and it can be used
statistically to show the frequency of demand on the trip concerned.
Project: EgyTrains Subject: Reservation Page 1/1
Fare is the table of all fares. Each fare is determined by the distance, train type, reserved
class, FCC (whether there is a discount or not, and the type of this discount) and ticket type
(single ticket or part of a return ticket). With each fare there is also the refund fare, which
is returned to the customer if the reservation is cancelled.
Ticket is the table of already reserved tickets. The information with each ticket is the trip
instance (determined by the trip code and date), departure and arrival stations, reserved
class, ticket type, discount, distance (to be able to determine the fare from the Fare table),
vehicle number and seat number.
Project: EgyTrains Subject: Master Information Page 1/1
The FCC is the kind of discount on the ticket (full price, half price or military).
Train classes are first, second or third class. This table is used when printing these names.
If other classes are added in the future, the validation rules of the class_code in every other
table should be changed.
7.4 Logical Database Design
Logical database design is the transformation of the conceptual data model into a relational
data model. The obtained relations should first be normalized to the third normal form.
Normalization is the decomposition of complex data structures according to a set of
dependency rules, designed to give simpler and more stable data structures.
There are several normal forms. A relation is in the first normal form if it does not
have a multivalued attribute. A relation is in the second normal form if it is in the first
normal form and its nonprime attributes are functionally dependent on the entire primary
key and not on part of the key. A relation is in the third normal form if it is in the second
normal form and each nonprime attribute is independent of any other nonprime attribute.
Below is each relation in the database with its functional dependency diagram, which shows
whether the relation is in the third normal form or not.
STATION: STN_CODE → NAME, X_DSTNC, Y_DSTNC
M_ROUTE: ROUTE_CODE → NAME
SEGMENT: (STN_FROM, STN_TO, VIA_ROUTE) → DSTNC
M_FCC
TKT_TYPE_CODE → NAME
CLASS_CODE → NAME
M_TRN_TYPE: TRN_TYPE_CODE → NAME, VELOCITY
TRIP: TRIP_CODE → TRN_TYPE_CODE_FK, STN_DEPARTURE_FK, WEEKDAYS,
TRIP_OPERATING_DATE, TRIP_EXPIRING_DATE
Note that Weekdays is not a multivalued attribute; it is an integer of seven digits, one
for each day of the week. If the trip runs on a day, its digit is one; otherwise the digit is
zero.
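A short sketch of reading this encoding follows. The report does not say which digit stands for which day, so the example assumes that the leftmost digit is Monday, to match Python's `date.weekday()`; the function name is also hypothetical. Since an integer drops leading zeros, the value is padded back to seven digits first.

```python
from datetime import date


def runs_on(weekdays, day):
    """Return True if the seven-digit WEEKDAYS value marks `day` as running.

    Assumption: digits are ordered Monday..Sunday from left to right.
    """
    digits = str(weekdays).zfill(7)  # restore leading zeros lost by the integer
    return digits[day.weekday()] == "1"
```

For example, under this assumed ordering the value `1000001` describes a trip that runs on Mondays and Sundays only.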
TRIP_SEGMENT: (TRIP_CODE_FK, STN_FROM_FK, STN_TO_FK, VIA_ROUTE_FK) → ARVL_TM,
DPTR_TM, DAY_OFFSET
TRIP_VEHICLES: (TRIP_CODE_FK, VEHICLE_CODE) → CLASS_CODE, SEATS
TRIP_INSTANCE: (TRIP_CODE_FK, TRIP_DATE) → NO_AVAILABLE_SEATS
FARE: (DSTNC, TRN_TYPE_CODE_FK, CLASS_CODE, FCC_CODE, TKT_TYPE_CODE) → TKT_FARE,
TKT_RFND_FARE
TICKET: (TRIP_DATE_FK, TRIP_CODE_FK, STN_FROM_FK, STN_TO_FK, VEHICLE_NUM, SEAT_NUM) →
DSTNC, TRN_TYPE_CODE_FK, CLASS_CODE, FCC_CODE, TKT_TYPE_CODE
The Ticket relation is not in the second normal form, as some nonprime
attributes are functionally dependent on part of the key rather than on the entire primary key.
To normalize the Ticket relation, the attributes DSTNC, TRN_TYPE_CODE_FK and CLASS_CODE
must be removed from it. This normalization would slow down the retrieval of the fare and
refund fare of any ticket.
After normalization, the steps to get the fare of a ticket are: getting the train type from the
table Trip using Ticket.TRIP_CODE_FK, getting the class from the table
Trip_vehicles using Ticket.TRIP_CODE_FK and Ticket.VEHICLE_NUM, getting
all the segments that connect Ticket.STN_FROM_FK and Ticket.STN_TO_FK,
getting the distances of all these segments from the table Segment, adding all these distances
and then getting the fare from the table Fare.
For this reason, we chose to denormalize the Ticket relation. Leaving the Ticket relation
as it is improves the performance of getting the fare of a certain ticket, as the fare is
fetched from the table Fare using Ticket.DSTNC, Ticket.TRN_TYPE_CODE_FK,
Ticket.CLASS_CODE, Ticket.FCC_CODE and Ticket.TKT_TYPE_CODE.
Codd’s representation for the relations
STATION (STN_CODE, NAME, X_DSTNC, Y_DSTNC)
7.5 Data integrity
There are different types of relational integrity constraints: domain constraints, the entity
integrity constraint, referential integrity constraints and semantic integrity constraints.
Domain constraints:
Domain constraints specify that within each field, the value of each attribute must be an atomic
value from its domain. Domain constraints are preserved by the DBMS. Here are all the relations
with the domain of each field.
Entity integrity constraint:
The entity integrity constraint states that no primary key value can be null, because the
primary key value is used to identify individual tuples in a relation. The DBMS is responsible for
preserving this integrity constraint.
A referential integrity constraint is specified between two relations and is used to maintain the
consistency among tuples in the two relations. The referenced attribute should be the primary
key of the referenced relation. Referenced tuples cannot be removed until all referencing tuples
are removed. A referencing tuple cannot be inserted before the referenced tuple. Again, the
DBMS is responsible for maintaining referential integrity constraints.
Domain constraints and referential constraints are not sufficient; database triggers should be
used to define and enforce semantic integrity constraints. Below are some of the integrity
rules for our railway system, each with the pseudocode of the triggers that enforce it.
Integrity rule:
In each record of the Segment table (STN_FROM, STN_TO, VIA_ROUTE, DSTNC)
Stn_from must not be equal to stn_to.
Trigger:
Before insert on Segment
If new.stn_from = new.stn_to then return error
Integrity rule:
The expiring date in the Trip table should be greater than the date of inserting its record;
otherwise it should be inserted in the archive.
Trigger:
Before insert on Trip
If new.expiring_date < new.operating_date
return error(“expiring date should be after the operating date”)
If new.expiring_date < current system date
return error(“expiring date has already passed, you can insert this trip in
the archive or check the date”)
Integrity rule:
The date in the Trip_Instance table should be greater than the date of inserting its record.
Trigger:
Before insert on Trip_Instance
If new.date < current system date
return error(“the date has already passed”)
Integrity rule:
The list of segments for a certain trip, stored in the table Trip_segment, should be
a continuous list that does not form a loop, i.e. the trip cannot depart from any station twice,
arrive at any station twice, or return to a previously visited station. Also, a segment cannot
depart from a station and arrive at the same station.
Trigger:
Before insert on Trip_segment
If it is not the first segment to insert in this trip, there must exist one and only one
segment in this trip in which stn_to = new.stn_from
else
Return error(“the list of trip_segments is not continuous”)
Trigger:
Before insert on Trip_segment
For each segment on the same trip
segment.stn_from_fk must not equal new.stn_from_fk
segment.stn_to_fk must not equal new.stn_to_fk
segment.stn_from_fk must not equal new.stn_to_fk
new.stn_from_fk must not equal new.stn_to_fk
else return error(“There exists a loop, check this segment with previous segments”)
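The two triggers above can be combined into one check, sketched here in Python rather than as a database trigger. The function name and the representation of a segment as a `(stn_from, stn_to)` pair are assumptions for the example; the checks themselves follow the pseudocode.

```python
def check_new_segment(existing, new):
    """Validate `new` against the segments already stored for the same trip.

    `existing` is a list of (stn_from, stn_to) pairs; raises ValueError
    when the new segment breaks continuity or forms a loop.
    """
    new_from, new_to = new
    if new_from == new_to:
        raise ValueError("a segment cannot depart from and arrive at the same station")
    if existing:
        # Exactly one stored segment must end where the new one starts.
        preceding = [seg for seg in existing if seg[1] == new_from]
        if len(preceding) != 1:
            raise ValueError("the list of trip_segments is not continuous")
    for stn_from, stn_to in existing:
        # Departing twice, arriving twice, or returning to a visited station.
        if new_from == stn_from or new_to == stn_to or new_to == stn_from:
            raise ValueError("there exists a loop, check this segment with previous segments")
```

With the segments (1, 2) and (2, 3) stored, the segment (3, 4) is accepted, while (3, 1) is rejected as a loop and (5, 6) as not continuous.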
Integrity rule:
The list of segments for a certain trip, that are stored in the table Trip_segment should be
continuous in time, i.e. if the time of departure is before the time of arrival, then the day
offset must be more than zero.
Trigger:
Before insert on Trip_segment
If it’s the first segment in this trip then new.day_offset = 0
Else
Check the departure time of the previous segment
If new.DPTR_TM is on the same day of previous_segment.DPTR_TM
new.day_offset = previous_segment.day_offset
Else
new.day_offset = previous_segment.day_offset + 1
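The day offset rule can be sketched as below. The sketch assumes that DPTR_TM holds a time of day only, that consecutive departures are less than 24 hours apart, and that a departure time earlier than the previous one therefore means the trip has passed midnight. The function name and argument layout are hypothetical.

```python
from datetime import time


def day_offset(previous, new_dptr_tm):
    """Day offset for a new segment of a trip.

    `previous` is (DPTR_TM, DAY_OFFSET) of the preceding segment,
    or None when the new segment is the first one of the trip.
    """
    if previous is None:
        return 0
    prev_dptr_tm, prev_offset = previous
    if new_dptr_tm >= prev_dptr_tm:
        return prev_offset      # still on the same day
    return prev_offset + 1      # the clock wrapped around midnight
```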
Integrity rule:
Each part (set of trip_segments) of the trip should have a set of fares assigned to it in the
Fare table, one fare record for each combination of classes, ticket types, and discount
types.
Trigger:
Before insert on Trip_segment
For each class of this Trip in the table Trip_vehicles
For each type of discount in the table M_FCC
For each ticket type in the table M_Tkt_type
Check that there is a fare in the table Fare for this class,
discount, ticket type and the distance between the
new.STN_from_fk and new.STN_to_fk
Trigger:
Before insert on Trip_ vehicles
For each type of discount in the table M_FCC
For each ticket type in the table M_Tkt_type
For each distance of each combination of segments of that trip
Check that there is a fare in the table Fare for this discount,
ticket type, distance and new.Class_code
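The nested loops of these triggers amount to checking every combination of distance, class, discount type and ticket type against the Fare table. A minimal sketch, assuming the Fare keys are available as a set of tuples ordered (distance, train type, class, FCC, ticket type); the function name is hypothetical:

```python
from itertools import product


def missing_fares(fare_keys, distances, trn_type, classes, fcc_codes, tkt_types):
    """List the fare combinations that have no record in `fare_keys`."""
    wanted = product(distances, [trn_type], classes, fcc_codes, tkt_types)
    return [key for key in wanted if key not in fare_keys]
```

An empty result means the insert can proceed; otherwise the returned keys tell the employee which fare records still have to be entered.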
Integrity rule:
The number of available seats in the Trip_Instance table is system maintained. It should
be decreased when a certain seat is reserved.
Trigger:
Before insert on Trip_Instance
Select sum(seats) from Trip_vehicles
where Trip_vehicles.trip_code_fk = new.trip_code_fk
new.NO_AVAILABLE_SEATS ← this sum
Trigger:
After insert on Ticket
Select distinct vehicle_num, seat_num from the table Ticket
for the trip instance of the new ticket
get the count of these records
update the number of available seats in the Trip_Instance to the total number of seats
minus this count
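Since NO_AVAILABLE_SEATS counts the totally free seats, one way to read this rule is "all seats of the trip minus the seats that hold at least one ticket". The sketch below follows that reading; the data layout (vehicles as `(vehicle_code, seats)` pairs, tickets of one trip instance as dictionaries) is an assumption for the example.

```python
def available_seats(trip_vehicles, tickets):
    """Number of totally free seats on one trip instance."""
    total = sum(seats for _vehicle_code, seats in trip_vehicles)
    # A seat with several partial reservations is still counted once.
    reserved = {(t["vehicle_num"], t["seat_num"]) for t in tickets}
    return total - len(reserved)
```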
Integrity rule:
The distance of the rail between the departure and arrival stations in the Ticket table
should be equal to the sum of the distances of the segments that connect the departure
station with the arrival station.
Trigger:
Before insert on Ticket
dist = get_dist (new.TRIP_CODE_FK, new.STN_FROM_FK, new.STN_TO_FK)
if dist = 0 then no segments connect these stations
if the employee entered a distance it must equal dist
if the employee did not enter a distance, insert the record with the calculated dist
Function:
get_dist (trip_code: number, stn_from: number, stn_to: number)
for each segment in the table Trip_segment
that connects stn_from and stn_to with the Trip_code = trip_code, do
fetch the distance of this segment from the table Segment.
Add this distance to dist
Return dist
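A possible shape for get_dist is sketched below in Python. It assumes that the segments of the trip are supplied in travel order as `(stn_from, stn_to, via_route)` tuples and that the distances come from a dictionary keyed by the same tuples; both are stand-ins for the Trip_segment and Segment tables.

```python
def get_dist(trip_segments, segment_dstnc, stn_from, stn_to):
    """Sum the segment distances between stn_from and stn_to on one trip.

    Returns 0 when the trip does not connect the two stations in that order.
    """
    dist = 0
    inside = False
    for seg in trip_segments:
        if seg[0] == stn_from:
            inside = True           # the requested part starts here
        if inside:
            dist += segment_dstnc[seg]
            if seg[1] == stn_to:
                return dist         # the requested part ends here
    return 0
```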
Integrity rule:
No one can reserve a ticket from a station to the same station.
Trigger:
Before insert on Ticket
If new.stn_from = new.stn_to then return error
Integrity rule:
No one can reserve a ticket on a seat for a part of the trip if this seat is
already reserved at one of the stations in that part.
Trigger:
Before insert on Ticket
For each Ticket on the required seat
If the stations passed by this ticket intersect the
stations passed by the new ticket, return error
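The intersection test reduces to an interval overlap check once each station is replaced by its position along the trip. The sketch assumes such positions are available as integers and treats a ticket as covering the half-open range from its departure position up to, but not including, its arrival position, so that one passenger may board at the station where another leaves. Both function names are hypothetical.

```python
def overlaps(a_from, a_to, b_from, b_to):
    """True when two journeys share at least one stretch of the trip."""
    return a_from < b_to and b_from < a_to


def can_reserve(seat_tickets, new_from, new_to):
    """Check a new ticket against the (from, to) tickets already on a seat."""
    return not any(overlaps(f, t, new_from, new_to) for f, t in seat_tickets)
```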
7.6 Physical Design
The goal of the physical design is to guarantee good performance in addition to the appropriate
structuring of data. It is not possible to make meaningful physical design decisions until
we know the queries, transactions and applications that are expected to run on the
database.
The routing and reservation algorithms are the applications that run on the
database. First we analyze the set of queries invoked by the routing
algorithm, and then the queries invoked by the reservation algorithm.
Here are the queries invoked by the routing algorithm. All these queries are in the class
DBconnector, which accesses the database to search for the possible trips that
the passenger will choose from. Each query is listed with its frequency of invocation per
user query.
Here are the queries that access the database in order to reserve a ticket.
All these queries are in the class Reservation. Each method is listed with its frequency of
invocation per reservation of a single ticket.
Each query implies certain decisions, such as the attributes on which indexes should be
defined and the type of each index. Whether a decision should be adopted
depends on the frequency of the query and the frequency of updating the attribute.
Final decisions:
The following table shows the final decisions that are implemented; these decisions were
chosen according to the analyzed queries and their frequencies.
Oracle DBMS manages the concurrency control for our system. Oracle locking is
performed automatically and requires no user action. Implicit locking occurs for SQL
statements as necessary, depending on the action requested. Oracle's lock manager
maintains several different types of row locks, depending on what type of operation
established the lock. In general, there are two types of locks: exclusive locks and share
locks. Only one exclusive lock can be obtained on a resource (such as a row or a table);
however, many share locks can be obtained on a single resource.
In our routing algorithm all the transactions are read-only transactions, as the routing
algorithm just reads the trip schedules to decide the feasible journeys.
Concurrency problems usually come from write-after-read transactions, which is the
case in the reservation feature of our system. The system reads the currently reserved
tickets, decides the suitable seat to reserve and then inserts the newly
reserved ticket into the Ticket table. The problem will become more important when
online payment, which we left as future work, is developed.
7.8 Data Recovery
Data loss can occur for various reasons. Here are some of the most common types of
failures that can lead to data loss.
A user or application error is a user mistake that results in the loss of data. For example,
a user can accidentally delete data from a payroll table. Such user errors can require the
database or object to be recovered to a point in time before the error occurred.
A media failure is a physical problem that arises when Oracle tries to write or read a file
that is required to operate the database. A common example is a disk head crash that
causes the loss of all data on a disk drive. Disk failure can affect a variety of files,
including datafiles, redo log files, and control files. Because the database instance cannot
continue to function properly, it cannot write the data in the database buffers of the SGA
to the datafiles.
Oracle provides users a choice of several basic methods for recovery handling. The
methods include:
7.9 Administration Tool:
The data entry process may be tiring and exhausting unless it is well organized. In
order to integrate our system, we provided a database administration tool that is user
friendly and well documented to facilitate the entry process.
In the user guide appendix, we provide the necessary information for the employees to
use the administration tool in inserting the data, along with screen shots of the
administration pages, and we suggest a schema for the data entry process.
We developed this tool using JSP technology. We chose JSP because it is a server-side
technology and its pages are portable: their contents are translated into Java servlets by
the application server. The architecture of a JSP/servlet-enabled Web site is often
referred to as thin-client because most of the logic is executed on the server.
The administration tool consists of many pages. An administrator should first log in
with a user name and password on the home page. Each page is concerned with a certain
table, such as the station, route, segment, train type, ticket fare or trip instance. Inserting a
new trip schedule requires the insertion of data in three tables: Trip, Trip_segment and
Trip_Vehicles. The Data Flow Diagram (DFD) of inserting a new trip schedule, which
covers inserting the trip's general information, the list of segments composing the trip
and the list of vehicles and their classes for the trip, is shown in the first section of this
chapter.
Chapter 8
We also designed and implemented a database system for railway networks. In order to
facilitate the deployment of the system for servicing the Egyptian railway system, we
contacted the National Organization for Egyptian Railways to learn about the existing
railway system, and to know the types of data they need and the policy of calculating the
ticket fares. We also got the data of the physical network and the trip schedules as a hard
copy, but we failed to get it as a soft copy. Anyway we implemented database
administration tools that facilitate the data entry process.
During design and implementation, we gained learning experience in many fields, such as
modeling multi-constrained graphs, routing algorithms, solving optimization problems,
database modeling and Web development.
Online payment
The reservation phase is composed of two stages: choosing a free seat that increases the
utilization, which we implemented, and the online payment before the reservation of that
seat is committed, which we left as future work.
After the fare of the reserved ticket is paid, the system responds with a serial number.
We then suggest one of these scenarios:
• Providing each station with a device that reads this serial number from the
passenger and prints the ticket.
• Providing each ticket inspector with a PDA (personal digital assistant) that
validates the serial number.
• The passenger gives the serial number to a desk clerk to obtain the ticket associated
with that serial number.
The Egyptian Railways serves more than 1.4 million passengers daily, and runs about
1260 trains daily, and it is growing continuously by adding new lines, that is why it is
difficult for us to insert and maintain the whole real data.
Although we implemented database administration tools that facilitate the data entry
process, we did not actually insert the real data. In the user guide we provide the
necessary information for the employees to use the administration tool in inserting the
data, and we suggest a schema for the data entry process.
Appendix A
Routing and Reservation Case Study
A.1 Routing Case Study
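Stations: A, B, C, D, F, E
Train 1: A1 5:00, B1 6:00/6:10, C1 7:00/7:10, F1 8:00/8:10, E1 12:00
Train 2: A2 5:30, C2 6:30
Train 3: A3 2:00, D3 8:00/8:10, E3 9:00
Train 4: F4 9:00, E4 10:00
Train 5: D5 9:00, E5 10:00
(Where two times are given for a stop, they are the arrival and departure times.)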
The request is: find all routes from station A to station E after 1:00.
We have 5 search techniques. We illustrate how each of them behaves under this
request.
A*
Uniform Cost
Serial | Node extracted from the OpenList | Children enqueued in the OpenList | Notes | OpenList
 | | A1 | First node | A1
1 | A1 | B1, A2 | | A2, B1
2 | A2 | C2, A3 | | B1, C2, A3
3 | B1 | C1 | No next train | A3, C2, C1
4 | A3 | D3 | No next train | C2, C1, D3
5 | C2 | | C2 generated C1 but it was not added as it already exists in OpenList, and train 2 ended | C1, D3
6 | C1 | F1 | No next train | D3, F1
7 | D3 | E3, D5 | | F1, E3, D5
8 | F1 | E1, F4 | | D5, F4, E1, E3
9 | D5 | E5 | No next train | F4, E5, E3, E1
10 | F4 | E4 | No next train | E4, E5, E3, E1
11 | E4 | | Destination E4 is reached | E5, E3, E1
12 | E5 | | Destination E5 is reached | E3, E1
13 | E3 | | Destination E3 is reached | E1
14 | E1 | | Destination E1 is reached |
Table A.2 : Uniform Cost trace
Greedy
Serial | Node extracted from the OpenList | Children enqueued in the OpenList | Notes | OpenList
 | | A1 | First node | A1
1 | A1 | B1, A2 | | B1, A2
2 | B1 | C1 | No next train | C1, A2
3 | C1 | F1 | No next train | F1, A2
4 | F1 | E1, F4 | | E1, A2, F4
5 | E1 | | Destination E1 is reached | A2, F4
6 | F4 | E4 | No next train | E4, A2
7 | E4 | | Destination E4 is reached | A2
8 | A2 | C2, A3 | | C2, A3
9 | C2 | C1 | C1 is enqueued again in OpenList | C1, A3
10 | C1 | F1 | No next train | F1, A3
11 | F1 | E1, F4 | | E1, F4, A3
12 | E1 | | Destination E1 is reached again | F4, A3
13 | F4 | E4 | No next train | E4, A3
14 | E4 | | Destination E4 is reached again | A3
15 | A3 | D3 | No next train | D3
16 | D3 | E3, D5 | | E3, D5
17 | E3 | | Destination E3 is reached | D5
18 | D5 | E5 | No next train | E5
19 | E5 | | Destination E5 is reached |
Table A.3 : Greedy trace
BFS
Serial | Node extracted from the OpenList | Children enqueued in the OpenList | Notes | OpenList
 | | A1 | First node | A1
1 | A1 | B1, A2 | | B1, A2
2 | B1 | C1 | No next train | A2, C1
3 | A2 | C2, A3 | | C1, C2, A3
4 | C1 | F1 | No next train | C2, A3, F1
5 | C2 | C1 | C1 is enqueued again in OpenList | A3, F1, C1
6 | A3 | D3 | No next train | F1, C1, D3
7 | F1 | E1, F4 | | C1, D3, E1, F4
8 | C1 | F1 | No next train | D3, E1, F4, F1
9 | D3 | E3, D5 | | E1, F4, F1, E3, D5
10 | E1 | | Destination E1 is reached | F4, F1, E3, D5
11 | F4 | E4 | No next train | F1, E3, D5, E4
12 | F1 | E1, F4 | | E3, D5, E4, E1, F4
13 | E3 | | Destination E3 is reached | D5, E4, E1, F4
14 | D5 | E5 | No next train | E4, E1, F4, E5
15 | E4 | | Destination E4 is reached | E1, F4, E5
16 | E1 | | Destination E1 is reached again | F4, E5
17 | F4 | E4 | No next train | E5, E4
18 | E5 | | Destination E5 is reached again | E4
19 | E4 | | Destination E4 is reached again |
Table A.4 : BFS trace
DFS
Serial | Node extracted from the OpenList | Children enqueued in the OpenList | Notes | OpenList
 | | A1 | First node | A1
1 | A1 | A2, B1 | | A2, B1
2 | A2 | A3, C2 | | A3, C2, B1
3 | A3 | D3 | | D3, C2, B1
4 | D3 | E3, D5 | | D5, E3, C2, B1
5 | D5 | E5 | | E5, E3, C2, B1
6 | E5 | | Destination E5 is reached | E3, C2, B1
7 | E3 | | Destination E3 is reached | C2, B1
8 | C2 | C1 | | C1, B1
9 | C1 | F1 | | F1, B1
10 | F1 | E1, F4 | | F4, E1, B1
11 | F4 | E4 | | E4, E1, B1
12 | E4 | | Destination E4 is reached | E1, B1
13 | E1 | | Destination E1 is reached | B1
14 | B1 | C1 | | C1
15 | C1 | F1 | | F1
16 | F1 | E1, F4 | | F4, E1
17 | F4 | E4 | | E4, E1
18 | E4 | | Destination E4 is reached again | E1
19 | E1 | | Destination E1 is reached again |
Table A.5 : DFS trace
Observations
A.2 Reservation Case Study
We have a journey from Alex station to Aswan station that passes through Tanta, Cairo and
Kena stations respectively: Alex → Tanta → Cairo → Kena → Aswan
Suppose there are some passengers who want to reserve tickets on this journey:
Let:
• First passenger travel from Alex station to Cairo station
• Second passenger travel from Alex station to Kena station
• Third passenger travel from Tanta station to Kena station
• Fourth passenger travel from Kena station to Aswan station
• Fifth passenger travel from Kena station to Aswan station
Then, according to the Best Fit algorithm, which is applied in reservation, the seat
numbers returned to these passengers by the reservation page, if they request their
tickets in the order shown, will be as follows:
For the first passenger the seat number will be 1, which is the first seat, because all seats are
available before this request.
For the second passenger the seat number will be 2, because the first seat is reserved from
Alex to Cairo, which overlaps this passenger's journey from Alex to Kena.
For the third passenger the seat number will be 3, because the reservation periods of seats 1
and 2 overlap this passenger's journey.
For the fourth passenger the seat number will be 2. Here the effect of the algorithm appears:
seats 1, 2 and 3 are all available for the segment from Kena to Aswan, but in seat 2 the
available segment is shorter than the available segments of seats 1 and 3.
For the fifth passenger the seat number will be 3, since seat 3 has a shorter available
segment than seat 1 (seat 2 is by now reserved from Kena to Aswan).
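The case study can be replayed with a small simulation. This is a sketch under stated assumptions rather than the project's reservation code: a train with three seats, one boolean per stretch between consecutive stations, a gap measured as the free stretches directly before and after the requested part, a two-sided factor of 1.5, and ties broken in favor of the lowest seat number. All function names are hypothetical.

```python
STATIONS = ["Alex", "Tanta", "Cairo", "Kena", "Aswan"]


def free_run(taken, start, step):
    """Count consecutive free stretches from `start`, moving by `step`."""
    count = 0
    while 0 <= start < len(taken) and not taken[start]:
        count += 1
        start += step
    return count


def gap(taken, i_from, i_to, factor=1.5):
    """Gap left on a seat by a new ticket, or None if the seat is busy."""
    if any(taken[i_from:i_to]):
        return None
    before = free_run(taken, i_from - 1, -1)
    after = free_run(taken, i_to, +1)
    total = before + after
    return total * factor if before and after else float(total)


def reserve(seats, stn_from, stn_to):
    """Best fit: partially reserved seat with the minimum gap, else a free seat."""
    i_from, i_to = STATIONS.index(stn_from), STATIONS.index(stn_to)
    best = None
    for number, taken in enumerate(seats, start=1):
        if not any(taken):
            continue  # totally free seats are only a fallback
        g = gap(taken, i_from, i_to)
        if g is not None and (best is None or g < best[0]):
            best = (g, number)
    if best is None:
        for number, taken in enumerate(seats, start=1):
            if not any(taken):
                best = (0.0, number)
                break
    if best is None:
        return None  # no seat can take this ticket
    for i in range(i_from, i_to):
        seats[best[1] - 1][i] = True
    return best[1]


seats = [[False] * (len(STATIONS) - 1) for _ in range(3)]
requests = [("Alex", "Cairo"), ("Alex", "Kena"), ("Tanta", "Kena"),
            ("Kena", "Aswan"), ("Kena", "Aswan")]
assigned = [reserve(seats, a, b) for a, b in requests]
print(assigned)  # prints [1, 2, 3, 2, 3]
```

Under these assumptions the simulation returns the same seat numbers as the case study. For the fourth passenger, seats 2 and 3 both leave no gap, and the lower seat number is taken first.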
Appendix B
User Guide
In our web site we tried to provide a user-friendly interface that minimizes the possibility
of errors by using drop-down lists and suggesting initial values for the other
inputs.
The user guide is divided into two sections, one for the passenger and the other for the
administrator of the database.
• Query page:
o Date and time
Here you can enter the preferred departure date and time. This value is
a lower limit: the departure date and time of the output journeys may
be after the entered one but not before it.
o Additional information
This optional part holds your search criteria. You can leave it with
its default values or change some or all of them.
The criteria are the preferred classes; the maximum number of
exchanges, which is an upper limit on the number of train changes at
intermediate stations; the type of discount; and the ticket type.
Because the number of solutions may be large and may take time to
display, you can set the number of solutions shown at a time and then
browse the rest, as we will explain in the output page.
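The way these query inputs restrict the results can be illustrated with a minimal sketch. It is not the system's search code: the journey records, the field names and the functions filter_journeys and page are hypothetical, and the number of exchanges is assumed to be the number of trains minus one.

```python
from datetime import datetime


def filter_journeys(journeys, earliest_departure, max_exchanges):
    """Keep journeys that leave no earlier than the lower limit and
    change trains at most max_exchanges times."""
    return [
        journey for journey in journeys
        if journey["departure"] >= earliest_departure
        and journey["trains"] - 1 <= max_exchanges  # assumed: exchanges = trains - 1
    ]


def page(results, page_number, page_size):
    """Return one page of results; page_number starts at 0."""
    start = page_number * page_size
    return results[start:start + page_size]


# Hypothetical journeys, used only to exercise the two functions.
journeys = [
    {"id": "a", "departure": datetime(2005, 6, 1, 7, 30), "trains": 1},
    {"id": "b", "departure": datetime(2005, 6, 1, 8, 0), "trains": 2},
    {"id": "c", "departure": datetime(2005, 6, 1, 9, 0), "trains": 3},
    {"id": "d", "departure": datetime(2005, 6, 1, 10, 0), "trains": 1},
]
matching = filter_journeys(journeys, datetime(2005, 6, 1, 8, 0), max_exchanges=1)
first_page = page(matching, 0, 1)
print([journey["id"] for journey in matching])
```

Journey "a" is dropped because it leaves before the entered time, and journey "c" is dropped because it needs two exchanges.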
• Output page
The output page displays the journeys in a table. The meaning of each column is as
follows:
o Journey
This field gives the detailed information of the journey displayed in this row,
which is composed of:
• Station
• Departure and arrival times for this station (note that the source
station has only a departure time and the destination
station has only an arrival time).
• Date
• Waiting time, which applies when changing trains: it is the
difference between the departure time of the second train and the
arrival time of the first train, as in the first journey when
changing trains at Fayoum station.
• Trip number, which you need when reserving a ticket for this trip.
o Time
The departure and arrival times of the whole journey.
o Duration
The duration of the journey is the difference between the arrival and departure
times of the whole journey.
o No of trains
The total number of trains used in this journey.
o Cost
The cost is displayed for each trip and for each class available in this trip,
according to the preferred classes entered in the query page; each class is
followed by its seat price.
Beside each class in each trip there is a link that lets you reserve a ticket in
this trip with this class, as explained in the reservation page.
The number of journeys displayed in the output page is the number you entered
in the query page. You can see the previous or next journeys by
pressing the previous or next button, whenever it is enabled.
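The waiting time and duration columns follow directly from the trip times. The sketch below shows the arithmetic on a made-up two-trip journey; the times and the function names waiting_times and journey_duration are illustrative assumptions, not values from the figures.

```python
from datetime import datetime, timedelta


def waiting_times(legs):
    """Waiting time at each change: departure of the next train minus
    arrival of the previous one. Each leg is (departure, arrival)."""
    return [nxt[0] - prev[1] for prev, nxt in zip(legs, legs[1:])]


def journey_duration(legs) -> timedelta:
    """Arrival of the last trip minus departure of the first trip."""
    return legs[-1][1] - legs[0][0]


# Hypothetical journey made of two trips with one change of trains.
legs = [
    (datetime(2005, 6, 1, 8, 0), datetime(2005, 6, 1, 9, 30)),
    (datetime(2005, 6, 1, 10, 15), datetime(2005, 6, 1, 12, 0)),
]
waits = waiting_times(legs)
total = journey_duration(legs)
print(waits, total, len(legs))
```

For these sample times the single waiting time is 45 minutes, the duration is 4 hours, and the number of trains is 2.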
• Reservation page:
For example, if you want to take the first journey displayed in the output page in figure B.2,
you may need to reserve two tickets, one for trip 3 and the other for trip 4. When you press
the Reserve link beside class1 for 6.0 L.E to reserve a seat in trip 3, the link leads
you to the reservation page shown in figure B.3, which is initialized with all the data
of the selected trip.
If these values suit you, press the submit button to reserve the ticket. If there is
an available seat in this trip, a message appears telling you the reserved seat number,
the vehicle number, the ticket fare and the refund fare, as shown in figure B.4.
If no seat is available, another message informs you that there is no available seat.
After login, the administration page appears as shown in Figure B.5.
You can insert a new station, trip, route, segment, train type, ticket fare or trip instance.
Next, we discuss a preferred order for inserting data using the insertion pages.
Insertion pages
The data entry process should begin with filling the master tables that are not updated
frequently and have no insertion pages, such as M_FCC, M_TRN_CLASSES, and
M_TKT_TYPE. After that, we suggest using the insertion pages in the following order:
1- Insert all stations
• Station name.
• X coordinate relative to a reference point (e.g. Cairo station).
• Y coordinate relative to a reference point.
2- Insert all routes
3- Insert segments
4- Insert all train types
5- Insert fares
And text boxes for:
• Distance.
• Ticket fare (price).
• Ticket refund fare.
6- Insert trips
The new trip page is divided into three parts; each part inserts data into a
different table.
Figure B.12: trip page, vehicles part
The vehicles part is concerned with adding vehicles to the trip. Each vehicle has a
class and a number of seats. To add a vehicle, press the add vehicle button; the
vehicles will be stored in the Trip_vehicles table.
The departure station of the first segment should be the start station of the trip, and
the arrival station of each segment should be the departure station of the following
segment. You can use the buttons add segment, add segment before selected,
and remove segment to modify your intermediate segments.
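The two rules for a trip's segments can be checked mechanically. The sketch below is one possible check, not the page's actual validation; the function check_segment_chain and the representation of a segment as a (departure station, arrival station) pair are assumptions made for illustration.

```python
def check_segment_chain(start_station, segments):
    """Return a list of problems found in an ordered list of segments,
    where each segment is a (departure_station, arrival_station) pair."""
    problems = []
    # Rule 1: the first segment must leave from the trip's start station.
    if segments and segments[0][0] != start_station:
        problems.append(
            f"first segment departs from {segments[0][0]}, not {start_station}"
        )
    # Rule 2: each segment must arrive where the following segment departs.
    for index, (current, following) in enumerate(zip(segments, segments[1:]), start=1):
        if current[1] != following[0]:
            problems.append(
                f"segment {index} arrives at {current[1]} "
                f"but segment {index + 1} departs from {following[0]}"
            )
    return problems


good = check_segment_chain("Alex", [("Alex", "Tanta"), ("Tanta", "Cairo")])
bad = check_segment_chain("Alex", [("Tanta", "Cairo"), ("Kena", "Aswan")])
print(good, bad)
```

An empty list means the chain is consistent; the second call reports both a wrong start station and a gap between Cairo and Kena.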
7- Insert trip instances
The new trip instance page is used daily to insert the trip instances of the trips
that are going to run in the next week, so that passengers can reserve tickets a
week before the date of the trip.