
CHAPTER 1 INTRODUCTION

Mobile devices are devices that can be moved readily from place to place. The advancement of wireless communication techniques and the popularity of mobile devices such as mobile phones, PDAs, and GPS-enabled cellular phones have contributed to a new business model. Mobile users can request services through their mobile devices via an Information Service and Application Provider (ISAP) from anywhere at any time. This business model is known as Mobile Commerce (MC), and it provides Location-Based Services (LBS) through mobile phones.

1.1 Location Based Services:


A location-based service is an information or entertainment service that is accessible from mobile devices through the mobile network and that makes use of the geographical position of the device. Examples include identifying the location of a person or object and discovering the nearest ATM.

1.2 Mobile Commerce:


MC is expected to become as popular as e-commerce. It is based on a cellular network composed of several base stations. The communication coverage of each base station is called a cell, which serves as a location area. When users move within the mobile network, their locations and service requests are stored in a centralized mobile transaction database.

Fig. 1.2. An example of a mobile transaction sequence: (a) moving sequence, (b) service sequence

Fig. 1.2 shows an MC scenario in which a user moves in the mobile network and requests services in the corresponding cell through a mobile device. Fig. 1.2(a) shows the moving sequence of a user, where cells are underlined if services are requested there. Fig. 1.2(b) shows the record of service transactions, where the service S1 was requested when this user moved to location A at time 5. These data contain useful information about the movement and transaction behaviors of mobile users. Mining mobile transaction data can therefore support various applications, such as data prefetching and service recommendation.

1.3 Mobile Transaction Database:


A mobile transaction database is complex because a huge number of mobile transaction logs are produced by users' mobile behaviors. Data mining is a widely used technique for discovering valuable information in a complex data set, and a number of studies have discussed the issue of mobile behavior mining. Mobile behaviors vary among different user clusters and across time intervals. The prediction of mobile behavior will be more precise if we can find the corresponding mobile patterns in each user cluster and time interval. To provide precise location-based services for users, effective mobile behavior mining systems are urgently required.

1.4 Clustering:
In general, clustering groups objects so that the objects within a cluster are similar to one another and dissimilar to the objects in other clusters. Clustering mobile transaction data helps in the discovery of social groups, which are used in applications such as targeted advertising, shared data allocation, and personalization of content services. In previous studies, users are typically clustered according to their personal profiles (e.g., age, sex, and occupation). However, in real applications of mobile environments, it is often difficult to obtain users' profiles. That is, we may only have access to users' mobile transaction data. To achieve the goal of user clustering without user profiles, we need to evaluate the similarities of mobile transaction sequences (MTSs).

The previous clustering algorithms are not applicable in the LBS scenario for the following reasons: 1) Most clustering methods can only process data with spatial similarity measures, while clustering methods with nonspatial similarity measures are required for LBS environments. 2) Most clustering methods require the users to set some parameters. However, in real applications, it is difficult to determine the right parameters manually for the clustering tasks. Hence, an automated clustering method is required.

Although many nonspatial similarity measures exist, most of them are designed to measure string similarity. However, the mobile transaction sequences discussed in this paper include multiple and heterogeneous kinds of information, such as time, location, and services. Therefore, the existing measures cannot be applied directly to measure the similarity of mobile transaction sequences.

1.5 Time Interval Segmentation:


The time interval segmentation method helps us find various user behaviors in different time intervals. For example, users may request different services at different times (e.g., day or night) even in the same location. If the time interval factor is not taken into account, behaviors that occur only during specific time intervals may be missed. To find complete mobile behavior patterns, a time interval table is required. Although some studies used a predefined time interval table to mine mobile patterns, the data characteristics and data distribution vary in real mobile applications. Therefore, it is difficult for users to predefine a suitable time interval table. Automatic time segmentation methods are thus required to segment the time dimension in a mobile transaction database.

1.6 Existing System


In the existing systems, mobile patterns are discovered from the whole log. The discovered patterns are not precise enough, because mobile behaviors differ among users and over time. SMAP-Mine predicts the next location and service, and MSP predicts the next mobile behavior along a specific path, but neither considers temporal periods. No existing work considers user clusters and temporal periods simultaneously.

1.7 Proposed System


The proposed system uses the Cluster-based Temporal Mobile Sequential Pattern Mine (CTMSP-Mine) algorithm to discover CTMSPs in LBS environments. The Cluster-Object-based Smart Cluster Affinity Search Technique (CO-Smart-CAST) builds a cluster model for mobile transactions based on the Location-Based Service Alignment (LBS-Alignment) similarity measure. The time interval segmentation method finds various user behaviors in different time intervals.

CHAPTER 2 LITERATURE SURVEY

2.1 Introduction:


The survey is based upon four categories: 1) Mobile Pattern Mining Techniques 2) Temporal Pattern Mining Techniques 3) Clustering Methods 4) Mobile Behavior Predictions

2.2 Mobile Pattern Mining Techniques:

The related studies on mobile pattern mining techniques are as follows.

2.2.1 Association Rule Mining - R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules between Sets of Items in Large Databases, Proc. ACM SIGMOD Conf. Management of Data, pp. 207-216, May 1993.

Explanation: Association rule mining was proposed to discover important items in a transaction database. It finds frequent patterns, associations, and correlations among sets of objects or items in transaction databases and other information repositories. The rules predict the occurrence of an item based on the occurrences of other items in the transaction.

Steps: 1) List all the possible association rules. 2) Compute the support and confidence for each rule. 3) Prune the rules that fail the minsup and minconf thresholds.

Drawbacks: A single minimum support threshold may not be effective, and listing all possible rules is computationally expensive.

2.2.2 Apriori Algorithm - R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th Int'l Conf. Very Large Databases, pp. 487-499, Sept. 1994.

Explanation: Apriori uses a "bottom up" approach, in which frequent subsets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. It implements a level-wise search that relies on the frequent item set property: every subset of a frequent item set is also frequent.

Drawback: The main drawback is that it requires many database scans.
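The level-wise search and the support/confidence pruning described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited papers: it counts candidates with a plain scan instead of a hash tree, and the names `apriori`, `association_rules`, `minsup`, and `minconf` are chosen here for illustration.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: count} for itemsets whose support fraction >= minsup."""
    baskets = [frozenset(t) for t in transactions]
    threshold = minsup * len(baskets)

    def count(itemset):
        # One pass over the data per candidate (no hash tree in this sketch).
        return sum(1 for b in baskets if itemset <= b)

    # Level 1: frequent single items.
    level = {frozenset([i]) for b in baskets for i in b}
    level = {c for c in level if count(c) >= threshold}
    frequent = {}
    k = 1
    while level:
        frequent.update({c: count(c) for c in level})
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        level = {c for c in candidates if count(c) >= threshold}
    return frequent

def association_rules(frequent, n_transactions, minconf):
    """Return (lhs, rhs, support, confidence) for rules meeting minconf."""
    rules = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = cnt / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= minconf:
                    rules.append((lhs, itemset - lhs, cnt / n_transactions, conf))
    return rules
```

Each pass of the `while` loop corresponds to one level of the search, which is why the number of data scans grows with the length of the longest frequent item set.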

2.2.3 Sequential Mobile Access Patterns (SMAP) - V.S. Tseng and W.C. Lin, Mining Sequential Mobile Access Patterns Efficiently in Mobile Web Systems, Proc. 19th Int'l Conf. Advanced Information Networking and Applications, pp. 867-871, Mar. 2005.

Explanation: SMAP-Mine discovers patterns of sequential movement associated with requested services for mobile users in mobile web systems.

Algorithm: The SMAP-Mine algorithm has two phases: i) construction of the SMAP-Tree and ii) mining of sequential mobile access patterns. The purpose of constructing the SMAP-Tree is to aggregate the access patterns in memory in a compact form so that large patterns can be mined efficiently. The main advantages of the SMAP-Tree are that 1) only one physical database scan is needed to mine all of the large patterns, and 2) the SMAP-Tree is compact, so a huge amount of data can be handled efficiently. SMAP-Mine discovers sequential mobile access rules and uses them to predict the user's next location and service. The form of a rule is {ri, si} -> {rj, sj} with a confidence c, where ri and rj are locations, and si and sj are services. It implies that a user requesting si in ri will, with probability c, have rj and sj as the next location and service.

Advantages: The patterns are very useful for wireless applications such as data allocation, data replication, location-based personal agents, and context-aware and personalized services.
Data Allocation: It is referred to as filling gaps. It is useful when dealing with data that has different levels of granularity, that is, different sizes of data item.
Data Replication: It is a method of copying data from a database on one server to a database on another.
Location-Based Personal Agents: They use an infrastructure called PLACE (Person Location Agent for Communicating Entities) to establish the user's location with better coverage and better accuracy at varying granularities.

Drawbacks: The main drawback is that only locations and services are predicted; temporal periods are not considered.
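To make the rule form {ri, si} -> {rj, sj} with confidence c concrete, the following sketch estimates rule confidences by counting consecutive (location, service) pairs. It is only an illustration under that simplifying assumption: it does not build the SMAP-Tree, and the function names `mine_access_rules` and `predict_next_request` are hypothetical.

```python
from collections import Counter, defaultdict

def mine_access_rules(sequences, min_conf):
    """sequences: lists of (location, service) pairs in visiting order.

    Returns {current: [(next, confidence), ...]} sorted by confidence.
    """
    head_count = Counter()   # how often each pair is followed by anything
    pair_count = Counter()   # how often `cur` is followed directly by `nxt`
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            head_count[cur] += 1
            pair_count[(cur, nxt)] += 1

    rules = defaultdict(list)
    for (cur, nxt), n in pair_count.items():
        conf = n / head_count[cur]
        if conf >= min_conf:
            rules[cur].append((nxt, conf))
    for cur in rules:
        rules[cur].sort(key=lambda item: -item[1])
    return dict(rules)

def predict_next_request(rules, current):
    """Return the most confident (next, confidence) for `current`, or None."""
    candidates = rules.get(current)
    return candidates[0] if candidates else None
```

Note that nothing in a rule refers to time, which is exactly the drawback listed above.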

2.2.4 Mobile Sequential Patterns (MSP) - C.H. Yun and M.S. Chen, Mining Mobile Sequential Patterns in a Mobile Commerce Environment, IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 37, no. 2, pp. 278-295, Mar. 2007.

Explanation: MSP takes both the moving patterns and the purchase patterns into consideration. In essence, the mining of mobile sequential patterns combines the concepts of mining association rules, mining path traversal patterns, and mining sequential patterns, and thus requires a combined use of the corresponding techniques. Three algorithms, TJLS, TJPT, and TJPF, are devised to determine mobile sequential patterns. MSP predicts the next mobile behavior. The form of a pattern is {(ri, si), (r1), (r2), (r3), (rj, sj)}, where the item (ri, si) indicates that a user requests service si at location ri. The pattern means that a user requests service si in location ri and then requests service sj in location rj after following the specific path r1, r2, r3.

Procedure:
1) Large-Transaction Generation Phase: Determine the large transactions (L-transactions) from the mobile transaction sequences. For each cell, a modified algorithm named DHP is applied to find the set of all L-transactions, TL. The set of item sets is mapped to a set of contiguous integers to reduce the time required to check whether a mobile sequential pattern is contained in a mobile transaction sequence.
2) Large-Transaction Transformation Phase: Employ the algorithm Large-Transaction Transformation with Sequence-Trimming (LTTST) to transform all mobile transaction sequences into maximal L-transaction sequences. In this phase, the same item sets in different cells are viewed as different transactions. Thus, the same item sets sold in different cells are transformed to different integers.
3) Sequential-Pattern Generation Phase: Employ one of three algorithms, TJLS (Transactionset Join with Large-transaction set), TJPT (Transactionset Join with Path Trimming), or TJPF (Transactionset Join with Pattern Family), to determine the large sequential patterns from the maximal L-transaction sequences.
4) Sequential-Rule Generation Phase: Derive mobile sequential rules from the large sequential patterns.

Algorithm:
a) Algorithm TJLS: TJLS is devised as a variant of the Apriori algorithm that uses a two-level hash tree, called the mobile sequence tree, to store the candidate sequential patterns when mining large sequential patterns. In the two-level hash tree, a node contains either a list of patterns (a leaf node) or a hash table (an internal node). In an internal node, each bucket of the hash table points to another node. The patterns are stored in the leaf nodes. TJLS joins the L-transaction sets to construct the transaction component of the mobile sequence tree during candidate generation. TJLS tends to count the supports of many out-of-path sequential patterns (i.e., sequential patterns that do not stay within the path), which degrades its performance.
b) Algorithm TJPT: Based on the path trimming technique, TJPT takes the path into consideration when generating the candidate patterns.
c) Algorithm TJPF: Based on the pattern family technique, TJPF uses the shared-path tree when generating the candidate patterns. It is similar to TJPT but reduces the computational overhead caused by out-of-path sequential patterns.

Advantages:
i) Path trimming: TJPT not only determines large sequential patterns but also maintains a buffer that contains the leaf nodes in the transaction component and the corresponding paths in the path component, so as to classify the patterns. The purpose of classifying the patterns is that patterns whose paths do not contain each other need not be considered together when generating candidate sequential patterns. Thus, TJPT can trim the generation of candidate sequential patterns according to the paths. This is referred to as the path trimming technique.
ii) Pattern family: A pattern family consists of a pattern itself and all of its subpatterns generated at each round, and it reduces the number of uncertain candidate sequential patterns.

Drawbacks: The main drawback is that TJPF does not consider the factors of user cluster and time interval, so the complete information of mobile behaviors is not discovered.

2.3 Temporal Pattern Mining Techniques:


Knowing the patterns of each time interval helps to enhance customized services.

2.3.1 T-MAP - S.C. Lee, J. Paik, J. Ok, I. Song, and U.M. Kim, Efficient Mining of User Behaviors by Temporal Mobile Access Patterns, Int'l J. Computer Science Security, vol. 7, no. 2, pp. 285-291, Feb. 2007.

Explanation: T-MAP stands for Temporal Mobile Access Patterns. It finds mobile users' access patterns in distinct time intervals. The discovered patterns support real-time customized personal services for users. However, this method is not suitable for all data.

Genetic Algorithm: T-MAP lacks flexibility, since the start time and end time of each interval must be set in advance. It does not find the best segmentation points of the time intervals. The genetic algorithm, proposed by Holland, is generally used to solve such problems. It requires a fitness function that evaluates the quality of a chromosome, and it starts from a randomly generated population.

Operators: The genetic algorithm has three operators: 1) selection, 2) crossover, and 3) mutation.
1) Selection: In each generation, a proportion of the current population is selected to produce the next population. Individual chromosomes are selected based on their fitness values. The larger the fitness value of a chromosome, the higher the probability that the chromosome is selected.
2) Crossover: We apply one-point crossover, which is governed by a crossover probability. A crossover point on both parent chromosomes is randomly selected. All time segmenting points beyond the crossover point are swapped between the two parent chromosomes. The resulting chromosomes are the children.
3) Mutation: We apply one-bit mutation, which is governed by a mutation probability. With this probability, an arbitrary time segmenting point in a chromosome is changed from its original state.
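The three operators can be illustrated with a short Python sketch. It assumes, for illustration only, that a chromosome is a list of bits in which bit i = 1 marks time point i as a segmenting point, and that the fitness function is supplied by the caller and returns a positive number; the function names and the `seed` parameter are not from the source.

```python
import random

def select(population, fitness, rng):
    """Fitness-proportionate selection: fitter chromosomes are picked more often."""
    weights = [fitness(c) for c in population]  # assumed positive
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, p_cross, rng):
    """One-point crossover: swap everything beyond a random point."""
    if rng.random() >= p_cross or len(a) < 2:
        return a[:], b[:]
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(c, p_mut, rng):
    """One-bit mutation: with probability p_mut, flip one random bit."""
    c = c[:]
    if rng.random() < p_mut:
        i = rng.randrange(len(c))
        c[i] = 1 - c[i]
    return c

def evolve(population, fitness, generations, p_cross, p_mut, seed=0):
    """Run the GA loop and return the fittest chromosome of the last generation."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        children = []
        while len(children) < size:
            a = select(population, fitness, rng)
            b = select(population, fitness, rng)
            c1, c2 = crossover(a, b, p_cross, rng)
            children.append(mutate(c1, p_mut, rng))
            children.append(mutate(c2, p_mut, rng))
        population = children[:size]
    return max(population, key=fitness)
```

In a time segmentation setting, the fitness function would score how well the segmenting points encoded by a chromosome separate the mobile behaviors; that function is left to the caller here.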

2.4 Clustering Methods:

2.4.1 Clustering Analysis - J. Han and M. Kamber, Data Mining: Concepts and Techniques.

Clustering analysis can be roughly divided into two topics: 1) Similarity measures 2) Clustering methods

2.4.1.1 Similarity Measures


The popular similarity measures are Euclidean distance, edit distance, LCSS, and EDR. They are used to measure the similarity of string sequences or time series data.

Drawbacks: Mobile transaction sequences are not only time series of movements but also carry service sequences, so it is crucial to define the similarity between different sequences properly.

2.4.1.2 Clustering Methods


2.4.1.2.1 k-means Method: It is a method of cluster analysis that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean.

2.4.1.2.2 k-Medoids Method: It is related to the k-means algorithm and the medoidshift algorithm. In contrast to the k-means algorithm, k-medoids chooses data points as centers. Both of these methods partition the data set into k clusters based on similarities between data items, where k is a parameter specified by the user.

2.4.1.2.3 Hierarchical Clustering Method: It is a method of cluster analysis that seeks to build a hierarchy of clusters. It generally falls into two types:
Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

2.4.1.2.4 Density-based Clustering Method: Density-based clustering algorithms are devised to discover arbitrary-shaped clusters. A cluster is regarded as a region in which the density of data objects exceeds a threshold.

Drawbacks: The above clustering methods require the users to set some parameters before the clustering task. For example, DBSCAN requires a density radius r and a minimum number of objects MinPts. However, in real applications, it is difficult to determine the right parameters manually for the clustering tasks.
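The dependence on a user-supplied parameter can be seen in a minimal k-means sketch. The version below works on one-dimensional points and picks its initial centers from evenly spaced positions in the sorted data; both choices are simplifications made here for illustration, and the name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(points, k, max_iter=100):
    """Partition 1-D points into at most k clusters around their means."""
    pts = sorted(points)
    step = max(1, len(pts) // k)
    # Simplified initialization: evenly spaced points from the sorted data.
    centers = [float(p) for p in pts[::step][:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centers]
        for p in pts:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean; keep the old one if a cluster is empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

For the points 1, 2, 3, 10, 11, 12, calling the function with k = 2 and with k = 3 gives different partitions of the same data, which is why a wrong manual choice of k can distort the clustering result.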

2.5 Mobile Behavior Predictions - Y. Tao, C. Faloutsos, D. Papadias, and B. Liu, Prediction and Indexing of Moving Objects with Unknown Motion Patterns, Proc. ACM SIGMOD Conf. Management of Data, pp. 611-622, June 2004.

Mobile behavior predictions can be divided into two categories: time series based prediction and pattern based prediction.

2.5.1 Time Series Based Prediction

It can be divided into two types: a) linear models and b) nonlinear models. The prediction accuracies of the nonlinear models are higher than those of the linear models. The Recursive Motion Function (RMF) is the most accurate prediction method based on regression functions.

2.5.2 Pattern Based Prediction

It includes the Markov Model (MM), which generates Markov transition probabilities from one cell to another for predicting the next cell of the object.
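As an illustration of pattern based prediction, a first-order Markov model over cells can be estimated by counting cell-to-cell transitions. This is a minimal sketch under the assumption that each trajectory is a list of cell identifiers in visiting order; the names `fit_markov` and `predict_next_cell` are chosen here and do not come from the cited work.

```python
from collections import Counter, defaultdict

def fit_markov(trajectories):
    """Estimate P(next cell | current cell) from observed trajectories."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }

def predict_next_cell(model, cell):
    """Return the most probable next cell, or None if `cell` was never left."""
    probs = model.get(cell)
    if not probs:
        return None
    return max(probs, key=probs.get)
```

The model looks only at the current cell, so it cannot distinguish users or time periods that share the same cell but behave differently.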

CHAPTER 3 SYSTEM ARCHITECTURE

3.1 Architecture:


The overall framework of CTMSP Mining is described as follows:

Fig. 3.1. System Architecture


The major components of CTMSP Mining are:

Mobile Database: A database that a mobile computing device can connect to over a mobile network.
Mobile Computing: The ability to use a computing capability without a predefined location.
Mobile Network: A cellular network distributed over land areas called cells, each served by at least one fixed location called a base station.
Cell: The communication coverage of each base station is called a cell, which serves as a location area.
Base Station: A fixed station in a wireless network that is used for communicating with mobile phones.
Clustering: Clustering is done with a parameter-less clustering algorithm called CO-Smart-CAST. A similarity matrix must be generated first, and it is obtained by LBS-Alignment.
Segmentation: In a mobile transaction database, similar mobile behaviors exist within certain time segments. Hence, it is important to choose suitable settings for time segmentation. A GA-based method is used to obtain automatically the most suitable time segmentation table with common mobile behaviors.
CTMSP Mining: In CTMSP-Mine, both the user cluster and the time interval are taken into account so that the complete mobile sequential patterns can be discovered. The CTMSP-Mine algorithm can be divided into three main steps: 1) Frequent-Transaction Mining, 2) Mobile Transaction Database Transformation, and 3) CTMSP Mining.

CHAPTER 4 DATA FLOW DIAGRAM

4.1 LEVEL 0:

Fig. 4.1. Level 0

The input external entity represents the user, who moves along the mobile network. The process represents the Cluster-based Temporal Mobile Sequential Pattern Mining technique. The output external entity displays information based on the user's movements.

4.2 LEVEL 1:

Fig. 4.2. Level 1

The input external entity represents the user, who moves along the mobile network. The mobile network process represents a cellular network distributed over land areas called cells, each served by at least one fixed location called a base station. The mobile network database stores the users' movements, locations, and services. The clustering process clusters the mobile transaction sequences according to the users' different mobile transaction behaviors. The segmentation process discriminates the characteristics of mobile behaviors in different time segments. The CTMSP Mining process considers both the user cluster and the time interval so that the complete mobile sequential patterns can be discovered. The prediction strategies process predicts the appropriate mobile behavior of users.

4.3 LEVEL 2:

Fig. 4.3. Level 2

The input external entity represents the user, who moves along the mobile network.

Mobile Network: The mobile network process represents a cellular network distributed over land areas called cells, each served by at least one fixed location called a base station. The mobile network database stores the users' movements, locations, and services. The mobile transaction process describes the users' mobile behaviors. The mobile transaction database contains a huge number of mobile transaction logs based on the users' mobile behaviors.

Clustering: The clustering process clusters the mobile transaction sequences according to the users' different mobile transaction behaviors. The LBS-Alignment process is based on the consideration that two mobile transaction sequences are more similar when the orders and timestamps of their mobile transactions are more similar. The CO-Smart-CAST process searches for the best clustering and obtains a near-optimal clustering result. The user cluster table holds the clustering results.

Segmentation: The segmentation process discriminates the characteristics of mobile behaviors in different time segments. The time interval process obtains the number of time segmenting points. The genetic algorithm process obtains the optimal solution by applying its operators. The time interval table holds the segmentation results.

CTMSP: The CTMSP Mining process considers both the user cluster and the time interval so that the complete mobile sequential patterns can be discovered. The mining process covers frequent-transaction mining as well as mobile transaction database transformation.

Prediction: The prediction strategies process predicts the appropriate mobile behavior of users.

CHAPTER 5 ALGORITHM

5.1 LBS-Alignment Algorithm:


Input: Two mobile transaction sequences s and s'
Output: The similarity between s and s'
LBS-Alignment(s, s')
  p <- 0.5 / (s.length + s'.length)   /* p is the location penalty */
  M0,0 <- 0.5
  Mi,0 <- Mi-1,0 - p
  M0,j <- M0,j-1 - p
  For i <- 1 to s.length
    For j <- 1 to s'.length
      If si.location = s'j.location
        TP <- p * |si.time - s'j.time| / len   /* time penalty */
        SR <- p * |si.services ∩ s'j.services| / |si.services ∪ s'j.services|   /* service reward */
      Else
        Mi,j <- max(Mi-1,j - p, Mi,j-1 - p)
    End For

The input data are two mobile transaction sequences. The output is the similarity between the two sequences, with a degree in the range from 0 to 1. First, some parameters are initialized, and the base similarity score is set to 0.5. Dynamic programming is then used to calculate Mi,j, the value of matrix M in column i and row j, where M is the score matrix of LBS-Alignment. In this procedure, if the locations of two transactions are the same, both the time penalty and the service reward are calculated to measure the similarity score. Otherwise, the location penalty is applied to decrease the similarity score.
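The dynamic programming procedure can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the listing: the listing does not show how TP and SR enter the score when the locations match, so the sketch takes Mi-1,j-1 - TP + SR as an additional candidate in that case; and `time_span` stands in for the quantity written as len, which is taken here to be the total time length. Each transaction is represented as a hypothetical (time, location, services) tuple.

```python
def lbs_alignment(s1, s2, time_span):
    """Similarity of two sequences of (time, location, set_of_services)."""
    n, m = len(s1), len(s2)
    if n + m == 0:
        raise ValueError("at least one sequence must be non-empty")
    p = 0.5 / (n + m)                      # location penalty
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.5                          # base similarity score
    for i in range(1, n + 1):
        M[i][0] = M[i - 1][0] - p
    for j in range(1, m + 1):
        M[0][j] = M[0][j - 1] - p
    for i in range(1, n + 1):
        t1, loc1, srv1 = s1[i - 1]
        for j in range(1, m + 1):
            t2, loc2, srv2 = s2[j - 1]
            # Skipping a transaction in either sequence costs the location penalty.
            best = max(M[i - 1][j] - p, M[i][j - 1] - p)
            if loc1 == loc2:
                tp = p * abs(t1 - t2) / time_span           # time penalty
                union = srv1 | srv2
                sr = p * len(srv1 & srv2) / len(union) if union else 0.0
                # ASSUMPTION: aligned transactions extend the diagonal score.
                best = max(best, M[i - 1][j - 1] - tp + sr)
            M[i][j] = best
    return M[n][m]
```

With no shared location, every transaction of both sequences is skipped at a cost of p each, so the score falls from 0.5 to 0, which agrees with the intended lower bound of the similarity range.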

5.2 The GetNTSP Algorithm:


Input: A mobile transaction database D and its time length T
Output: The number of time segmenting points NTSP
GetNTSP(D)
  Cl,s[t] <- 0   /* all elements in C are initialized as 0 */
  N <- {0, 0, ..., 0}
  NTSP <- 0
  For each mobile transaction (t, l, s) in D
    For t' <- t to |T|
      Cl,s[t'] <- Cl,s[t'] + 1   /* calculate accumulative count */
    End For

The input data are a mobile transaction database D and its time length T. The output is the number of time segmenting points. For each item, the total number of occurrences is accumulated at each time point, so each item (location, service) draws a curve of its count distribution. For every curve, the time points with the largest change rate are found. The change rate represents the total number of occurrences for the item at time point i. The occurrences of these time points are then counted, and the time points whose counts are larger than or equal to the average count are kept as the time point sequence (TPS). In the time point sequence, the average time distance a between two neighboring time points is calculated. Finally, the number of neighboring time point pairs whose time distance is higher than the average value is counted. The result is the number of time segmenting points.
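The description above leaves some details open, so the following Python sketch is one possible reading of it rather than the original algorithm. It assumes integer time points and (time, location, service) tuples, treats the largest change of an accumulated curve as the time point at which the item occurs most often, and lets tied time points all count; the name `get_ntsp` and these choices are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def get_ntsp(transactions):
    """Estimate the number of time segmenting points from (t, l, s) tuples."""
    # Occurrences of each (location, service) item at each time point.
    per_item = defaultdict(Counter)
    for t, loc, srv in transactions:
        per_item[(loc, srv)][t] += 1

    # For each item, the time point(s) where its accumulated curve rises most.
    votes = Counter()
    for counts in per_item.values():
        peak = max(counts.values())
        for t, c in counts.items():
            if c == peak:
                votes[t] += 1
    if not votes:
        return 0

    # Keep time points whose vote count reaches the average: the TPS.
    avg_votes = sum(votes.values()) / len(votes)
    tps = sorted(t for t, v in votes.items() if v >= avg_votes)
    if len(tps) < 2:
        return 0

    # Count neighboring pairs that are farther apart than the average distance.
    gaps = [b - a for a, b in zip(tps, tps[1:])]
    avg_gap = sum(gaps) / len(gaps)
    return sum(1 for g in gaps if g > avg_gap)
```

A large gap between neighboring time points of the TPS suggests a boundary between two groups of busy time points, so each such gap is counted as one segmenting point.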

CHAPTER 6 MODULES

6.1 Module list:

1) Mobile Network and Behavior Generation
2) Clustering of Mobile Transaction Sequences
3) Time Segmentation of Mobile Transaction Sequences
4) Discovery of CTMSPs
5) Mobile Behavior Prediction for Mobile Users

6.2 Module Description:


6.2.1 Mobile Network and Behavior Generation

A mobile network is created as a cellular network distributed over land areas called cells, each served by at least one fixed location called a base station. Several base stations are plotted, and multiple users move under different base stations. The mobile behavior is monitored for complex activity. Each user's movement, transaction, and current time are stored in the mobile transaction database.

6.2.2 Clustering of Mobile Transaction Sequences

In a mobile transaction database, users in different user groups may have different mobile transaction behaviors. The first task is to cluster the mobile transaction sequences. A parameter-less clustering algorithm, CO-Smart-CAST, is proposed. Before CO-Smart-CAST is performed, a similarity matrix S must be generated from the mobile transaction database. LBS-Alignment obtains the similarity based on the concept of DNA alignment. After the similarity matrix is obtained, the mobile transaction sequences are clustered by the CO-Smart-CAST algorithm.

6.2.3 Time Segmentation of Mobile Transaction Sequences

In a mobile transaction database, similar mobile behaviors exist within certain time segments. Hence, it is important to choose suitable settings for time segmentation so as to discriminate the characteristics of mobile behaviors in different time segments. A GA-based method automatically obtains the most suitable time segmentation table with common mobile behaviors. After the number of time segmenting points is obtained, the genetic algorithm is used to discover the most suitable time intervals.

6.2.4 Discovery of CTMSPs

In CTMSP-Mine, both the user cluster and the time interval are taken into account so that the complete mobile sequential patterns can be discovered. The CTMSP-Mine algorithm can be divided into three main steps: 1) Frequent-Transaction Mining, 2) Mobile Transaction Database Transformation, and 3) CTMSP Mining.

6.2.5 Mobile Behavior Prediction for Mobile Users

There are three prediction strategies for selecting the appropriate CTMSP to predict the mobile behaviors of users: 1) the patterns are selected only from the cluster that the user belongs to; 2) the patterns are selected only from the time interval corresponding to the current time; and 3) the patterns are selected only from the ones that match the user's recent mobile behaviors. If more than one pattern satisfies the above conditions, we select the one with the maximal support.
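The three selection strategies and the maximal-support tie-break can be sketched as a single filter. The `Pattern` record and its field names are hypothetical, and the sketch assumes that "matching the recent mobile behaviors" means the pattern's premise equals the most recent transactions of the user; the source does not define the matching rule in detail.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    cluster: int        # user cluster the pattern was mined from
    interval: int       # index of the time interval it belongs to
    premise: tuple      # sequence of (location, service) pairs
    prediction: tuple   # predicted next (location, service)
    support: float

def predict_behavior(patterns, user_cluster, current_interval, recent):
    """Pick the prediction of the best-supported pattern that fits the user."""
    recent = tuple(recent)
    candidates = [
        p for p in patterns
        if p.cluster == user_cluster                 # strategy 1: same cluster
        and p.interval == current_interval           # strategy 2: same interval
        and len(p.premise) <= len(recent)            # strategy 3: premise matches
        and recent[len(recent) - len(p.premise):] == p.premise
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.support).prediction
```

Returning None when no pattern qualifies leaves the fallback behavior, for example a default recommendation, to the caller.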

CHAPTER 7 DESIGN DIAGRAMS

7.1 Overall Use Case Diagram:


Fig. 7.1. Overall Use Case Diagram (diagram labels: User, Base Station, Moving Process, Prediction Engine, Mobile Transaction Database, User Clustering, Segmentation, CTMSP-Mining, Prediction Strategy)

The user moves along the mobile network, which is a cellular network distributed over land areas called cells, each served by at least one fixed location called a base station. The mobile transaction database stores the users' moving patterns.

Clustering is done with a parameter-less clustering algorithm called CO-Smart-CAST. A similarity matrix must be generated first, and it is obtained by LBS-Alignment.

In a mobile transaction database, similar mobile behaviors exist within certain time segments. Hence, it is important to choose suitable settings for time segmentation. A GA-based method is used to obtain automatically the most suitable time segmentation table with common mobile behaviors.

In CTMSP-Mine, both the user cluster and the time interval are taken into account so that the complete mobile sequential patterns can be discovered. The CTMSP-Mine algorithm can be divided into three main steps: 1) Frequent-Transaction Mining, 2) Mobile Transaction Database Transformation, and 3) CTMSP Mining.

There are three prediction strategies for selecting the appropriate CTMSP to predict the mobile behaviors of users: 1) the patterns are selected only from the cluster that the user belongs to; 2) the patterns are selected only from the time interval corresponding to the current time; and 3) the patterns are selected only from the ones that match the user's recent mobile behaviors.

7.2 Module Explanation:


7.2.1 Mobile Network and Behavior Generation

Fig. 7.2.1. Mobile Network and Behavior Generation (flow: Create the Mobile Network, Plot Several Base Stations, Plot Multiple Users under Base Stations, Mobile Behavior Generation, Activity Stored in Transaction DB)

A mobile network is created as a cellular network distributed over land areas called cells, each served by at least one fixed location called a base station. Several base stations are plotted, and multiple users move under different base stations. The mobile behavior is monitored for complex activity. Each user's movement, transaction, and current time are stored in the mobile transaction database.

7.2.2 Clustering of Mobile Transaction Sequences

Fig. 7.2.2 Clustering of Mobile Transaction Sequences (steps: User Similarity by LBS-Alignment; User Clustering by CO-Smart-CAST; User Cluster Table)

In a mobile transaction database, users in different user groups may have different mobile transaction behaviors. The first task is therefore to cluster mobile transaction sequences. A parameter-less clustering algorithm, CO-Smart-CAST, is proposed. Before performing CO-Smart-CAST, a similarity matrix S must be generated from the mobile transaction database. A mobile transaction sequence can be viewed as a sequence string, where each element in the string indicates a mobile transaction. The major challenge is to measure the content similarity between mobile transactions.

LBS-Alignment: LBS-Alignment is based on the consideration that two mobile transaction sequences are more similar when the orders and timestamps of their mobile transactions are more similar. Based on this concept, the time penalty (TP) and the service reward (SR) in LBS-Alignment are specified.

Procedure: The input is two mobile transaction sequences. The output is the similarity between them, a degree in the range from 0 to 1. The base similarity score is initialized to 0.5. Dynamic programming is used to calculate Mi,j, the value in column i and row j of M, where M is the score matrix of LBS-Alignment. If the locations of two transactions are the same, both the time penalty and the service reward are calculated to measure the similarity score. Otherwise, the location penalty is generated to decrease the similarity score.

Location Penalty: Two mobile transactions can be aligned if their locations are the same. Otherwise, a location penalty is generated to decrease their similarity score.

    Location Penalty = 0.5 / (|s1| + |s2|)    (1)

where |s1| and |s2| are the lengths of the two sequences. When two sequences are totally different, their similarity score is 0.

When two mobile transactions are aligned, their time penalty and service reward are measured.

Time Penalty: TP focuses on the time distance between the two aligned transactions. The farther apart they are in time, the larger their time penalty. TP, which decreases the similarity score, is defined as follows:

    Time Penalty = |s1.time - s2.time| / len    (2)

where len indicates the time length.

Service Reward: SR focuses on the similarity of the service requests. The more similar the service requests, the larger the service reward. SR, which increases the similarity score, is defined as follows:

    Service Reward = |s1.services ∩ s2.services| / |s1.services ∪ s2.services|    (3)

CO-Smart-CAST: After the similarity matrix is obtained, the mobile transaction sequences are clustered by the proposed CO-Smart-CAST.
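The LBS-Alignment scoring described above (Equations 1 to 3) can be sketched in Java as a small dynamic program. This is only an illustration under stated assumptions, not the report's implementation: the class and field names are hypothetical, skipping a transaction in either sequence is assumed to cost one location penalty, and TP and SR are assumed to be scaled by the location penalty so that the final score stays between 0 and 1 (Equations 2 and 3 are stated in the text without a scale factor).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LbsAlignmentSketch {

    /** One mobile transaction: cell visited, request time, services requested (hypothetical layout). */
    static final class MobileTransaction {
        final String location;
        final int time;
        final Set<String> services;

        MobileTransaction(String location, int time, Set<String> services) {
            this.location = location;
            this.time = time;
            this.services = services;
        }
    }

    /** Equation 3: shared services divided by all services; 0 when both sets are empty. */
    static double serviceReward(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / union.size();
    }

    /** Similarity of two sequences; len is the (positive) time length used in Equation 2. */
    static double similarity(List<MobileTransaction> s1, List<MobileTransaction> s2, int len) {
        int n = s1.size();
        int m = s2.size();
        if (n + m == 0) {
            return 0.5; // two empty sequences: the base score is left unchanged
        }
        double penalty = 0.5 / (n + m); // Equation 1
        double[][] score = new double[n + 1][m + 1];
        score[0][0] = 0.5; // base similarity score
        for (int i = 1; i <= n; i++) {
            score[i][0] = score[i - 1][0] - penalty;
        }
        for (int j = 1; j <= m; j++) {
            score[0][j] = score[0][j - 1] - penalty;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                MobileTransaction a = s1.get(i - 1);
                MobileTransaction b = s2.get(j - 1);
                // Assumption: skipping a transaction costs one location penalty.
                double best = Math.max(score[i - 1][j], score[i][j - 1]) - penalty;
                if (a.location.equals(b.location)) {
                    // Assumption: TP and SR are scaled by the location penalty.
                    double tp = penalty * Math.abs(a.time - b.time) / (double) len;
                    double sr = penalty * serviceReward(a.services, b.services);
                    best = Math.max(best, score[i - 1][j - 1] - tp + sr);
                }
                score[i][j] = best;
            }
        }
        return score[n][m];
    }
}
```

Under these assumptions, two sequences that share no location score 0, which agrees with the statement that totally different sequences have similarity 0, and two identical sequences score 0.75.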

Steps: The CAST method, which takes a parameter named the affinity threshold t, is used as the basic clustering method. A quality validation method, Hubert's statistic, is used to find the best clustering result, and a hierarchical concept is used to reduce sparse clusters. For a clustering result, Hubert's statistic measures its quality by taking the similarity matrix and the clustering result as input. For each clustering result, two quality measures, obj and clu, are computed from the original similarity matrix S and the most recent cluster similarity matrix, respectively. The F1 score, which is the harmonic mean, combines obj and clu into a single quality measure. The main drawback of this approach is that many iterations of computation are required. For this reason, the number of computations is reduced by eliminating unnecessary executions, which yields a near-optimal clustering result. That is, only a minimal number of CAST executions are performed.

7.2.3 Time Segmentation of Mobile Transaction Sequences

Fig. 7.2.3 Time Segmentation of Mobile Transaction Sequences (steps: Time Interval Calculation; GA-Based Time Segmentation; Time Interval Table)

In a mobile transaction database, similar mobile behaviors exist within certain time segments. Hence, it is important to choose suitable settings for time segmentation so as to discriminate the characteristics of mobile behaviors under different time segments.

GetNTSP Procedure: The input is a mobile transaction database D and its time length T. The output is the number of time segmenting points. For each item, the total number of occurrences is accumulated at each time point, so each item (location, service) draws a curve of count distribution. For every curve, the time points with the largest change rate are found, where the change rate is derived from the total number of occurrences of the item at each time point i. The occurrences of these time points across all curves are counted, and the time points whose counts are larger than or equal to the average count are kept as the time point sequence (TPS). In the TPS, the average time distance a between two neighboring time points is calculated. The number of neighboring time point pairs whose time distance is higher than a is then counted. This number is the time segmentation count.

Genetic Algorithm: A GA-based method is proposed to automatically obtain the most suitable time segmentation table with common mobile behaviors. After the number of time segmenting points is obtained, the genetic algorithm is used to discover the most suitable time intervals.

Operators: There are three operators in the genetic algorithm: selection, crossover, and mutation.

i) Selection: A proportion of the current population is selected to produce the next population in each generation. Individual chromosomes are selected based on their fitness value. The larger the fitness value of a chromosome, the higher the probability that it is selected.

ii) Crossover: One-point crossover with a crossover probability is applied. A crossover point on both parent chromosomes is randomly selected. All time segmenting points beyond the crossover point are swapped between the two parent chromosomes. The resulting chromosomes are the children.

iii) Mutation: One-bit mutation is applied. This operator involves a mutation probability that an arbitrary time segmenting point in a chromosome will be changed from its original state.

For any child chromosome, the time segmenting points must be sorted if they are not in increasing order.

    Fitness(X) = ...    (4)

where Len(X) is the length of X, Nc is the total number of cells, Ns is the total number of services, and Ti[c,s] is the count of cell c and service s in time interval Ti.

7.2.4 Discovery of CTMSPs

Fig. 7.2.4 Discovery of CTMSPs (steps: Frequent-Transaction Mining; Mobile Transaction DB Transformation; CTMSP Mining)

In CTMSP-Mine, both the user cluster and the time interval are taken into account so that the complete mobile sequential patterns can be discovered. The CTMSP-Mine algorithm can be divided into three main steps: i) Frequent-Transaction Mining, ii) Mobile Transaction Database Transformation, and iii) CTMSP Mining.

i) Frequent-Transaction Mining: In this phase, the frequent transactions in each user cluster and time interval are mined by applying a modified Apriori algorithm. First, the support of each cell and service is counted in each user cluster and time interval according to the user cluster table and the time interval table. The patterns whose support satisfies the user-specified minimal support threshold TSUP are kept as frequent 1-transactions. A candidate 2-transaction is generated by joining two frequent 1-transactions if their user clusters, time intervals, and cells are the same. The candidate 2-transactions whose support is larger than TSUP are kept as frequent 2-transactions. The same procedure is repeated until no more candidate transactions are generated.

ii) Mobile Transaction Database Transformation: In this phase, the frequent transactions (F-Transactions) are used to transform each mobile transaction sequence S into a frequent mobile transaction sequence S'. This has two advantages: service sets can be represented by symbols for efficient processing, and transactions whose support is less than the minimal support threshold can be eliminated to reduce the size of the database.

iii) CTMSP Mining: In this phase, all the CTMSPs are mined from the frequent mobile transaction database. Frequent 1-CTMSPs are obtained in the frequent-transaction mining phase. The mining algorithm uses a two-level tree named the Cluster-based Temporal Mobile Sequential Pattern Tree (CTMSP-Tree). The internal nodes in the tree store the frequent mobile transactions, and the leaf nodes store the corresponding paths. Moreover, every parent node of a leaf node is designed as a hash table that stores the combinations of user cluster tables and time interval tables.
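The frequent-transaction mining phase described above can be sketched as follows. This is a minimal illustration rather than the modified Apriori algorithm itself: the `Tx` class, the `cluster|interval|cell` key format, and the choice to treat support as an absolute count compared with `>=` are all assumptions, and the join stops at 2-transactions instead of repeating until no candidates remain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FrequentTransactionSketch {

    /** A transaction tagged with its user cluster and time interval (hypothetical layout). */
    static final class Tx {
        final int cluster;
        final int interval;
        final String cell;
        final Set<String> services;

        Tx(int cluster, int interval, String cell, Set<String> services) {
            this.cluster = cluster;
            this.interval = interval;
            this.cell = cell;
            this.services = services;
        }
    }

    /** Maps "cluster|interval|cell" to its frequent 1- and 2-service sets. */
    static Map<String, Set<Set<String>>> mine(List<Tx> db, int minSupport) {
        // Group service sets by context, so joins only happen when the
        // user cluster, time interval and cell are the same.
        Map<String, List<Set<String>>> byContext = new TreeMap<>();
        for (Tx t : db) {
            String key = t.cluster + "|" + t.interval + "|" + t.cell;
            byContext.computeIfAbsent(key, k -> new ArrayList<>()).add(t.services);
        }
        Map<String, Set<Set<String>>> result = new TreeMap<>();
        for (Map.Entry<String, List<Set<String>>> entry : byContext.entrySet()) {
            List<Set<String>> rows = entry.getValue();
            // Count the support of every single service in this context.
            Map<String, Integer> counts = new TreeMap<>();
            for (Set<String> row : rows) {
                for (String service : row) {
                    counts.merge(service, 1, Integer::sum);
                }
            }
            List<String> frequentOnes = new ArrayList<>();
            Set<Set<String>> frequent = new LinkedHashSet<>();
            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                if (c.getValue() >= minSupport) {
                    frequentOnes.add(c.getKey());
                    frequent.add(Collections.singleton(c.getKey()));
                }
            }
            // Join pairs of frequent 1-transactions into candidate 2-transactions.
            for (int i = 0; i < frequentOnes.size(); i++) {
                for (int j = i + 1; j < frequentOnes.size(); j++) {
                    Set<String> candidate =
                        new TreeSet<>(Arrays.asList(frequentOnes.get(i), frequentOnes.get(j)));
                    int support = 0;
                    for (Set<String> row : rows) {
                        if (row.containsAll(candidate)) {
                            support++;
                        }
                    }
                    if (support >= minSupport) {
                        frequent.add(candidate);
                    }
                }
            }
            if (!frequent.isEmpty()) {
                result.put(entry.getKey(), frequent);
            }
        }
        return result;
    }
}
```

Grouping by context first keeps the join condition implicit: two services can only be combined when they were counted under the same user cluster, time interval, and cell.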

7.2.5 Mobile Behavior Prediction for Mobile Users

Fig. 7.2.5 Mobile Behavior Prediction for Mobile Users (steps: Recent Behaviors; Prediction Strategy; Next Behavior)

There are three prediction strategies for selecting the appropriate CTMSP to predict the mobile behaviors of users: 1) the patterns are selected only from the cluster the user belongs to; 2) the patterns are selected only from the time interval corresponding to the current time; and 3) the patterns are selected only from the ones that match the user's recent mobile behaviors. If more than one pattern satisfies these conditions, the one with the maximal support is selected.

The CTMSPs are selected from the corresponding user cluster and time interval. The mobile transaction sequences used to build the user clusters are known as training sequences. The k-nearest neighbor (KNN) algorithm classifies a user based on the closest similarities between the training sequences and the previous mobile transaction sequences of the user being predicted. That user is classified by a majority vote of its k nearest neighbors. The naive KNN method finds the k nearest neighbors by comparing against all training sequences, which becomes costly as the number of training sequences grows. One solution is training sequence sampling. Two sampling strategies are proposed, named Cluster-Based Random Sampling (CBRS) and Cluster-Based Similarity Sampling (CBSS). First, the training sequences are clustered using the CO-Smart-CAST algorithm. In each cluster, CBRS selects the representative sequence randomly, whereas CBSS selects the sequence with the largest Similar Value.

    Similar Value(Si) = ...    (5)
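The three prediction strategies and the maximal-support rule above can be sketched as a simple filter. This is a hedged illustration, not the report's implementation: the `Pattern` class and its fields are hypothetical, and a pattern is assumed to "match" the recent behaviors when its premise equals the tail of the user's recent behavior list.

```java
import java.util.List;
import java.util.Optional;

class PredictionSketch {

    /** A discovered pattern: premise behaviors lead to a next behavior (hypothetical layout). */
    static final class Pattern {
        final int cluster;
        final int interval;
        final List<String> premise;
        final String next;
        final int support;

        Pattern(int cluster, int interval, List<String> premise, String next, int support) {
            this.cluster = cluster;
            this.interval = interval;
            this.premise = premise;
            this.next = next;
            this.support = support;
        }
    }

    /** Returns the predicted next behavior, or empty when no pattern qualifies. */
    static Optional<String> predictNext(List<Pattern> patterns, int userCluster,
                                        int currentInterval, List<String> recent) {
        Pattern best = null;
        for (Pattern p : patterns) {
            if (p.cluster != userCluster) {
                continue; // strategy 1: same user cluster
            }
            if (p.interval != currentInterval) {
                continue; // strategy 2: same time interval
            }
            int k = p.premise.size();
            if (k > recent.size()) {
                continue;
            }
            // Strategy 3 (assumed form): the premise equals the tail of the recent behaviors.
            if (!recent.subList(recent.size() - k, recent.size()).equals(p.premise)) {
                continue;
            }
            if (best == null || p.support > best.support) {
                best = p; // keep the pattern with the maximal support
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        return Optional.of(best.next);
    }
}
```

For example, with recent behaviors `[A, B]`, a pattern with premise `[B]` and support 8 is preferred over one with premise `[A, B]` and support 5, because only the support decides between patterns that pass all three filters.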

CHAPTER 8 SOFTWARE SPECIFICATION

JAVA

Write once, run anywhere (platform independence) is one of the key features of the Java language. No language achieves this ideal completely, but Java comes closer to it than most. A program written on one platform can run on any other platform that provides a JVM. Sun has defined and supports four editions of Java targeting different application environments and has segmented many of its APIs so that they belong to one of the platforms. The platforms are:

Java Card for smartcards.
Java Platform, Micro Edition (Java ME) for environments with limited resources.
Java Platform, Standard Edition (Java SE) for workstation environments.
Java Platform, Enterprise Edition (Java EE) for large distributed enterprise or Internet environments.

JAVA SE 6

Java SE includes classes that support the development of Java Web Services and provides the foundation for Java Platform, Enterprise Edition (Java EE). The Java SE application programming interface (API) defines the manner by which an applet or application can make requests to and use functionality available in the compiled Java SE class libraries. The two principal products in the Java SE platform are the Java Development Kit (JDK) and the Java SE Runtime Environment (JRE). The JDK is a superset of the JRE and contains everything that is in the JRE, plus tools such as the compilers and debuggers necessary for developing applets and applications. The JRE provides the libraries, the Java Virtual Machine, and other components to run applets and applications written in the Java programming language.

Advantages of Running Applications on Java SE 6

Applications run faster on the desktop and servers
New 'Dynamic Attach' diagnostics simplify troubleshooting
Improved 'native' look and feel across Solaris, Linux, and Windows
First Java platform with full support for Windows Vista
JavaScript integrated and included with the platform
Scripting languages framework extends support for Ruby, Python, and other languages
Complete light-weight platform for web services, right out of the box
Simplified GUI design and expanded native platform support
Full JDBC 4 implementation providing improved XML support for databases
Java DB included with the JDK, a free to use and deploy Java database
Sun Developer Services available to help build more robust applications
Improved memory usage analysis and leak detection
JDBC 4.0 support
Significant library improvements
Improvements to the Java Platform Debug Architecture (JPDA) and JVM Tool Interface

Benefits in Upgrading Developer Environments to Sun's Java SE 6

SQL Server:
The Structured Query Language (SQL) is the language of databases. Any interaction between a user, program, or server and a database takes place through the use of SQL, even if the actual SQL code is buried deep within a graphical environment.

All major relational databases today (SQL Server, Oracle, Microsoft Access, IBM DB2, and so on) implement the same basic SQL commands. This common language allows database developers to easily migrate between platforms and create links between disparate database environments.

Types of JDBC Connectivity with SQL:

1) JDBC-ODBC Bridge plus ODBC driver: The JavaSoft bridge product provides JDBC access via ODBC drivers. Note that ODBC binary code, and in many cases database client code, must be loaded on each client machine that uses this driver. As a result, this kind of driver is most appropriate on a corporate network where client installations are not a major problem, or for application server code written in Java in a three-tier architecture.

2) Native-API partly-Java driver: This kind of driver converts JDBC calls into calls on the client API for Oracle, Sybase, Informix, DB2, or other DBMS. This style of driver requires that some binary code be loaded on each client machine.

3) JDBC-Net pure Java driver: This driver translates JDBC calls into a DBMS-independent net protocol, which is then translated to a DBMS protocol by a server. This net server middleware is able to connect its pure Java clients to many different databases. The specific protocol used depends on the vendor. In general, this is the most flexible JDBC alternative. It is likely that all vendors of this solution will provide products suitable for intranet use. For these products to also support Internet access, they must handle the additional requirements for security, access through firewalls, and so forth, that the Web imposes.

4) Native-protocol pure Java driver: This kind of driver converts JDBC calls directly into the network protocol used by the DBMS. This allows a direct call from the client machine to the DBMS server and is an excellent solution for intranet access. Since many of these protocols are proprietary, the database vendors themselves will be the primary source. Several database vendors have these in progress.
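As an illustration of how a mobile transaction might be written to SQL Server through a native-protocol pure Java driver, the sketch below uses only the standard `java.sql` API. The table name `mobile_transaction`, its columns, and the helper names are hypothetical and do not come from this report, and the URL format assumes that Microsoft's JDBC driver for SQL Server is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TransactionStoreSketch {

    /** Hypothetical table and column names, used for illustration only. */
    static final String INSERT_SQL =
        "INSERT INTO mobile_transaction (user_id, cell, service, request_time) VALUES (?, ?, ?, ?)";

    /** Builds a connection URL in the form used by Microsoft's SQL Server JDBC driver. */
    static String sqlServerUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    /** Inserts one transaction row over an already open connection. */
    static void store(Connection connection, int userId, String cell, String service, int time)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setInt(1, userId);
            statement.setString(2, cell);
            statement.setString(3, service);
            statement.setInt(4, time);
            statement.executeUpdate();
        }
    }

    /** Opens a connection, stores one row, and closes the connection again. */
    static void storeOnce(String url, String user, String password,
                          int userId, String cell, String service, int time) throws SQLException {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            store(connection, userId, cell, service, time);
        }
    }
}
```

The try-with-resources blocks close the statement and the connection even when an insert fails, and the parameterized statement keeps user-supplied values out of the SQL text.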

CHAPTER 9 CONCLUSION
A novel method, named CTMSP-Mine, is proposed for discovering CTMSPs in LBS environments. Furthermore, novel prediction strategies are defined to predict subsequent user mobile behaviors using the discovered CTMSPs. In CTMSP-Mine, a transaction clustering algorithm named CO-Smart-CAST is proposed to form user clusters based on the mobile transactions, using the proposed LBS-Alignment similarity measurement. Then, the genetic algorithm is utilized to generate the most suitable time intervals. To the best of our knowledge, this is the first work on mining and prediction of mobile behaviors associated with user clusters and temporal relations.

CHAPTER 10 REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM SIGMOD Conf. Management of Data, pp. 207-216, May 1993.
[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int'l Conf. Very Large Databases, pp. 487-499, Sept. 1994.
[3] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. Int'l Conf. Data Eng., pp. 3-14, Mar. 1995.
[4] V.S. Tseng and W.C. Lin, "Mining Sequential Mobile Access Patterns Efficiently in Mobile Web Systems," Proc. 19th Int'l Conf. Advanced Information Networking and Applications, pp. 867-871, Mar. 2005.
[5] C.H. Yun and M.S. Chen, "Mining Mobile Sequential Patterns in a Mobile Commerce Environment," IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 37, no. 2, pp. 278-295, Mar. 2007.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques.
[7] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD Conf. Management of Data, pp. 1-12, May 2000.
[8] M.-S. Chen, J.-S. Park, and P.S. Yu, "Efficient Data Mining for Path Traversal Patterns," IEEE Trans. Knowledge and Data Eng., vol. 10, no. 2, pp. 209-221, Apr. 1998.
[9] S.C. Lee, J. Paik, J. Ok, I. Song, and U.M. Kim, "Efficient Mining of User Behaviors by Temporal Mobile Access Patterns," Int'l J. Computer Science Security, vol. 7, no. 2, pp. 285-291, Feb. 2007.
[10] J.-S. Park, M.-S. Chen, and P.S. Yu, "An Effective Hash Based Algorithm for Mining Association Rules," Proc. ACM SIGMOD Conf. Management of Data, pp. 175-186, May 1995.
[11] J. Pei, J. Han, B. Mortazavi-Asl, and H. Zhu, "Mining Access Patterns Efficiently from Web Logs," Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 396-407, Apr. 2000.
[12] V.S. Tseng and C.F. Tsui, "Mining Multi-Level and Location-Aware Associated Service Patterns in Mobile Environments," IEEE Trans. Systems, Man and Cybernetics: Part B, vol. 34, no. 6, pp. 2480-2485, Dec. 2004.
[13] V.S. Tseng, H.C. Lu, and C.H. Huang, "Mining Temporal Mobile Sequential Patterns in Location-Based Service Environments," Proc. 13th IEEE Int'l Conf. Parallel and Distributed Systems, pp. 1-8, Dec. 2007.
[14] V.S. Tseng and K.W. Lin, "Efficient Mining and Prediction of User Behavior Patterns in Mobile Web Systems," Information and Software Technology, vol. 48, no. 6, pp. 357-369, June 2006.
