Fig. 1: Topic Patterns

3. PROBLEM DEFINITION
In this section, we give some preliminary notations, define several key concepts related to STPs, and formulate the problem of mining URSTPs to be handled in this paper [3].

An important data mining problem is to design algorithms for discovering hidden patterns in sequences. There has been a lot of research on this topic in the field of data mining, and various algorithms have been proposed.

One should note that the subsequence is not necessarily contiguous in the session s; only the order needs to be retained. In addition, when analyzing the pattern instances in a session, the user component and the time component of documents are irrelevant, so we can omit them in the representation of a session. Two examples of sessions in this form are given in Table 1. The STP $\langle z_1, z_3 \rangle$ has one instance in $s_1$ with probability $0.5 \times 0.4 = 0.2$, and two instances in $s_2$ with probabilities 0.3 and 0.12, respectively.

Table 1: Examples of Sessions in Topic-level Document Streams
ID      Sequences
Seq1    {a,b}, {c}, {f}, {g}, {e}
Seq2    {a,d}, {c}, {b}, {a,b,e,f}
Seq3    {a}, {b}, {f}, {e}
Seq4    {b}, {f,g,h}

Obviously, each document stream can be transformed into a topic-level document stream of the form $TDS = \{(td_1, u_1, t_1), (td_2, u_2, t_2), \ldots, (td_N, u_N, t_N)\}$ by extracting topics for each document according to Definition 3.

In this paper, we pay attention to the correlations among successive documents published by the same user in a document stream. A kind of fundamental but important

Next, we formally specify this kind of STPs step by step, starting with the classical concept of support, which describes frequency. For deterministic sequential pattern mining, the support of a pattern $\alpha$ is defined as the number or proportion of the sequences in the target database that contain $\alpha$, but this is inapplicable to uncertain sequences such as topic-level document streams. Instead, the expected support is appropriate for measuring frequency on uncertain sequences, and can be computed by summing up the occurrence probabilities of $\alpha$ in all sequences. In other words, it expresses the expected number of sequences containing $\alpha$. To measure the frequency of STPs, we modify it slightly to record the proportion of sessions where $\alpha$ occurs, again in terms of expectation, by dividing the summation by the number of sessions. This is necessary because the number of sessions is no longer a constant when we consider both the global frequency and the local frequency of $\alpha$ for different users. For simplicity, this measure is still denoted as support instead of expected support in this paper.

Definition 6 (Support of STP): Given a session set $S = \{s_1, s_2, \ldots, s_{|S|}\}$ as the database and an n-STP $\alpha = \langle z_1, z_2, \ldots, z_n \rangle$, the support of $\alpha$ in S is defined as
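Read together with the description preceding Definition 6 (sum the occurrence probabilities of the STP over all sessions, then divide by the number of sessions), the support can be written, with $\alpha$ denoting the STP, as $\mathrm{supp}(\alpha, S) = \frac{1}{|S|}\sum_{i=1}^{|S|} \Pr(\alpha \sqsubseteq s_i)$. The notation $\Pr(\alpha \sqsubseteq s_i)$ for the probability that $\alpha$ occurs in session $s_i$ is an assumption made here for illustration. The Python sketch below is likewise only illustrative: it assumes a session is a list of topic-to-probability dictionaries (one per document), enumerates the instances of an STP as order-preserving, not necessarily contiguous, position choices, and multiplies the topic probabilities of each instance. The function names and the sample sessions `s1` and `s2` are hypothetical; the sessions are chosen only so that the instance probabilities match the 0.2, 0.3 and 0.12 quoted in the text. How the instance probabilities of one session combine into a single occurrence probability is not specified in this section, so `support` takes the per-session occurrence probabilities as given.

```python
from itertools import combinations
from math import prod


def instance_probabilities(session, pattern):
    """Return the probability of every instance of `pattern` in `session`.

    session: list of dicts mapping topic -> probability, one dict per document
             (assumed representation, not taken from the paper).
    pattern: ordered list of topics, e.g. ["z1", "z3"].
    An instance picks len(pattern) documents in increasing order (they need not
    be adjacent); its probability is the product of the matching topic
    probabilities.
    """
    probabilities = []
    for positions in combinations(range(len(session)), len(pattern)):
        p = prod(session[i].get(topic, 0.0)
                 for i, topic in zip(positions, pattern))
        if p > 0:
            probabilities.append(p)
    return probabilities


def support(occurrence_probabilities):
    """Expected proportion of sessions containing the pattern.

    occurrence_probabilities: one value per session, each being the
    probability that the pattern occurs in that session (computed elsewhere).
    """
    if not occurrence_probabilities:
        raise ValueError("the session set must not be empty")
    return sum(occurrence_probabilities) / len(occurrence_probabilities)


# Hypothetical sessions, constructed to reproduce the probabilities in the text.
s1 = [{"z1": 0.5, "z2": 0.5}, {"z3": 0.4, "z2": 0.6}]
s2 = [{"z1": 0.6, "z2": 0.4}, {"z3": 0.5, "z2": 0.5}, {"z3": 0.2, "z2": 0.8}]

if __name__ == "__main__":
    print(instance_probabilities(s1, ["z1", "z3"]))
    print(instance_probabilities(s2, ["z1", "z3"]))
```

Enumerating all position combinations is exponential in the pattern length, so this form is only suitable for small illustrative sessions.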
Figure 11: Frequent subsequences of employment, higher education, school, further education, joblessness, and training with minimal support = 0.05

To make a finer comparison between our two algorithms, we magnify the lower parts of the two charts and show them on the right of each figure. It can be observed that the approximation algorithm is indeed a little faster, particularly for larger scales. Notice that every execution of this subprocedure is for a single user, so as the number of users increases, the time difference for the whole approach becomes more and more evident, even with some degree of parallelism. Therefore, we can conclude that the two algorithms have their respective advantages. Which one is appropriate for a real task reflects a trade-off between mining accuracy and speed, and depends on the specific requirements of the application scenario.

References
[1] C. C. Aggarwal, Y. Li, J. Wang, and J. Wang, "Frequent pattern mining with uncertain data," in Proc. ACM SIGKDD'09, 2009, pp. 29-38.
[2] R. Agrawal and R. Srikant, "Mining sequential patterns," in Proc. IEEE ICDE'95, 1995, pp. 3-14.
[3] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou, "LeadLine: Interactive visual analysis of text data through event identification and exploration," in Proc. IEEE VAST'12, 2012, pp. 93-102.
[4] Z. Hu, H. Wang, J. Zhu, M. Li, Y. Qiao, and C. Deng, "Discovery of rare sequential topic patterns in document stream," in Proc. SIAM SDM'14, 2014, pp. 533-541.
[5] N. R. Mabroukeh and C. I. Ezeife, "A taxonomy of sequential pattern mining algorithms," ACM Comput. Surv., vol. 43, no. 1, pp. 3:1-3:41, 2010.
[6] J. Zhu, K. Wang, Y. Wu, Z. Hu, and H. Wang, "Mining user-aware rare sequential topic patterns in document streams," 2016.
Author
P. Manvitha received a B.Tech. in Computer Science and Engineering from Jawaharlal Nehru Technological University, Anantapur, and is currently an M.Tech. student in Computer Science and Engineering at G. Pulla Reddy Engineering College, Kurnool.

Dr. G. Sunil Vijaya Kumar received a B.Tech. from SVUCE, Tirupathi, an M.Tech. in Computer Science from SC&SS, JNU, Delhi, and a Ph.D. in Computer Science from SOCIS, IGNOU, Delhi. He is currently working as a professor in the Department of Computer Science at GPREC, Kurnool.

6. CONCLUSION
Mining URSTPs in published document streams on the web is an important and difficult problem. It formulates a new kind of complex event pattern based on document topics, and has wide potential application scenarios, such as real-time monitoring of abnormal behaviors of web users. In this paper, several new concepts and the mining problem are formally defined, and a group of algorithms is designed and combined to solve this problem systematically. The experiments conducted on both real (Twitter) and synthetic datasets demonstrate that the proposed approach is highly effective and efficient in discovering special users as well as interesting and interpretable URSTPs from web document streams, which can well capture users' personalized and abnormal behaviors and characteristics.

As this paper puts forward an innovative research direction on web data mining, much work can be built on it in the future. First, the problem and the approach can also be applied in other fields and scenarios. In particular, for browsed document streams, we can regard readers of documents as