
VIETNAM NATIONAL UNIVERSITY, HANOI

UNIVERSITY OF ENGINEERING AND TECHNOLOGY


----------------------------------------

INTERNSHIP REPORT
INFORMATION TECHNOLOGY

TITLE: COLLABORATIVE FILTERING TECHNIQUES IN RECOMMENDER SYSTEMS
Supervisor: Assoc. Prof. Dr. Ha Quang Thuy

Student: Mai Cng t


Student ID: 11020067
Group: K56CA (QH2011-CQ-CA)

Hanoi, October 2014

TABLE OF CONTENTS
1 INTRODUCTION
1.1 About Knowledge Technology Laboratory
1.2 About Topic: Collaborative Filtering techniques in Recommender Systems
2 COLLABORATIVE FILTERING
2.1 Recommender Systems
2.2 Collaborative Filtering
2.2.1 Overview
2.2.2 Collaborative Filtering Process
2.3 Collaborative Filtering Algorithms
2.3.1 Cosine Similarity
2.3.2 Pearson Correlation Similarity
2.3.3 Singular Value Decomposition (SVD)
3 EXPERIMENT
3.1 Recommendation Engine: RecDB
3.2 Experiment
4 CONCLUSION AND FUTURE WORK
5 REFERENCES

ACKNOWLEDGMENTS

I would like to express my deep appreciation to Associate Professor Dr. Ha Quang Thuy, who supervised and guided me throughout the internship.
I would also like to thank my seniors at the Knowledge Technology Laboratory (KT-Lab), who supported me in completing this report.
I am also grateful to the University of Engineering and Technology for providing the environment and conditions for my studies.
Because time was limited, this report inevitably has shortcomings. I look forward to comments from my teachers and from readers interested in this topic.

TABLE OF FIGURES
Figure 1. Collaborative Filtering Process
Figure 2. Turn on database server
Figure 3. Create and run database movielensdb
Figure 4. Import initmovielens1mdatabase.sql
Figure 5. Check list of relations
Figure 6. Top-10 movies recommendation based on the rating predicted using Item-Item Collaborative Filtering
Figure 7. Recommends the top 5 action movies to user 1
Figure 8. Recommends the top 5 action movies to user 2
Figure 9. Recommends the top 5 action movies to user 3

1 INTRODUCTION
1.1 About Knowledge Technology Laboratory
The Knowledge Technology Laboratory belongs to the Faculty of Information Technology. Its main research fields are:
- Text Mining, Web Mining, Opinion Mining, Social Media Mining, and Natural Language Processing
- Vietnamese Entity/Object Search
- Process Mining, Knowledge Technology and Service Science
The head of the Knowledge Technology Laboratory is Associate Professor Dr. Ha Quang Thuy.
1.2 About Topic: Collaborative Filtering techniques in Recommender Systems
Recommender Systems can be divided into two main categories: Content-based systems and Collaborative Filtering systems [1] [2] [3]. For my internship, I chose the Collaborative Filtering approach for the following reasons:
- First, Collaborative Filtering is based on a simple idea, so it is easy to understand and implement.
- Second, although Collaborative Filtering is simple, it is intuitive and effective, and it is widely used, for example by Amazon.com, Yahoo, and Cinemax.com.
- Last, Collaborative Filtering is a fundamental method whose performance has been proven and which can still be improved.

2 COLLABORATIVE FILTERING
2.1 Recommender Systems
Recommender Systems are a subclass of Information Filtering systems that are used to predict the preference a user would give to an item [1] [4] (movies, books, music, news, Web pages, images, and so on).
Typically, Recommender Systems produce a list of recommendations in one of two ways: through Collaborative Filtering or Content-based Filtering [5] [1] [2]. Collaborative Filtering approaches construct a model from a user's past behavior towards items and then use that model to predict items (or ratings for items) that the user may be interested in [5] [2]. Content-based Filtering approaches use a series of discrete characteristics of an item in order to recommend additional items with similar properties [2]. In real systems, these approaches are often combined; the result is known as a Hybrid Recommender System. Netflix is a good example of a hybrid system. In this report, I focus on Collaborative Filtering.
2.2 Collaborative Filtering
2.2.1 Overview
Collaborative Filtering is a technique that automatically predicts the interest of an active user by collecting rating information from other similar users or items. The underlying assumption of Collaborative Filtering is that the active user will prefer the items that similar users prefer [6]. Collaborative Filtering can be divided into two approaches: Memory-based and Model-based [2].
Memory-based approaches (also known as Nearest Neighbor Collaborative Filtering or User-based approaches) [5] are the most popular prediction methods and are widely adopted in commercial Collaborative Filtering systems [7] [8]. These algorithms use the entire user-item database to generate a prediction. That is, these systems employ statistical techniques to find a set of users, known as neighbors, that have a history of agreeing with the target user (i.e., they either rate different items similarly or they tend to buy similar sets of items) [5].
Model-based Collaborative Filtering algorithms (also known as Item-based approaches) provide item recommendations by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the Collaborative Filtering process as computing the expected value of a user prediction, given his/her ratings on other items [5]. Model-based approaches are developed using data mining and machine learning algorithms to find patterns in training data; in other words, training datasets are used to train a predefined model. Model-based approaches can be divided into several categories: clustering models, aspect models, and latent factor models [1].
A number of recommender systems use both memory-based and model-based methods. This makes recommendations more effective, but it is also more complicated to implement [9]. Such systems are called Hybrid Recommender Systems; an example is the recommender system of Google Search.
2.2.2 Collaborative Filtering Process

Figure 1. Collaborative Filtering Process

Figure 1 shows the process of Collaborative Filtering. Collaborative Filtering algorithms represent the entire $m \times n$ user-item data as a ratings matrix $A$. Each entry $a_{i,j}$ in $A$ represents the preference score (rating) of the $i$-th user on the $j$-th item. Each individual rating is within a numerical scale, and it can also be 0, indicating that the user has not yet rated that item.
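The ratings matrix described above can be illustrated with a small Python sketch. The function name `build_ratings_matrix` and the sample triples are hypothetical and chosen only for illustration; users and items are assumed to be numbered from 0.

```python
def build_ratings_matrix(ratings, m, n):
    """Build an m x n ratings matrix from (user, item, rating) triples.

    Entry [i][j] holds the rating of user i on item j.
    A value of 0 means the user has not yet rated that item.
    """
    matrix = [[0] * n for _ in range(m)]
    for user, item, value in ratings:
        matrix[user][item] = value
    return matrix


# Hypothetical sample data: 3 users, 3 items, ratings on a 1-5 scale.
sample_ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1), (2, 2, 2)]
A = build_ratings_matrix(sample_ratings, m=3, n=3)
for row in A:
    print(row)
```

Each row is one user and each column is one item, so an item vector in the $m$-dimensional user-space is simply a column of this matrix.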
Many algorithms can be used for Collaborative Filtering. In this report, I focus on Cosine Similarity, Pearson Correlation Similarity, and Singular Value Decomposition.

2.3 Collaborative Filtering Algorithms


2.3.1 Cosine Similarity
Cosine Similarity is a Model-based algorithm for making recommendations [1]. In this algorithm, the similarities between different items (or users) in the dataset are calculated using cosine similarity, and these similarity values are then used to predict ratings for user-item pairs not present in the dataset. In this case, two items $i$ and $j$ are thought of as two vectors in the $m$-dimensional user-space. The similarity between them is measured by computing the cosine of the angle between these two vectors:

$$\mathrm{sim}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert_2 \, \lVert \vec{j} \rVert_2}$$

If the similarity is 1, the two vectors have the same orientation. If it is 0, the two vectors are orthogonal, so items $i$ and $j$ are unrelated. If it is -1, the two vectors point in opposite directions, so the two items are not similar at all.
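As a minimal sketch of the cosine measure, the following Python function computes the similarity of two item vectors. The function name, the sample vectors, and the choice to return 0.0 for an all-zero vector are assumptions made for this illustration, not part of any particular library.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption for this sketch: an item nobody rated is similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical item vectors; each position is one user's rating (0 = not rated).
item_i = [5, 3, 0, 1]
item_j = [10, 6, 0, 2]   # same direction as item_i, only scaled
item_k = [0, 0, 4, 0]    # rated only by a user who did not rate item_i

print(cosine_similarity(item_i, item_j))  # very close to 1
print(cosine_similarity(item_i, item_k))  # 0.0, the vectors are orthogonal
```

Because ratings are non-negative, similarities computed on raw rating vectors fall between 0 and 1; negative values only appear if the vectors are centered first.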
2.3.2 Pearson Correlation Similarity
Pearson Correlation Similarity is a Model-based algorithm for making recommendations [1]. In this case, the similarity between two items $i$ and $j$ is measured by computing the Pearson correlation $corr_{i,j}$:

$$\mathrm{sim}(i, j) = corr_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$

where $R_{u,i}$ denotes the rating of user $u$ on item $i$, $\bar{R}_i$ is the average rating of the $i$-th item, and $U$ is the set of users who rated both $i$ and $j$ [5].
The value of $\mathrm{sim}(i, j)$ lies between -1 and 1. Values of exactly 0, -1, or 1 are very rare; usually the value lies somewhere in between. The closer the value gets to zero, the greater the variation of the data points around the line of best fit.
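The Pearson formula can be sketched in Python as follows. This is an illustration under stated assumptions: each item is a hypothetical dictionary from user id to rating, the item averages are taken over the co-rating users only, and 0.0 is returned when the correlation is undefined.

```python
import math


def pearson_similarity(ratings_i, ratings_j):
    """Pearson correlation between two items over the users who rated both."""
    common = set(ratings_i) & set(ratings_j)
    if len(common) < 2:
        return 0.0  # assumption: too few co-ratings to measure correlation
    mean_i = sum(ratings_i[u] for u in common) / len(common)
    mean_j = sum(ratings_j[u] for u in common) / len(common)
    numerator = sum((ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common)
    spread_i = math.sqrt(sum((ratings_i[u] - mean_i) ** 2 for u in common))
    spread_j = math.sqrt(sum((ratings_j[u] - mean_j) ** 2 for u in common))
    if spread_i == 0 or spread_j == 0:
        return 0.0  # assumption: constant ratings carry no correlation signal
    return numerator / (spread_i * spread_j)


# Hypothetical ratings, keyed by user id.
item_i = {"u1": 1, "u2": 2, "u3": 3, "u4": 5}
item_j = {"u1": 2, "u2": 4, "u3": 6, "u5": 1}   # rises together with item_i
item_k = {"u1": 3, "u2": 2, "u3": 1}            # falls as item_i rises

print(pearson_similarity(item_i, item_j))  # very close to 1
print(pearson_similarity(item_i, item_k))  # very close to -1
```

Unlike cosine similarity on raw ratings, subtracting the mean lets the measure become negative when two items are rated in opposite ways by the same users.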

2.3.3 Singular Value Decomposition (SVD)


Singular Value Decomposition is a matrix factorization technique commonly used for producing low-rank approximations. Given an $m \times n$ matrix $A$ with rank $r$, the singular value decomposition, $SVD(A)$, is defined as

$$SVD(A_{m \times n}) = U_{m \times m} \, S_{m \times n} \, V^{T}_{n \times n}$$

where matrix $S$ is a diagonal matrix having only $r$ nonzero entries, which makes the effective dimensions of these three matrices $m \times r$, $r \times r$, and $r \times n$, respectively. $U$ and $V$ are two orthogonal matrices and $S$ is a diagonal matrix, called the singular matrix.
SVD has an important property: it provides the best low-rank linear approximation of the original matrix $A$, called $A_k$. It is possible to retain only $k \ll r$ singular values by discarding the other entries. Berry et al. [10] and Deerwester et al. [11] pointed out that the low-rank approximation of the original space is better than the original space itself, due to the filtering out of the small singular values that introduce noise in the customer-product relationship.
SVD produces a set of uncorrelated eigenvectors. Each customer and product is represented by its corresponding eigenvector. The process of dimensionality reduction may help customers who rated similar products to be mapped into the space spanned by the same eigenvectors.
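The low-rank approximation $A_k$ can be sketched with NumPy's `numpy.linalg.svd`. The helper name `low_rank_approximation` and the small ratings matrix are hypothetical; this is plain truncated SVD, not the Simon Funk variant that RecDB offers.

```python
import numpy as np


def low_rank_approximation(A, k):
    """Keep only the k largest singular values of A and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical 4 x 4 ratings matrix: two groups of users with opposite tastes.
A = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

A2 = low_rank_approximation(A, k=2)
print(np.round(A2, 2))
print("Frobenius error of the rank-2 approximation:", np.linalg.norm(A - A2))
```

For this matrix the singular values are 10, 8, 2, and 0, so keeping $k = 2$ discards only the singular value 2 and the approximation error in the Frobenius norm is 2. The two retained dimensions correspond to the overall rating level and to the contrast between the two taste groups.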

3 EXPERIMENT
3.1 Recommendation Engine: RecDB
In this section, I run some experiments using RecDB, a recommendation engine built entirely inside PostgreSQL 9.2, by Mohamed Sarwat of the University of Minnesota. RecDB allows application developers to build recommendation applications in a heartbeat through a wide variety of built-in recommendation algorithms such as user-user Collaborative Filtering, item-item Collaborative Filtering, and singular value decomposition. Applications powered by RecDB can produce online and flexible personalized recommendations to end-users. The engine is free and available at http://www-users.cs.umn.edu/~sarwat/RecDB/.
RecDB has the following main features:
- Usability: RecDB is an out-of-the-box tool for web and mobile developers to implement a myriad of recommendation applications. The system is easily used and configured so that a novice developer can define a variety of recommenders that fit the application needs in a few lines of SQL.
- Seamless Database Integration: Crafted inside the PostgreSQL database engine, RecDB is able to seamlessly integrate the recommendation functionality with traditional database operations, i.e., SELECT, PROJECT, JOIN, in the query pipeline to execute ad-hoc recommendation queries.
- Scalability and Performance: The system optimizes incoming recommendation queries (written in SQL) and hence provides near real-time personalized recommendations to a high number of end-users who expressed their opinions over a large pool of items.
According to the author, RecDB is designed to run on a Unix operating system. At least 1GB of RAM is recommended for most queries, though more RAM may be desirable when working with very large data sets, especially when not working with apriori (materialized) recommenders.
RecDB supports 3 algorithms through 5 parameters:
- ItemCosCF: Item-Item Collaborative Filtering using Cosine Similarity measure.
- ItemPearCF: Item-Item Collaborative Filtering using Pearson Correlation Similarity measure.
- UserCosCF: User-User Collaborative Filtering using Cosine Similarity measure.
- UserPearCF: User-User Collaborative Filtering using Pearson Correlation Similarity measure.
- SVD: Simon Funk Singular Value Decomposition.


To set up and run RecDB, I prepared the necessary knowledge of Linux and PostgreSQL.
The author provides two sample databases with this tool: movie data from MovieLens and a geography database. Because time was limited, I ran my experiments on the MovieLens database published by the author. The following section demonstrates RecDB.
3.2 Experiment
Step 1: Start the database server by running the following command in a terminal from the PostgreSQL folder: perl scripts/pgbackend.pl
If the server is available, the terminal looks like the following picture:

Figure 2. Turn on database server

Step 2: In a new terminal, create and connect to a new database named movielensdb: perl scripts/pgfrontend.pl movielensdb
The address of the host server running the PostgreSQL backend is localhost (default).

Figure 3. Create and run database movielensdb

Step 3: Import the prepared database into movielensdb:
\i initmovielens1mdatabase.sql;
When the import succeeds, we have:

Figure 4. Import initmovielens1mdatabase.sql

Step 4: Check the list of relations in the database after importing.
There are 6 relations in movielensdb: ml_items, ml_items_systemid_seq, ml_ratings, ml_ratings_ratingid_seq, ml_users, ml_users_systemid_seq

Figure 5. Check list of relations

Step 5: Create Recommenders:


CREATE RECOMMENDER MovieRec ON ml_ratings
USERS FROM userid
ITEMS FROM itemid
EVENTS FROM ratingval
USING ItemCosCF;

In this step, I create the recommender MovieRec on the relation ml_ratings. The relation ml_ratings(userid, itemid, ratingval) represents the ratings table of a movie recommendation application. The recommender takes users from userid, items from itemid, and ratings from ratingval, and uses Item-Item Collaborative Filtering with the Cosine Similarity measure. If I change the parameter ItemCosCF to another parameter, a different algorithm is used; RecDB supports 5 parameters: ItemCosCF, ItemPearCF, UserCosCF, UserPearCF, and SVD. Now, I recommend the top-10 movies to user 1, based on the ratings predicted by the Item-Item Collaborative Filtering algorithm (applying the cosine similarity measure):
SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10;
The result is:

Figure 6. Top-10 movies recommendation based on the rating predicted using Item-Item
Collaborative Filtering
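To make the idea behind such a recommendation query concrete, the following Python sketch predicts ratings for the items a user has not rated and returns the top-N of them. It is a conceptual illustration only and not RecDB's implementation: the function names, the weighted-sum prediction, the treatment of unrated entries as 0 in the cosine, and the small matrix are all assumptions made here.

```python
import math


def cosine(x, y):
    """Cosine similarity of two rating vectors (0 = not rated)."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms if norms else 0.0


def predict(A, user, item, sims):
    """Weighted sum of the user's own ratings, weighted by item-item similarity."""
    weighted = total = 0.0
    for other, rating in enumerate(A[user]):
        if other != item and rating > 0:
            weighted += sims[item][other] * rating
            total += abs(sims[item][other])
    return weighted / total if total else 0.0


def top_n(A, user, n):
    """Return the n unrated items with the highest predicted rating for the user."""
    items = range(len(A[0]))
    columns = [[row[j] for row in A] for j in items]
    sims = [[cosine(columns[i], columns[j]) for j in items] for i in items]
    scored = [(j, predict(A, user, j, sims)) for j in items if A[user][j] == 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical data: items 0 and 1 are liked by the same users, as are items 2 and 3.
A = [
    [5, 0, 1, 0],   # user 0 has not rated items 1 and 3
    [4, 5, 1, 2],
    [5, 4, 2, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
]
for item, score in top_n(A, user=0, n=2):
    print(item, round(score, 2))
```

User 0 rated item 0 highly and item 2 poorly, so item 1 (similar to item 0) is ranked above item 3 (similar to item 2). Sorting in descending order of the predicted rating is what turns the predictions into a top-N list.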

The following query recommends the top 5 action movies to user 1:
SELECT r.itemid, i.name, i.genre, r.ratingval
FROM ml_ratings r, ml_items i
RECOMMEND r.itemid
TO r.userid
ON r.ratingval
USING itemcoscf
WHERE r.userid = 1 AND r.itemid = i.itemid AND i.genre ILIKE '%action%'
ORDER BY ratingval DESC
LIMIT 5;

Figure 7. Recommends the top 5 action movies to user 1

Figure 7 shows the five action movies with the highest predicted rating values. That is, the system can recommend five action movies to user 1.
For comparison, I also retrieve the top 5 action movies for user 2 and user 3. For user 2, I use the query:
SELECT r.itemid, i.name, i.genre, r.ratingval
FROM ml_ratings r, ml_items i
RECOMMEND r.itemid
TO r.userid
ON r.ratingval
USING itemcoscf
WHERE r.userid = 2 AND r.itemid = i.itemid AND i.genre ILIKE '%action%'
ORDER BY ratingval DESC
LIMIT 5;
The result is:

Figure 8. Recommends the top 5 action movies to user 2

For user 3, I use the query:
SELECT r.itemid, i.name, i.genre, r.ratingval
FROM ml_ratings r, ml_items i
RECOMMEND r.itemid
TO r.userid
ON r.ratingval
USING itemcoscf
WHERE r.userid = 3 AND r.itemid = i.itemid AND i.genre ILIKE '%action%'
ORDER BY ratingval DESC
LIMIT 5;
The result is:

Figure 9. Recommends the top 5 action movies to user 3

Figures 7, 8, and 9 show clearly that RecDB makes different recommendations for the three users. User 1 is recommended five movies: Master Ninja I (1984), Mirage (1995), Heaven's Burning (1997), Big Trees, The (1952), and Tough and Deadly (1995). User 2 is recommended Master Ninja I (1984), Johnny 100 Pesos (1993), African Queen, The (1951), Diva (1981), and Godfather, The (1972). User 3 is recommended Master Ninja I (1984), Target (1995), Sea Wolves, The (1980), Born American (1986), and Johnny 100 Pesos (1993). This is expected, because different users are interested in different movies. However, Master Ninja I (1984) is recommended to all three users, which means that this film has a high predicted rating for each of them.
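The comparison of the three result lists can also be done programmatically. The short Python sketch below uses the titles reported in Figures 7, 8, and 9; the variable names are hypothetical.

```python
from collections import Counter

# Top 5 action movies per user, as reported in Figures 7, 8, and 9.
top5 = {
    1: ["Master Ninja I (1984)", "Mirage (1995)", "Heaven's Burning (1997)",
        "Big Trees, The (1952)", "Tough and Deadly (1995)"],
    2: ["Master Ninja I (1984)", "Johnny 100 Pesos (1993)", "African Queen, The (1951)",
        "Diva (1981)", "Godfather, The (1972)"],
    3: ["Master Ninja I (1984)", "Target (1995)", "Sea Wolves, The (1980)",
        "Born American (1986)", "Johnny 100 Pesos (1993)"],
}

# Count in how many users' lists each title appears.
counts = Counter(title for titles in top5.values() for title in titles)
shared_by_all = [t for t, c in counts.items() if c == len(top5)]
shared_by_some = sorted(t for t, c in counts.items() if 1 < c < len(top5))

print("Recommended to all users:", shared_by_all)
print("Recommended to more than one user:", shared_by_some)
```

Besides Master Ninja I (1984), which appears in all three lists, Johnny 100 Pesos (1993) is shared by users 2 and 3.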

4 CONCLUSION AND FUTURE WORK

In this report, I introduced recommender systems, recommender systems based on Collaborative Filtering techniques, and related techniques and tools. Three Collaborative Filtering algorithms (Cosine Similarity, Pearson Correlation Similarity, and Singular Value Decomposition) were described. Moreover, the RecDB recommendation tool was introduced. Some experiments using RecDB on movielensdb (movie data from MovieLens) were described and their results were shown.
In the future, I will study RecDB more deeply to understand how it uses Collaborative Filtering to make recommendations. I may then crawl data from facebook.com and use RecDB to make recommendations on it.

5 REFERENCES
[1] Joseph A. Konstan, John Riedl, "Recommender systems: from algorithms to user experience," User Model. User-Adapt. Interact., vol. 22, no. 1-2, pp. 101-123, 2012.
[2] Michael D. Ekstrand, John Riedl, Joseph A. Konstan, "Collaborative Filtering Recommender Systems," Foundations and Trends in Human-Computer Interaction, vol. 4, no. 2, pp. 175-243, 2011.
[3] Jiliang Tang, Jie Tang, Huan Liu, "Recommendation in Social Media - Recent Advances and New Frontier," A tutorial at KDD, pp. 24-27, 2014.
[4] Francesco Ricci, Lior Rokach, Bracha Shapira, "Introduction to Recommender Systems Handbook," in Recommender Systems Handbook, 2011, pp. 1-35.
[5] Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl, "Item-based collaborative filtering recommendation algorithms," WWW, pp. 285-295, 2001.
[6] Hao Ma, Irwin King, Michael R. Lyu, "Effective missing data prediction for collaborative filtering," SIGIR, pp. 39-46, 2007.
[7] Greg Linden, Brent Smith, Jeremy York, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, 2003.
[8] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, John Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," CSCW, pp. 175-186, 1994.
[9] Mustansar Ali Ghazanfar, Adam Prügel-Bennett, Sándor Szedmák, "Kernel-Mapping Recommender system algorithms," Inf. Sci., pp. 81-104, 2012.
[10] M. W. Berry, S. T. Dumais, G. W. O'Brien, "Using Linear Algebra for Intelligent Information Retrieval," SIAM Review, vol. 37, no. 4, pp. 573-595, 1995.
[11] Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman, "Indexing by Latent Semantic Analysis," JASIS, vol. 41, no. 6, pp. 391-407, 1990.

Comment:
Mark:
In words:

Hanoi, ./../2014
Lecturer
