
PROJECT REPORT ON

Predicting the features affecting the helpfulness of online customer reviews
Abstract
Online customer reviews are increasingly available for a wide variety of
products and services. They help online shoppers learn about the strengths and
weaknesses of different products. However, even a moderately popular product
constantly receives new reviews on e-commerce websites, which makes it
difficult for a buyer to read all of them before making a purchase. In this
project we developed models that predict the helpfulness of a review from
features such as review polarity, rating and title sentiment. Our results show
that the helpfulness of a review is most strongly affected by title polarity,
review text polarity and rating, and that the average word length of a review
has a positive influence on its helpfulness.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Scope
1.3 Problem Definition
2 Literature Survey
3 Plan Of Work
3.1 Proposed Approach
3.2 Flow Chart
3.3 Data Collection
3.4 Feature Extraction
3.5 Training and Testing Classifier
3.6 Software Requirements
3.7 Activity Time Chart
4 Experimental Results and Analysis
5 Observations and Conclusions
6 Future Scope
7 References
1. Introduction

1.1 Motivation

Online customer reviews have become today's word of mouth for the current
generation of buyers and sellers. They influence both consumers' purchase
decisions, and therefore product sales, and the quality improvements made by
businesses. It has been estimated that the question “Was this review helpful to
you?” brings about $2.7 billion in additional revenue to Amazon.com (Spool, 2009).
Since thousands of reviews are constantly posted even for a moderately popular
product, not every review gets a fair chance of being read.
Hence we set out to identify the features that make a review helpful, so that
when reviews are sorted by these feature values, the best reviews get a fair
chance of being viewed by customers.

1.2 Scope

This project can be useful to e-commerce websites that hold large volumes of
reviews for their products and need to pick out the best ones for their
customers. Our model determines how much each factor influences the
helpfulness of a review, so that reviews can be sorted based on these results.
Many e-commerce websites, Amazon for example, sort reviews by time and by
helpfulness, where helpfulness is computed from the votes on the question “Was
this review helpful to you?”. By sorting on the factors identified here, they
can surface the best reviews while still giving every review a fair chance of
being viewed.
1.3 Problem definition

We are designing a model that determines how much each factor influences the
helpfulness of a review. Applying this model will be helpful to both customers
and sellers.
Our goal in this project is to find the features that make a review helpful and
distinguish it from an unhelpful one. We plan to achieve this by applying
feature-extraction algorithms to the review data and then using the Gradient
Boosting technique to evaluate the extracted features.

2. Literature Survey
Several research papers, survey papers and articles on topics related to
predicting review helpfulness and analysing review sentiment have been studied,
and their approaches and datasets are summarised in Table No. 1. Based on this
literature survey, a suitable implementation was selected as the main
motivation for the project.
Literature Survey Table (Table No. 1)

1) Predicting the helpfulness of online customer reviews
   Year: 2016
   Source: Elsevier
   Authors: Jyoti Prakash Singh, Seda Irani, Nripendra P. Rana, Yogesh K. Dwivedi,
   Sunil Saumya, Pradeep Kumar Roy
   Approach used: They considered 19 factors for predicting the helpfulness of
   consumer reviews.
   Dataset used: Amazon database

2) Predicting the performance of online customer reviews
   Year: 2015
   Source: Elsevier
   Authors: Mohammad Salehan, Dan J. Kim
   Approach used: Reviews with more positive polarity in the title attract
   higher readership, and longer reviews also have more readership.
   Dataset used: Amazon database

3) Reviews Classification Using SentiWordNet Lexicon
   Year: 2012
   Source: The Online Journal on Computer Science and Information Technology
   (OJCSIT)
   Authors: Alaa Hamouda, Mohamed Rohaim
   Approach used: They classified reviews based on polarity, using the
   SentiWordNet library in Python.
   Dataset used: Amazon database
3. Plan Of Work

3.1 Proposed Approach

1. Cleaning the data of garbage and misplaced values to avoid noise in the results
2. Extracting data from the datasets into spreadsheets, i.e. from JSON objects to
   CSV files
3. Finding text polarity using lexical analysis based on SentiWordNet, a lexical
   resource for opinion mining
4. Extracting features such as adjectives, nouns, average word length and average
   sentence length using the NLTK library in Python
5. Extracting the remaining features, such as rating and helpfulness values,
   directly from the data
6. Computing Dale-Chall and Flesch readability values using their formulas
7. Implementing a Gradient Boosting classifier to get results for all the
   extracted feature values
8. Implementing a Random Forest classifier to get results
9. Comparing and analyzing the results of the two techniques

We have divided the entire work into three phases:


i) Data Collection
ii) Feature Extraction
iii) Training and Testing Classifier
3.2 Flow Chart :

Source Data -> Feature Extraction -> Training Classifier -> Testing Classifier -> Analysing Results
3.3 Data Collection :
We have collected Amazon.com online consumer review data (electronic
products).
Source : http://jmcauley.ucsd.edu/data/amazon/

Each record in the collected dataset looks like this:

{"reviewerID": "A000008615DZQRRI946FO",
 "asin": "B005FYPK9C",
 "reviewerName": "mj waldon",
 "helpful": [0, 0],
 "reviewText": "I was sketchy at first about these but once you wear them for a couple hours they break in they fit good on my board an have little wear from skating in them. They are a little heavy but won't get eaten up as bad by your grip tape like poser dc shoes.",
 "overall": 5.0,
 "summary": "great buy",
 "unixReviewTime": 1357603200,
 "reviewTime": "01 8, 2013"}
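The conversion from JSON objects to CSV files listed in the proposed approach can be sketched with Python's standard library. This sketch assumes the dump is in JSON-lines format (one review object per line); the function name `json_lines_to_csv` and the choice of columns are illustrative, not the project's actual script.

```python
import csv
import io
import json

# Columns kept from each review object (an illustrative choice, not the project's)
FIELDS = ["reviewerID", "asin", "summary", "reviewText", "overall", "helpful"]


def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON review object; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        review = json.loads(line)
        row = {field: review.get(field, "") for field in FIELDS}
        # keep the [helpful_votes, total_votes] pair as a JSON string, e.g. "[0, 0]"
        row["helpful"] = json.dumps(review.get("helpful", [0, 0]))
        writer.writerow(row)
        count += 1
    return count
```

With real files (hypothetical names), it could be called as `with open("reviews.json") as src, open("reviews.csv", "w", newline="") as dst: json_lines_to_csv(src, dst)`.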

3.4 Feature Extraction:


We have extracted the following features from the collected data:
1. Review Polarity
2. Title Polarity
3. Ratings
4. Dale-Chall Readability
5. Flesch Readability Index
6. Keywords in review
7. Adjectives
8. Nouns
9. Average word length
10. Average sentence length
11. Exclamation and question marks
12. Capital words
13. Helpfulness
1) Review Polarity :
Every sentence in a review is tokenized into words, and each word is
lemmatized using the NLTK lemmatizer. Every lemmatized word is then tagged
with its part of speech using nltk.pos_tag(). The score of each word is
obtained from its SentiWordNet synsets: senti_synsets() returns the synsets
(groups of synonyms) of a word, and each synset carries a positive and a
negative score. The polarity of a word is obtained by subtracting
synset.neg_score() from synset.pos_score(). The sentiment of a sentence is
the sum of the polarity scores of its words, and the polarity score of the
review is the sum of the sentiment scores of its sentences.
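The aggregation described above (word scores summed into sentence scores, sentence scores summed into a review score) can be sketched as follows. To keep it runnable without downloading NLTK corpora, word scores come from a small hypothetical lexicon passed in as a function, and lemmatization and POS tagging are omitted; in the project's pipeline the lookup would use SentiWordNet instead (e.g. `senti_synsets(word)` from `nltk.corpus.sentiwordnet`). Scoring a word as positive minus negative score is an assumption of this sketch.

```python
import re
from typing import Callable

# Hypothetical toy lexicon: word -> (positive score, negative score).
# A real lookup would read pos_score()/neg_score() from SentiWordNet synsets.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "good": (0.625, 0.0),
    "heavy": (0.0, 0.25),
    "bad": (0.0, 0.625),
}


def toy_word_polarity(word: str) -> float:
    """Polarity of one word: positive score minus negative score (0 if unknown)."""
    pos, neg = TOY_LEXICON.get(word.lower(), (0.0, 0.0))
    return pos - neg


def review_polarity(text: str,
                    word_polarity: Callable[[str], float] = toy_word_polarity) -> float:
    """Sum word polarities per sentence, then sum the sentence scores."""
    # naive sentence split on . ! ? (the project used NLTK tokenizers instead)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    total = 0.0
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence)
        total += sum(word_polarity(w) for w in words)
    return total
```

For example, `review_polarity("Great buy. A little heavy but good.")` scores the first sentence 0.75 and the second 0.375, giving 1.125 for the review.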

2) Title Polarity :
Title polarity is calculated in the same way as review polarity is calculated.

3) Ratings :
Ratings of each review are directly obtained from the dataset.

4) Dalechall Readability :
The Dale-Chall value indicates how difficult a text is to read and understand.
It uses a list of about 3,000 familiar words, and any word that is not in the
list is considered a difficult word.

$$\text{Dale-Chall score} = 0.1579\left(\frac{\text{difficult words}}{\text{words}} \times 100\right) + 0.0496\left(\frac{\text{words}}{\text{sentences}}\right)$$

where difficult words are the words that are not in the 3,000-word list, and
words and sentences are the total numbers of words and sentences in the
review. If the percentage of difficult words is above 5%, 3.6365 is added to
the score. The higher the score, the more difficult the text is to read.
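A minimal sketch of the Dale-Chall formula above. The real list of about 3,000 familiar words is not bundled here, so the caller passes a set `familiar_words` (a placeholder for that list), and a simple regex replaces NLTK tokenization. The 3.6365 adjustment is applied when more than 5% of the words are difficult, as in the standard Dale-Chall formula.

```python
import re


def dale_chall_score(text, familiar_words):
    """Dale-Chall readability: higher means harder to read.

    familiar_words: lowercase set standing in for the ~3,000-word Dale-Chall list.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return 0.0
    difficult = sum(1 for w in words if w.lower() not in familiar_words)
    pct_difficult = difficult / len(words) * 100
    score = 0.1579 * pct_difficult + 0.0496 * (len(words) / len(sentences))
    if pct_difficult > 5:  # adjustment when more than 5% of words are difficult
        score += 3.6365
    return score
```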

5) Flesch Reading Ease :

The Flesch Reading Ease Score (FRES) indicates how easy a text is to read:
the higher the value, the easier the text is to read.

$$\text{FRES} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)$$
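The FRES formula above can be sketched directly. Counting syllables exactly needs a pronunciation dictionary, so this sketch assumes a rough heuristic (one syllable per group of consecutive vowels, minimum one per word); the helper names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough heuristic: number of vowel groups, at least 1. Not dictionary-accurate."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """FRES = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short sentences of one-syllable words score very high (easy); long sentences of polysyllabic words pull the score down.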
6) Keywords :
We made a list of words that describe product features, such as camera,
display, processor and RAM. The list also contains brand and product names
such as Samsung, iPhone and Dell. If these terms appear in a review, the review
probably explains the features of the product and may compare it with another
brand. We therefore use these keywords as a feature for predicting
helpfulness, because a review containing them is likely to describe the
product specifically: the more keywords a review contains, the more helpful it
might be.
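Counting keyword occurrences can be sketched as below. The keyword set contains only the examples named above; the project's full list is not reproduced, and multi-word keywords are not handled by this simple word-level match.

```python
import re

# Only the example keywords named in the text; the project's full list is longer
KEYWORDS = {"camera", "display", "processor", "ram", "samsung", "iphone", "dell"}


def keyword_count(text, keywords=KEYWORDS):
    """Number of word tokens in the review that belong to the keyword list."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    return sum(1 for w in words if w in keywords)
```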

7) Adjectives and Nouns :


Every sentence in the review is tokenized into words using
nltk.word_tokenize(). After lemmatization, every word is tagged with its part
of speech using NLTK, and the words tagged as adjectives and nouns are
counted.
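Counting adjectives and nouns only needs the tagger output. `nltk.pos_tag()` returns `(word, tag)` pairs using Penn Treebank tags, where adjectives are JJ/JJR/JJS and nouns are NN/NNS/NNP/NNPS. The sketch below assumes that input format and takes hand-tagged pairs so it runs without NLTK's tagger data.

```python
from collections import Counter

ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def count_adjectives_and_nouns(tagged_tokens):
    """tagged_tokens: (word, Penn Treebank tag) pairs, e.g. from nltk.pos_tag()."""
    tags = Counter(tag for _, tag in tagged_tokens)
    adjectives = sum(tags[t] for t in ADJECTIVE_TAGS)
    nouns = sum(tags[t] for t in NOUN_TAGS)
    return adjectives, nouns
```

With NLTK and its data installed, the input could come from `nltk.pos_tag(nltk.word_tokenize(review_text))`.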

8) Average word length and sentence length :

Average word length = sum of the lengths of all words / number of words

Average sentence length = sum of sentence lengths / number of sentences
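Both averages can be sketched as follows. The text does not say whether sentence length is measured in words or characters; this sketch assumes words, and uses regex splitting in place of NLTK's tokenizers.

```python
import re

WORD = r"[A-Za-z0-9']+"


def average_lengths(text):
    """Return (average word length in characters, average sentence length in words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(WORD, text)
    if not words or not sentences:
        return 0.0, 0.0
    avg_word = sum(len(w) for w in words) / len(words)
    # assumption: sentence length = number of words in the sentence
    avg_sentence = sum(len(re.findall(WORD, s)) for s in sentences) / len(sentences)
    return avg_word, avg_sentence
```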

9) Exclamation and Question marks :

They are counted by comparing every character of the review with '!' and '?'.
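The character-by-character count described above amounts to the following; `text.count("!")` and `text.count("?")` would give the same result.

```python
def count_exclamations_and_questions(text):
    """Return (number of '!', number of '?') by scanning every character."""
    exclamations = questions = 0
    for ch in text:
        if ch == "!":
            exclamations += 1
        elif ch == "?":
            questions += 1
    return exclamations, questions
```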

10) Capital words :
The number of words written entirely in capital letters is counted using the
isupper() string method.
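A sketch of the capital-word count using `str.isupper()`, assuming words are split on whitespace. `isupper()` ignores punctuation, so "GREAT," counts; note that the single-letter pronoun "I" also counts, which a length filter could exclude if that is unwanted.

```python
def count_capital_words(text):
    """Number of whitespace-separated words whose cased letters are all uppercase."""
    return sum(1 for word in text.split() if word.isupper())
```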

11) Helpfulness :
A helpful review is labelled 1 and an unhelpful review is labelled -1.
Helpfulness is calculated from the helpful column of each review, which holds
data of the form [a, b], where a is the number of helpful votes and b is the
total number of votes. If the ratio a/b is greater than 0.7, the review is
considered helpful (1); otherwise it is unhelpful (-1).
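The labelling rule can be sketched as below. The text does not say how reviews with no votes ([0, 0], as in the sample record) were treated, since the ratio is undefined for them; returning `None` so they can be filtered out is an assumption of this sketch.

```python
def helpfulness_label(helpful, threshold=0.7):
    """helpful: [helpful_votes, total_votes]. Returns 1, -1, or None if there are no votes."""
    helpful_votes, total_votes = helpful
    if total_votes == 0:
        return None  # assumption: ratio undefined, leave the review unlabelled
    return 1 if helpful_votes / total_votes > threshold else -1
```

A ratio of exactly 0.7 is labelled unhelpful, because the rule requires the ratio to be strictly greater than 0.7.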
3.5 Training and testing classifier :
Total features extracted : 14
Total data samples : 6500
Helpful reviews : 4500
Unhelpful reviews : 2000

Classifier Used : Ensemble Gradient boosting Classifier

Ensemble Learning :
In ensemble learning, several weak learners are combined to predict an output
that is more accurate than that of any of the individual weak models.
Boosting :
Boosting is a technique in which the model first makes an initial
classification. The samples it misclassified are then given more weight and the
correctly classified ones less weight before the next classifier is trained, and
this process repeats. Finally, every classifier is given a weight, and their
weighted combination predicts the output.
Gradient Boosting :
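Gradient boosting is an ensemble technique, which means that predictions are
made by an ensemble of simpler estimators. The aim of gradient boosting is to
create (or "train") an ensemble of trees, given that we know how to train a single
decision tree. The trees are added one at a time, and each new tree is fitted to
the negative gradient of the loss function with respect to the current ensemble's
predictions, so that it corrects the errors of the trees before it. The technique
is called boosting because we expect the ensemble to work much better than a
single estimator.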
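To make the idea of each new tree correcting the errors of the previous ones concrete, here is a from-scratch sketch. It is not scikit-learn's implementation: for simplicity it uses squared-error loss (whose negative gradient is simply the residual) and depth-1 trees (stumps), whereas the project's GradientBoostingClassifier uses the deviance (log-loss) and deeper trees. The function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[j] <= threshold else right_mean


def gradient_boost(X, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_estimators):
        # for squared loss, the negative gradient is the residual y - prediction
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # learning_rate shrinks each stump's contribution
        predictions = [p + learning_rate * stump(row) for p, row in zip(predictions, X)]

    def predict(row):
        return base + learning_rate * sum(s(row) for s in stumps)

    return predict
```

The learning rate plays the same role as `learning_rate` in scikit-learn: a smaller value needs more boosting iterations but usually generalises better.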

We used the Gradient Boosting classifier to find the influence of each variable on
the helpfulness of a review. To implement it we used scikit-learn, a Python
library of machine learning algorithms.

sklearn.ensemble.GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
n_estimators=100, min_samples_split=2, min_weight_fraction_leaf=0.0,
max_depth=3, random_state=None, max_features=None,
max_leaf_nodes=None)
Parameters :
n_estimators : the number of boosting iterations (stages) in the gradient boosting classifier.

max_depth : the maximum depth of the individual decision trees in the gradient
boosting classifier.

min_samples_split : the minimum number of samples required to split an internal
node of a decision tree.

learning_rate : shrinks the contribution of each tree to the ensemble.

We trained the classifier with different parameter values and fixed the values at
which the classifier gives 100 percent accuracy when it is tested on its own
training data.

from sklearn import ensemble

# X (feature matrix) and Y (helpfulness labels, 1 / -1) come from the feature extraction step
# training data: all samples from index 2000 onwards
X_train = X[2000:]
Y_train = Y[2000:]
# testing data: the first 2000 samples
X_test = X[:2000]
Y_test = Y[:2000]

parameters = {'n_estimators': 2500, 'max_depth': 8, 'min_samples_split': 2,
              'learning_rate': 0.01}
classifier = ensemble.GradientBoostingClassifier(**parameters)

# training the classifier here
classifier.fit(X_train, Y_train)

Here, fit() trains the classifier on the training data.

Random Forest classifier :

Random Forest is an ensemble learning technique that applies bagging to decision
trees: each tree is trained on a bootstrap sample of the data, and the forest
outputs the class predicted by the majority of the trees.
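The bagging-plus-majority-vote idea can be sketched from scratch as below. This is a simplified illustration, not scikit-learn's `sklearn.ensemble.RandomForestClassifier`: each "tree" is a single-split stump, and the random feature subset is drawn once per tree (for a one-split stump this is equivalent to drawing it per split). The function names are hypothetical.

```python
import random
from collections import Counter


def fit_class_stump(X, y, features):
    """Best single threshold split (fewest misclassifications) over the given features."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    _, j, threshold, left_label, right_label = best
    return lambda row: left_label if row[j] <= threshold else right_label


def random_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # bagging: bootstrap sample of rows, drawn with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        # random forest: each tree only considers a random subset of features
        features = rng.sample(range(len(X[0])), n_features)
        trees.append(fit_class_stump(Xb, yb, features))

    def predict(row):
        # majority vote over all trees
        return Counter(tree(row) for tree in trees).most_common(1)[0][0]

    return predict
```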

3.6 Software Requirements :


Programming Language used: Python
Libraries used: NLTK, NumPy, math, pandas, Matplotlib, scikit-learn (sklearn), SciPy
Any code editor
Operating System: Ubuntu

3.7 Activity Time Chart :

1. Collecting the datasets and selecting suitable techniques for feature extraction
   and classification
2. Cleaning the data of garbage and misplaced values to avoid noise in the results
3. Extracting data from the datasets into spreadsheets, i.e. from JSON objects to
   CSV files
4. Finding text polarity using lexical analysis based on SentiWordNet, a lexical
   resource for opinion mining
5. Extracting features such as adjectives, nouns, average word length and average
   sentence length using the NLTK library in Python
6. Extracting the remaining features, such as rating and helpfulness values,
   directly from the data
7. Computing Dale-Chall and Flesch readability values using their formulas
8. Implementing a Gradient Boosting classifier to get results for all the
   extracted feature values
9. Implementing a Random Forest classifier to get results
10. Comparing and analyzing the results of the two techniques

4. Experimental Results and Analysis:

Accuracy is computed with scikit-learn's metrics.accuracy_score function, and
feature importance is obtained from the feature_importances_ attribute of the
trained ensemble.GradientBoostingClassifier. Results were recorded for the
following settings:

[Figure: results for max depth = 8 and 2500 boosting iterations]
[Figure: results for max depth = 4 and 2500 boosting iterations]
[Figure: results for max depth = 4 and 100 boosting iterations]
[Figure: results for max depth = 4 and 500 boosting iterations]

5. Observations and Conclusions
In the deviance graph, the training curve decreases at every boosting iteration
because the model learns from its errors at every iteration. The test deviance
curve likewise decreases after every iteration.
Feature importance: the importance of a variable is based on the number of times
the feature is used to split a node in the trees, with each split weighted by the
square of the improvement it gives.
We ran the algorithm for different numbers of boosting iterations and different
maximum depths with the 14 features and observed that the helpfulness of a review
is mostly affected by review polarity, title polarity, rating and average word
length. The accuracy of our model is 80%, and a Random Forest classifier trained
on the same features gives almost the same accuracy. Dale-Chall readability,
nouns, adjectives, and question and exclamation marks do not have a major effect
on helpfulness, while the Flesch readability index has a moderate effect.
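The importance measure described above (splits on a feature, each weighted by its squared improvement) can be sketched as follows. The input format, a list of `(feature_index, improvement)` pairs per tree, is a hypothetical representation chosen for illustration; scikit-learn's `feature_importances_` is computed from impurity decrease and normalised in a similar spirit, but this is not its implementation.

```python
def relative_influence(trees, n_features):
    """trees: list of trees, each a list of (feature_index, improvement) pairs, one per split.

    A feature's importance is the sum of squared improvements of the splits that use it,
    normalised so that all importances sum to 1.
    """
    totals = [0.0] * n_features
    for splits in trees:
        for feature, improvement in splits:
            totals[feature] += improvement ** 2
    grand = sum(totals)
    return [t / grand for t in totals] if grand else totals
```

A feature that is never used for a split, like index 2 in a two-feature-split ensemble, gets an importance of 0.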

6. Future Scope
In future, we plan to use a better sentiment analysis technique to find the
polarity of text, since our lexicon-based approach cannot handle sarcastic
sentences. Our work is also limited to English words: if a word from another
language is written in English letters, such as bakwas or mast, the model cannot
interpret it because it has no knowledge of that language. Words written in their
own script are easier to handle, since their polarity can be looked up directly in
a dictionary for that language. In future we will look for a way to find the
polarity of words that are written in English letters but belong to another
language.

7. References
[1] Singh, J. P., et al., Predicting the “helpfulness” of online consumer reviews,
Journal of Business Research (2016),
http://dx.doi.org/10.1016/j.jbusres.2016.08.008
[2] Salehan, M., Kim, D. J., Predicting the performance of online consumer
reviews: A sentiment mining approach to big data analytics (2015),
http://dx.doi.org/10.1016/j.dss.2015.10.006
[3] Hamouda, A., Rohaim, M., Reviews Classification Using SentiWordNet Lexicon
(2012), www.academia.edu/download/8292326/123.pdf
[4] http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
[5] https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/
[6] http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html
[7] https://pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/
[8] Spool, J. (2009). The magic behind Amazon's 2.7 billion dollar question.
Available online at http://www.uie.com/articles/magicbehindamazon/2009
(Accessed on 15th May 2016)
