Suman Sourav(BE/6030/15)
Mekala Keerthi Niveditha(BE/6034/15)
Project Supervisor
Balaram Mandal
Assistant Professor, CSE Dept, BIT Mesra, Off Campus Deoghar
INTRODUCTION
What is sentiment analysis?
Sentiment analysis is the process of computationally determining whether a piece of writing is positive,
negative, or neutral. It is also known as opinion mining: deriving the opinion or attitude of a speaker.
Natural Language Processing (NLP) is a hotbed of research in data science these days and one of the most
common applications of NLP is sentiment analysis. From opinion polls to creating entire marketing strategies,
this domain has completely reshaped the way businesses work, which is why this is an area every data
scientist must be familiar with.
We used data from Kaggle in which each tweet is labeled positive, negative, or neutral. The data contain
emoticons, usernames, and hashtags, which must be removed during preprocessing before use.
A second dataset is collected directly from Twitter through a live application we created. This data set
is highly complex, as it contains likes, pictures, dates, retweets, and all other data related to the
tweets.
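The preprocessing step described above can be sketched with simple regular expressions. The exact patterns below are illustrative assumptions, not the project's actual code:

```python
import re

def clean_tweet(text: str) -> str:
    """Strip usernames, hashtags, URLs, and non-alphanumeric
    characters from a raw tweet before analysis."""
    text = re.sub(r"@\w+", "", text)            # remove @username mentions
    text = re.sub(r"#\w+", "", text)            # remove #hashtags
    text = re.sub(r"https?://\S+", "", text)    # remove URLs
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)  # drop emoticons and punctuation
    return " ".join(text.split())               # normalize whitespace

print(clean_tweet("@VirginAmerica great flight!! :) #travel http://t.co/abc"))
# -> "great flight"
```

In practice the cleaner is applied to every tweet before the text is handed to TextBlob or the classifier.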
Dataset Description
The dataset is collected from Kaggle.com.
It consists of 17 columns and 14,640 rows (the sum of the sentiment counts below). The dataset contains
tweets about airlines, labeled positive, negative, or neutral for training and testing. The relevant
columns are ID, AIRLINE_SENTIMENT, and TWEET.
The data are a mixture of comments, emoji, URLs, sentiment labels, user locations, and other details.
Number of NEUTRAL comments: 3099
Number of NEGATIVE comments: 9178
Number of POSITIVE comments: 2363
Because the data are biased toward negative sentiment, the training and test sets are chosen so that the
model is not biased toward any particular sentiment.
The tweets directed at @VirginAmerica are taken for the analysis.
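One simple way to keep the training data from favoring the dominant class is to downsample every class to a common size before splitting. A minimal stdlib sketch (the function name and the label key are illustrative assumptions):

```python
import random

def balance_classes(rows, label_key="sentiment", seed=42):
    """Downsample every class to the size of the smallest class."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducible splits
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced

# Example: 3 negative rows and 1 positive row -> 1 of each after balancing.
rows = [{"sentiment": "negative"}] * 3 + [{"sentiment": "positive"}]
print(len(balance_classes(rows)))  # -> 2
```

Downsampling discards data; stratified sampling or class weights are common alternatives when every row matters.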
For the live model, using the Tweepy module and the Twitter API, the data are collected directly from
Twitter with the authentication token and access codes provided by the Twitter application. The
collected data are very complex. The Tweepy module extracts the tweets, retweets, likes,
comments, and other relevant fields from the given data set.
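Reducing a raw tweet object to the fields the analysis needs can be sketched as below. The field names assume the classic Twitter API v1.1 JSON layout (`favorite_count`, `retweet_count`, `source`); the helper itself is illustrative, not the project's code:

```python
def flatten_tweet(tweet: dict) -> dict:
    """Reduce a raw tweet object to the fields used in the analysis."""
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        "likes": tweet.get("favorite_count", 0),
        "retweets": tweet.get("retweet_count", 0),
        "length": len(tweet["text"]),
        "source": tweet.get("source", ""),
    }

sample = {"id": 1, "text": "Nice flight", "favorite_count": 5,
          "retweet_count": 2, "source": "Twitter for iPhone"}
print(flatten_tweet(sample)["length"])  # -> 11
```

With Tweepy, each status object exposes the same information as attributes, so the flattening step is what turns the complex stream into a simple table for analysis.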
Packages Used
• NumPy
• Matplotlib
• Seaborn
• Tweepy
• TextBlob
• pandas
• NLTK
• scikit-learn
• scikit-learn SVC
Algorithms Used
• TextBlob package - TextBlob uses three notions for the sentiment analysis of a
statement:
a. Polarity - sentiment polarity is a numerical representation of the sentiment. It can be
negative, neutral, or positive.
b. Subjectivity - a subjective sentence expresses personal feelings, views, or beliefs.
c. Intensity - modifiers such as "very" and "greatly" act as intensifiers and affect the
sentiment score.
• Logistic Regression - Logistic regression is the appropriate regression analysis to conduct when the
dependent variable is binary. Like all regression analyses, logistic regression is used for prediction.
It describes data and explains the relationship between one dependent binary variable and one or more
nominal, ordinal, interval, or ratio-level independent variables.
• Support Vector Machines - A support vector machine can also be used for regression, keeping all the
features that characterize the algorithm. Support vector regression uses the same principle as SVM
classification, with a few adjustments. Overall, the idea is to minimize the error by finding the
hyperplane that maximizes the margin, keeping in mind that some amount of error is tolerated.
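TextBlob reports polarity as a number in [-1, 1]. A common way to map that score to the three labels used in this project is a simple threshold rule; the cut at exactly 0 is a convention, not something TextBlob mandates:

```python
def label_from_polarity(polarity: float) -> str:
    """Map a TextBlob-style polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

print(label_from_polarity(0.8))   # -> positive
print(label_from_polarity(-0.3))  # -> negative
print(label_from_polarity(0.0))   # -> neutral
```

In the live model, the polarity fed into this rule would come from `TextBlob(tweet_text).sentiment.polarity`.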
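The logistic-regression idea above can be shown in a tiny from-scratch sketch: a sigmoid over a weighted sum, trained by gradient descent on toy one-dimensional data. The learning rate and iteration count are arbitrary choices for illustration:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: a single feature, binary label (1 when x is large).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):                 # stochastic gradient descent on log-loss
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)        # predicted probability of class 1
        w -= lr * (p - y) * x         # gradient of log-loss w.r.t. w
        b -= lr * (p - y)             # gradient of log-loss w.r.t. b

predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)  # -> [0, 0, 0, 1, 1, 1]
```

In the actual pipeline, the single feature x is replaced by a vector of text features (e.g. TF-IDF weights), but the sigmoid-over-weighted-sum mechanism is the same.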
Model 1
Using Tweepy And TextBlob Package
Steps :
1. Making a Twitter application for extracting live tweets. To stream tweets directly from Twitter
into our module, we first need to create a Twitter application. The Twitter application
provides the proper credentials for logging in to our Twitter account and the
authority to stream tweets from the specified Twitter handle.
2. In Jupyter Notebook, we import all the necessary packages for the module.
3. Then we write all the necessary functions to stream the data and pull the important analysis details
into the program:
• Friend list
• Likes
• Retweets
• Length of tweet
• Source of tweet
4. Visualization of the data.
5. Next, we fit our data into the predictive model using an SVM pipeline, to take advantage of the
system's multiple cores. We first set the number of cross-validation folds for the pipeline and then
fit it with the grid-search SVM estimator (grid_svm).
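The pipeline-with-grid-search step can be sketched with scikit-learn's Pipeline and GridSearchCV. The toy corpus, the parameter grid, and the `grid_svm` name are illustrative assumptions, not the project's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Tiny toy corpus standing in for the labeled tweets (1 = positive, 0 = negative).
texts = ["great flight", "loved the crew", "awesome service", "very happy",
         "terrible delay", "lost my bag", "awful service", "very angry"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),   # turn raw text into TF-IDF features
    ("svm", SVC(kernel="linear")),  # linear SVM classifier
])

# Grid search over the SVM's C parameter; n_jobs=-1 uses all available cores,
# which is the "multicore" advantage mentioned above.
grid_svm = GridSearchCV(pipeline, {"svm__C": [0.1, 1, 10]}, cv=2, n_jobs=-1)
grid_svm.fit(texts, labels)

print(grid_svm.predict(["great crew", "awful delay"]))  # -> [1 0]
```

With the real dataset, `texts` and `labels` would be the preprocessed tweets and their sentiment labels, and the fold count would match the number set for the pipeline.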
Result of Model 2
Clearly, we can see that the training score and the cross-validation score increase as we increase the
size of the training set. The result for the data set is given with the above accuracy.
We also predict some tweets of our own. The result is either 1 or 0, depending on
whether the statement is true or false.
REFERENCES
1. Kaggle
(www.kaggle.com)
2. freeCodeCamp
(https://medium.freecodecamp.org/how-to-build-a-twitter-sentiments-analyzer-in-python-using-textblob-948e1e8aae14)
3. GeeksforGeeks
(https://www.geeksforgeeks.org/twitter-sentiment-analysis-using-python/)
4. Siraj Raval YouTube Classes
(https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A)
5. Github
(https://github.com)