Whether you look at Twitter, Goodreads, or Amazon, there is hardly a digital space that is not saturated with people’s opinions. Organizations today need to dig into these opinions to gain insights about their products and services. However, this data exists in such volumes that gauging it manually is next to impossible. This is where another boon of data science comes into play: sentiment analysis. In this article, we’ll explore what sentiment analysis encompasses and the various ways to implement it in Python.
This article was published as a part of the Data Science Blogathon.
Sentiment Analysis is a use case of Natural Language Processing (NLP) and falls under the category of text classification. Put simply, Sentiment Analysis involves classifying a text into sentiment categories such as positive or negative, or happy, sad, or neutral. The ultimate goal of sentiment analysis is thus to decipher the underlying mood, emotion, or sentiment of a text. This is also known as Opinion Mining.
Let us look at how a quick Google search defines Sentiment Analysis:
Sentiment analysis in Python typically works by employing natural language processing (NLP) techniques to analyze and understand the sentiment expressed in text. The process involves several steps:
Various types of sentiment analysis can be performed, depending on the specific focus and objective of the analysis. Some common types include:
By now we are somewhat familiar with what sentiment analysis is. But what is its significance, and how do organizations benefit from it? Let us explore this with an example. Assume you start a company that sells perfumes on an online platform. You put up a wide range of fragrances, and soon customers start flooding in. After some time you decide to change your pricing strategy: you plan to increase the prices of the popular fragrances and, at the same time, offer discounts on the unpopular ones. To determine which fragrances are popular, you start going through the customer reviews of all the fragrances. But you’re stuck! There are so many that you cannot go through them all in one lifetime. This is where sentiment analysis can pull you out of the pit.
You simply gather all the reviews in one place and apply sentiment analysis to them. The following is a schematic representation of sentiment analysis on the reviews of three perfume fragrances: Lavender, Rose, and Lemon. (Please note that these reviews might contain incorrect spelling, grammar, and punctuation, as is the case in real-world scenarios.)
From these results, we can clearly see that:
This was just a simple example of how sentiment analysis can help you gain insights into your products/services and help your organization make decisions.
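As a rough illustration of that last step, here is a minimal sketch of how per-review sentiment labels might be rolled up into one number per fragrance. The review labels are made up for the example, and `positive_share` is a hypothetical helper name; in practice the labels would come from whichever sentiment model you apply.

```python
from collections import Counter, defaultdict

# Hypothetical (fragrance, predicted sentiment) pairs, invented for this sketch.
# In practice the labels would come from a sentiment model applied to each review.
labelled_reviews = [
    ("Lavender", "positive"),
    ("Lavender", "positive"),
    ("Lavender", "negative"),
    ("Rose", "positive"),
    ("Rose", "negative"),
    ("Rose", "negative"),
    ("Lemon", "negative"),
    ("Lemon", "negative"),
]


def positive_share(pairs):
    """Return the fraction of positive reviews for each product."""
    counts = defaultdict(Counter)
    for product, label in pairs:
        counts[product][label] += 1
    return {
        product: tally["positive"] / sum(tally.values())
        for product, tally in counts.items()
    }


shares = positive_share(labelled_reviews)
# List the products from most to least positively reviewed.
for product, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{product}: {share:.0%} positive")
```

A ranking like this is only as good as the labels behind it, so it is worth spot-checking a handful of reviews before acting on the numbers.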
We just saw how sentiment analysis can empower organizations with insights that can help them make data-driven decisions. Now, let’s peep into some more use cases of sentiment analysis:
Python is one of the most powerful tools for data science tasks, and it offers a multitude of ways to perform sentiment analysis. The most popular ones are listed here.
Let’s dive deep into them one by one.
Note: For the demonstrations of methods 3 and 4 (Using Bag of Words Vectorization-based Models and Using LSTM-based Models), a labelled sentiment analysis dataset has been used. It comprises more than 5000 text excerpts labelled as positive, negative, or neutral. The dataset is available under a Creative Commons licence.
TextBlob is a Python library for Natural Language Processing. Using TextBlob for sentiment analysis is quite simple. It takes text as input and returns polarity and subjectivity as outputs.
Polarity determines the sentiment of the text. Its value lies in [-1, 1], where -1 denotes a highly negative sentiment and 1 denotes a highly positive sentiment.
Subjectivity determines whether a text is factual information or a personal opinion. Its value lies in [0, 1], where a value closer to 0 denotes factual information and a value closer to 1 denotes a personal opinion.
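To make the two ranges concrete, here is a small sketch that turns a polarity and subjectivity pair into coarse labels. The function name `describe_sentiment` and the cut-offs (0 for polarity, 0.5 for subjectivity) are assumptions chosen for illustration; TextBlob itself only returns the two numbers.

```python
def describe_sentiment(polarity, subjectivity):
    """Turn TextBlob-style scores into coarse, human-readable labels.

    The cut-offs (0 for polarity, 0.5 for subjectivity) are illustrative
    assumptions, not values defined by TextBlob.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must lie in [0, 1]")

    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    kind = "opinion" if subjectivity >= 0.5 else "fact"
    return tone, kind


print(describe_sentiment(0.8, 0.75))
print(describe_sentiment(-1.0, 1.0))
print(describe_sentiment(0.0, 0.1))
```

With TextBlob installed, the two inputs would come from `TextBlob(text).sentiment.polarity` and `TextBlob(text).sentiment.subjectivity`.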
Installation:
pip install textblob
Importing Text Blob:
from textblob import TextBlob
Code Implementation for Sentiment Analysis Using Text Blob:
Writing code for sentiment analysis using TextBlob is fairly simple. Import the TextBlob class, pass it the text to be analyzed, and read the polarity and subjectivity attributes of its sentiment property, as follows:
Python Code:
Subjectivity of Text 2 is 1.0
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer that is specifically attuned to sentiments expressed in social media text. Just like TextBlob, its usage in Python is pretty simple. We’ll see its usage in a code example shortly.
Installation:
pip install vaderSentiment
Importing the SentimentIntensityAnalyzer class from VADER:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
Code for Sentiment Analysis Using Vader:
Firstly, we need to create an object of the SentimentIntensityAnalyzer class; then we need to pass the text to the polarity_scores() function of the object as follows:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

#Creating the analyzer object
sentiment = SentimentIntensityAnalyzer()

text_1 = "The book was a perfect balance between writing style and plot."
text_2 = "The pizza tastes terrible."

#Scoring each text
sent_1 = sentiment.polarity_scores(text_1)
sent_2 = sentiment.polarity_scores(text_2)

print("Sentiment of text 1:", sent_1)
print("Sentiment of text 2:", sent_2)
Output:
Sentiment of text 1: {'neg': 0.0, 'neu': 0.73, 'pos': 0.27, 'compound': 0.5719}
Sentiment of text 2: {'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}
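As we can see, the polarity_scores() method returns a dictionary of sentiment scores for the text: the proportions of negative (neg), neutral (neu), and positive (pos) content, plus a normalized compound score in [-1, 1].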
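If a single label is needed rather than a dictionary, the compound score can be thresholded. The sketch below uses a cut-off of 0.05 on either side of zero, which is a commonly used convention for VADER; the function name `label_from_compound` is hypothetical, and the threshold should be treated as a tunable assumption. The two dictionaries are copied from the output above.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dictionary to a single sentiment label.

    `threshold` is an assumed, tunable cut-off around zero.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Score dictionaries copied from the output shown above.
sent_1 = {'neg': 0.0, 'neu': 0.73, 'pos': 0.27, 'compound': 0.5719}
sent_2 = {'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}

print(label_from_compound(sent_1))
print(label_from_compound(sent_2))
```

Widening the threshold sends more borderline texts to the neutral class, which can be useful when false positives are costly.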
In the two approaches discussed so far, i.e., TextBlob and VADER, we have simply used Python libraries to perform sentiment analysis. Now we’ll discuss an approach in which we train our own model for the task. The steps involved in performing sentiment analysis using the Bag of Words Vectorization method are as follows:
Code for Sentiment Analysis using Bag of Words Vectorization Approach:
To build a sentiment analysis model using the BOW Vectorization approach, we need a labelled dataset. As stated earlier, the dataset used for this demonstration has been obtained from Kaggle. We have simply used sklearn’s CountVectorizer to create the BOW. We then trained a Multinomial Naive Bayes classifier, for which an accuracy score of about 0.91 was obtained on the test split.
Dataset can be obtained from here.
#Loading the Dataset
import pandas as pd
data = pd.read_csv('Finance_data.csv')

#Pre-Processing and Bag of Words Vectorization using Count Vectorizer
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(stop_words='english', ngram_range=(1, 1), tokenizer=token.tokenize)
text_counts = cv.fit_transform(data['sentences'])

#Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(text_counts, data['feedback'], test_size=0.25, random_state=5)

#Training the model
from sklearn.naive_bayes import MultinomialNB
MNB = MultinomialNB()
MNB.fit(X_train, Y_train)

#Calculating the accuracy score of the model
from sklearn import metrics
predicted = MNB.predict(X_test)
accuracy_score = metrics.accuracy_score(Y_test, predicted)
print("Accuracy Score: ", accuracy_score)
Output:
Accuracy Score: 0.9111675126903553
The trained classifier can be used to predict the sentiment of any given text input.
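To show what happens inside CountVectorizer and MultinomialNB, here is a from-scratch sketch in plain Python: count words per class, then pick the class with the highest smoothed log-probability. It is a simplified illustration, not sklearn’s implementation: the four training sentences are made up, stop-word removal is skipped, and the helper names (`tokenize`, `train_naive_bayes`, `predict`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny made-up training set, for illustration only.
train_texts = [
    "profits rose sharply this quarter",
    "strong growth and record profits",
    "sales fell and losses widened",
    "weak demand led to heavy losses",
]
train_labels = ["positive", "positive", "negative", "negative"]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (same pattern as the tokenizer above)."""
    return re.findall(r"[a-zA-Z0-9]+", text.lower())


def train_naive_bayes(texts, labels):
    """Count words per class (the bag of words) and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def predict(text, word_counts, class_counts, vocabulary):
    """Pick the class with the highest log-probability, using add-one smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words unseen in training are ignored
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(train_texts, train_labels)
print(predict("record growth in profits", *model))
print(predict("losses widened on weak demand", *model))
```

With the sklearn objects from the code above, the equivalent prediction step would be to transform the new text with the fitted vectorizer (`cv.transform`) and pass the result to `MNB.predict`.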
Though we were able to obtain a decent accuracy score with the Bag of Words Vectorization method, it might fail to yield the same results on larger datasets. This motivates the use of deep learning-based models for sentiment analysis.
For NLP tasks we often use RNN-based models, since they are designed to deal with sequential data. Here, we’ll train an LSTM (Long Short-Term Memory) model using TensorFlow with Keras. The steps to perform sentiment analysis using LSTM-based models are as follows:
Here, we have used the same dataset as we used in the case of the BOW approach. A training accuracy of 0.90 was obtained.
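Before training, each sentence has to become a fixed-length list of integers. The sketch below mimics, in plain Python, the idea behind the Keras Tokenizer and pad_sequences calls used in the code that follows: words are ranked by frequency, replaced by their rank, and shorter sequences are left-padded with zeros. The helper names (`fit_word_index`, `to_sequences`, `pad_left`) and the three sample sentences are hypothetical, and the sketch simplifies the real preprocessing (for example, it does no punctuation filtering).

```python
from collections import Counter


def fit_word_index(texts, num_words):
    """Rank words by frequency (1 = most frequent) and keep the top ones.

    Only indices below `num_words` are kept; 0 is reserved for padding.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate(ranked, start=1) if i < num_words}


def to_sequences(texts, word_index):
    """Replace each known word with its integer index; drop unknown words."""
    return [
        [word_index[word] for word in text.lower().split() if word in word_index]
        for text in texts
    ]


def pad_left(sequences):
    """Left-pad every sequence with zeros to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [[0] * (max_len - len(seq)) + seq for seq in sequences]


texts = ["profits rose", "profits fell sharply", "sales fell"]
word_index = fit_word_index(texts, num_words=500)
padded = pad_left(to_sequences(texts, word_index))
for row in padded:
    print(row)
```

Every row ends up with the same length, which is what lets the model treat the whole dataset as a single rectangular array.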
#Importing necessary libraries
#Note: the keras.preprocessing imports below assume a Keras 2.x-era installation
import nltk
import pandas as pd
from textblob import Word
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Embedding, LSTM, LeakyReLU, SpatialDropout1D

#Downloading the NLTK resources needed for stop words and lemmatization
nltk.download('stopwords')
nltk.download('wordnet')

#Loading the dataset
data = pd.read_csv('Finance_data.csv')

#Pre-Processing the text
def cleaning(df, stop_words):
    # Converting to lowercase
    df['sentences'] = df['sentences'].apply(lambda x: ' '.join(w.lower() for w in x.split()))
    # Removing the digits/numbers
    df['sentences'] = df['sentences'].str.replace(r'\d+', '', regex=True)
    # Removing stop words
    df['sentences'] = df['sentences'].apply(lambda x: ' '.join(w for w in x.split() if w not in stop_words))
    # Lemmatization
    df['sentences'] = df['sentences'].apply(lambda x: ' '.join(Word(w).lemmatize() for w in x.split()))
    return df

stop_words = set(stopwords.words('english'))
data_cleaned = cleaning(data, stop_words)

#Converting the text to padded integer sequences using the tokenizer
tokenizer = Tokenizer(num_words=500, split=' ')
tokenizer.fit_on_texts(data_cleaned['sentences'].values)
X = tokenizer.texts_to_sequences(data_cleaned['sentences'].values)
X = pad_sequences(X)

#One-hot encoding the labels (required by categorical_crossentropy) and splitting the data
y = pd.get_dummies(data_cleaned['feedback']).values.astype('float32')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

#Model Building
model = Sequential()
model.add(Embedding(500, 120, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(704, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(352))
model.add(LeakyReLU())
model.add(Dense(3, activation='softmax'))  # three classes: positive, negative, neutral
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

#Model Training
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=1)

#Model Testing
model.evaluate(X_test, y_test)
Transformer-based models are among the most advanced Natural Language Processing techniques. They follow an encoder-decoder-based architecture and employ self-attention to yield impressive results. Though one can always build a transformer model from scratch, it is quite a tedious task. Instead, we can use the pre-trained transformer models available on Hugging Face. Hugging Face is an open-source AI community that offers a multitude of pre-trained models for NLP applications. These models can be used as they are or fine-tuned for specific tasks.
Installation:
pip install transformers
Importing the transformers library:
import transformers
Code for Sentiment Analysis Using Transformer-based Models:
To perform any task using transformers, we first need to import the pipeline function from transformers. Then, a pipeline object is created by calling this function with the task to be performed as an argument (i.e., sentiment analysis in our case). We can also specify the model to use for the task. Here, since we have not specified a model, the distilbert-base-uncased-finetuned-sst-2-english model is used by default for sentiment analysis. You can check out the list of available tasks and models here.
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["It was the best of times.", "t was the worst of times."]
sentiment_pipeline(data)
Output:
[{'label': 'POSITIVE', 'score': 0.999457061290741}, {'label': 'NEGATIVE', 'score': 0.9987301230430603}]
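The pipeline returns one dictionary per input, holding a label and a confidence score. To place these results on the same [-1, 1] scale as the polarity-style scores seen earlier, one option is to fold the label into the sign of the score. The sketch below assumes only the two labels shown in the output above; `signed_score` is a hypothetical helper, and the results list is copied from that output.

```python
def signed_score(prediction):
    """Map a {'label', 'score'} result to a value in [-1, 1].

    Assumes the label is either 'POSITIVE' or 'NEGATIVE'.
    """
    sign = {"POSITIVE": 1.0, "NEGATIVE": -1.0}[prediction["label"]]
    return sign * prediction["score"]


# Results copied from the pipeline output shown above.
results = [
    {'label': 'POSITIVE', 'score': 0.999457061290741},
    {'label': 'NEGATIVE', 'score': 0.9987301230430603},
]

signed = [signed_score(result) for result in results]
print(signed)
```

Keep in mind that the score is the model’s confidence in its label, not a measure of how strong the sentiment is, so a signed value near 1 means "confidently positive" rather than "extremely positive".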
In an age when users can express their viewpoints effortlessly and data is generated in abundance within fractions of a second, drawing insights from such data is vital for organizations to make efficient decisions. Sentiment analysis in Python proves to be the missing piece of the puzzle!
We have now covered what sentiment analysis entails and the various methods one can use to perform it in Python. These were only rudimentary demonstrations, so do go ahead, experiment with the models, and try them out on your own data.
A. Sentiment analysis means extracting and determining a text’s sentiment or emotional tone, such as positive, negative, or neutral.
A. Sentiment analysis helps with social media posts, customer reviews, or news articles. For example, analyzing Twitter data to determine the overall sentiment towards a particular product or tracking customer sentiment in online reviews.
A. The two types of sentiment analysis are (1) Document-level sentiment analysis, which analyzes the sentiment of an entire document, and (2) Sentence-level sentiment analysis, which focuses on analyzing the sentiment of individual sentences within a document.
A. Sentiment analysis is analyzing and classifying the sentiment expressed in text. It can be categorized into document-level and sentence-level sentiment analysis, where the former analyzes the sentiment of a whole document, and the latter focuses on the sentiment of individual sentences.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.