Introduction
Whether you’re a budding developer or a seasoned coder, sentiment analysis is a powerful skill to add to your toolkit. In this comprehensive guide, we’ll explore the ins and outs of sentiment analysis using Python and NLTK. So, buckle up, fire up PyCharm, and let’s dive in!
Understanding Sentiment Analysis
Before we jump into coding, let’s take a moment to understand what sentiment analysis is all about. In a nutshell, sentiment analysis involves determining the emotional tone behind a piece of text. Is it positive, negative, or neutral? This skill is particularly useful in various applications, from social media monitoring to customer feedback analysis.
Setting Up Your Environment
First things first, let’s ensure our development environment is ready to roll. Open up PyCharm and create a new Python project. Then install NLTK, along with matplotlib for the plot we’ll draw later, by running:
pip install nltk matplotlib
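To confirm the install worked before going further, a quick check along these lines can help. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration and is not part of NLTK.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical helper: check the two packages this guide relies on
missing = missing_packages(["nltk", "matplotlib"])
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All set!")
```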
Now, let’s get our hands dirty with some real code!
Loading NLTK and Preparing Data
We’ll start by importing the NLTK library and loading a sample dataset. For this guide, we’ll use NLTK’s movie_reviews corpus, a classic set of 2,000 movie reviews from IMDb, each labeled pos or neg.
import nltk
from nltk.corpus import movie_reviews
nltk.download('movie_reviews')
# Pair each review's word list with its category label ('pos' or 'neg')
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
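Each entry in documents is a (word list, label) pair, and it is worth checking how the labels are distributed before training. The sketch below uses a tiny hand-made stand-in for documents (the sample reviews are invented, and `label_counts` is a hypothetical helper) so it runs without downloading the corpus.

```python
from collections import Counter

# Invented stand-in with the same shape as `documents`: (list of tokens, label)
sample_documents = [
    (["a", "wonderful", "film"], "pos"),
    (["dull", "and", "lifeless"], "neg"),
    (["great", "cast"], "pos"),
]


def label_counts(docs):
    """Count how many documents carry each label."""
    return Counter(label for _tokens, label in docs)


print(label_counts(sample_documents))
```

Running label_counts(documents) on the real corpus should report 1,000 reviews per category.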
Preprocessing Text Data
Raw text data needs a bit of cleaning before we can extract meaningful insights. The corpus is already tokenized, so we’ll lowercase the words, drop punctuation and stopwords, and apply stemming to reduce words to their root form.
from nltk import FreqDist
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
nltk.download('stopwords')
# Build the stopword set once; calling stopwords.words() for every word is very slow
stop_words = set(stopwords.words('english'))
all_words = [word.lower() for word in movie_reviews.words()]
all_words = [word for word in all_words if word.isalpha()]  # Remove non-alphabetic tokens
all_words = [word for word in all_words if word not in stop_words]  # Remove stopwords
# Stemming
ps = PorterStemmer()
all_words = [ps.stem(word) for word in all_words]
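The same cleaning steps can be wrapped in one reusable function, which makes it easier to apply them to new text later. This is a sketch under some assumptions: `clean_tokens` and its `stem` parameter are illustrative names, the tiny stopword set is an invented stand-in for the English stopword list, and the default `stem` does nothing so the example runs without NLTK.

```python
def clean_tokens(tokens, stop_words, stem=lambda word: word):
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem."""
    cleaned = []
    for token in tokens:
        token = token.lower()
        if token.isalpha() and token not in stop_words:
            cleaned.append(stem(token))
    return cleaned


# Tiny invented stopword set, standing in for the full English list
demo_stop_words = {"the", "was", "a", "and"}
demo_tokens = ["The", "plot", "was", "thin", ",", "and", "the", "acting", "wooden", "!"]

print(clean_tokens(demo_tokens, demo_stop_words))
# ['plot', 'thin', 'acting', 'wooden']
```

Passing ps.stem as the stem argument would apply Porter stemming as the last step.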
Feature Extraction
Now, let’s prepare our feature set. We’ll use the most common words as features.
word_freq = FreqDist(all_words)
top_words = word_freq.most_common(3000)
word_features = [word for (word, freq) in top_words]
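FreqDist is built on Python’s collections.Counter, so the same idea can be tried on a toy word list without NLTK. The words below are invented for illustration.

```python
from collections import Counter

toy_words = ["good", "bad", "good", "plot", "good", "bad"]
toy_freq = Counter(toy_words)

# most_common(n) returns (word, count) pairs, highest count first
top_two = toy_freq.most_common(2)
toy_features = [word for word, _count in top_two]

print(top_two)       # [('good', 3), ('bad', 2)]
print(toy_features)  # ['good', 'bad']
```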
Creating a Feature Set
Our next step is to create a feature set for each review, indicating which of the top words are present.
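def find_features(document):
    # Stem the review's words so they match the stemmed entries in word_features
    words = {ps.stem(word.lower()) for word in document}
    features = {}
    for word in word_features:
        features[word] = (word in words)
    return features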
featuresets = [(find_features(rev), category) for (rev, category) in documents]
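To see what one of these feature dictionaries looks like, here is the same presence check on a toy vocabulary. It is a minimal sketch: `presence_features`, the vocabulary, and the review are all made up for illustration.

```python
def presence_features(document, vocabulary):
    """Map each vocabulary word to True/False depending on whether the document contains it."""
    words = set(document)
    return {word: (word in words) for word in vocabulary}


toy_vocabulary = ["good", "bad", "plot"]
toy_review = ["good", "plot", "good", "pacing"]

print(presence_features(toy_review, toy_vocabulary))
# {'good': True, 'bad': False, 'plot': True}
```

Note that words outside the vocabulary ("pacing" here) are simply ignored, and repeated words count only once.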
Training and Testing the Model
Now comes the exciting part: training our sentiment analysis model using the Naive Bayes classifier.
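import random
from nltk import NaiveBayesClassifier
from nltk.classify import accuracy
# The corpus is ordered by category, so shuffle before splitting;
# otherwise the last 100 reviews (the test set) would all share one label
random.shuffle(featuresets)
train_set = featuresets[:1900]
test_set = featuresets[1900:]
classifier = NaiveBayesClassifier.train(train_set)
print("Classifier accuracy percent:", (accuracy(classifier, test_set)) * 100)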
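Sample output from one run (your numbers may differ):
Classifier accuracy percent: 75.0
Calling classifier.show_most_informative_features(15) lists the words that best separate the two classes:
Most Informative Features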
dread = True pos : neg = 10.0 : 1.0
mulan = True pos : neg = 10.0 : 1.0
slip = True pos : neg = 10.0 : 1.0
finest = True pos : neg = 8.0 : 1.0
seagal = True neg : pos = 7.4 : 1.0
regard = True pos : neg = 6.9 : 1.0
symbol = True pos : neg = 6.9 : 1.0
inept = True neg : pos = 6.8 : 1.0
damon = True pos : neg = 6.8 : 1.0
anger = True pos : neg = 6.8 : 1.0
terri = True neg : pos = 6.3 : 1.0
flynt = True pos : neg = 6.3 : 1.0
turkey = True neg : pos = 6.1 : 1.0
lame = True neg : pos = 5.6 : 1.0
lebowski = True pos : neg = 5.6 : 1.0
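A single accuracy figure hides which kind of mistake the classifier makes. One way to dig a little deeper is to count (actual, predicted) label pairs. The sketch below is illustrative only: `confusion_counts` is a made-up helper and the label lists are invented, not results from the model above.

```python
from collections import Counter


def confusion_counts(gold_labels, predicted_labels):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(gold_labels, predicted_labels))


# Invented labels for illustration only
gold = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos"]

counts = confusion_counts(gold, predicted)
print(counts[("pos", "pos")], counts[("pos", "neg")])  # 1 1
print(counts[("neg", "neg")], counts[("neg", "pos")])  # 2 1
```

With the trained model, the predicted labels could come from [classifier.classify(features) for features, _label in test_set], with the actual labels taken from test_set itself.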
Congratulations! You’ve just trained your first sentiment analysis model using NLTK. But what’s the fun without visualization?
Visualizing Results
Let’s add some visual flair to our analysis by printing the most informative features and plotting the 30 most frequent words.
import matplotlib.pyplot as plt
classifier.show_most_informative_features(15)
# Plotting
word_freq.plot(30, cumulative=False)
plt.show()
Conclusion
And there you have it: a detailed guide on sentiment analysis using NLTK and Python. We’ve covered everything from setting up your environment to training a model and visualizing the results. The journey doesn’t end here; sentiment analysis is a vast field with plenty of room for exploration.
As you celebrate another year of coding and learning, take a moment to appreciate how far you’ve come. Sentiment analysis is just one stepping stone in your Python journey. Here’s to another year of growth, learning, and mastering the art of Pythonic magic!
Now, fire up your PyCharm, experiment with different datasets, and let the world know how you’re using sentiment analysis in your projects.
Happy coding, and may your NLP endeavors be both enlightening and rewarding!