Every day, millions of people leave reviews, post on social media, write support tickets, and send emails. Every one of those messages carries an opinion. Positive or negative. Satisfied or frustrated. Enthusiastic or indifferent.
For a single business, reading all of that manually is impossible. For a global platform with millions of users, it's not even a conversation worth having. The only way to process opinion at scale is to automate it.
That's what sentiment analysis does. It's the NLP task of automatically detecting the emotional tone of a piece of text, whether it's positive, negative, or neutral, and it's one of the most widely deployed NLP capabilities in the world today.
You've encountered it even if you didn't know it. Every time a review platform summarises how users feel about a product, every time a customer service platform flags an angry support ticket for urgent attention, every time a brand tracks how people are reacting to a product launch on social media, that's sentiment analysis running in the background.
What makes this post different from a standalone introduction is that we've now spent nine posts building up exactly the tools that power sentiment analysis. You understand tokenization, embeddings, cosine similarity, and TF-IDF. In this post, we'll show how those pieces connect to solve a real problem, and we'll also cover the modern approaches that push far beyond them.
What Is Sentiment Analysis?
Sentiment analysis is the task of identifying and categorising the emotional tone expressed in a piece of text.
At its simplest, it's a classification problem: given a sentence or document, output a label. The most common labels are:
- Positive: the text expresses a favourable opinion ("This product is fantastic.")
- Negative: the text expresses an unfavourable opinion ("Completely useless, would not recommend.")
- Neutral: the text expresses no strong opinion ("The package arrived on Tuesday.")
More sophisticated systems go further. They might output a score on a continuous scale (how positive is this, from 1 to 5?). They might detect multiple emotions at once (joy, anger, surprise, fear). Or they might analyse sentiment at a more granular level, which is where aspect-based sentiment analysis comes in.
Sentiment analysis in machine learning is framed as a supervised learning task: you train a model on thousands of labelled examples (text plus correct sentiment label), and the model learns to generalise to new text it hasn't seen before.
Approach 1: Using What We Already Know (TF-IDF + Classifier)
Before we look at modern approaches, let's build the simplest possible sentiment analysis system using techniques from earlier in this series.
The idea: convert text into TF-IDF vectors (covered in Bag of Words & TF-IDF), then train a classifier on those vectors to predict sentiment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Small example dataset
texts = [
"I absolutely loved this product, it works perfectly",
"Terrible quality, broke after one day",
"Great value for money, highly recommend",
"Worst purchase I have ever made",
"Amazing customer service and fast delivery",
"Very disappointed, not what I expected at all",
"Solid product, does exactly what it says",
"Complete waste of money, avoid this",
"Really happy with this, will buy again",
"Poor quality and slow shipping"
]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] # 1 = positive, 0 = negative
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
texts, labels, test_size=0.3, random_state=42
)
# Convert text to TF-IDF vectors
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
# Train a logistic regression classifier
model = LogisticRegression()
model.fit(X_train_vec, y_train)
# Evaluate
predictions = model.predict(X_test_vec)
print(classification_report(y_test, predictions))
# Try on new text
new_texts = ["This is excellent, exactly what I needed",
"Broken on arrival, terrible experience"]
new_vecs = vectorizer.transform(new_texts)
print(model.predict(new_vecs)) # [1, 0]
This works surprisingly well for simple cases. TF-IDF captures which words are present, and the logistic regression learns that words like "loved," "great," "amazing" correlate with positive labels while "terrible," "worst," "disappointing" correlate with negative ones.
But it has the same limitations we saw when we covered TF-IDF: it knows nothing about word meaning or context. A simple unigram TF-IDF model can struggle with phrases like "not great" because it still gives significant weight to the word "great." Adding n-grams can help the model learn phrases such as "not great" directly, but TF-IDF still lacks a true understanding of context, negation, sarcasm, and nuance.
This is the ceiling of the bag-of-words approach for sentiment analysis, and it's exactly why more sophisticated methods were developed.
Approach 2: VADER (Rule-Based Sentiment for Social Media Text)
VADER (Valence Aware Dictionary and sEntiment Reasoner) takes a completely different approach. Instead of learning from labelled training data, it uses a hand-crafted lexicon of words with pre-assigned sentiment scores, combined with rules for handling punctuation, capitalisation, and modifiers.
VADER was specifically designed for social media text: short messages, informal language, heavy punctuation, emoji, abbreviations. It handles things that ML classifiers often miss:
- "GREAT!!!" scores more positive than "great" (caps and exclamation marks boost intensity)
- "not great" scores negative (negation is explicitly handled)
- "kind of good" scores less positive than "good" (degree modifiers reduce intensity)
# pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
sentences = [
"I absolutely LOVE this product!!!",
"This is not bad at all",
"Terrible. Completely useless.",
"It's okay, not great but not awful either",
"Kind of disappointed but the price was fair"
]
for sentence in sentences:
scores = analyzer.polarity_scores(sentence)
print(f"{sentence}")
print(f" neg: {scores['neg']:.2f} neu: {scores['neu']:.2f} pos: {scores['pos']:.2f} compound: {scores['compound']:.2f}\n")
Output:
I absolutely LOVE this product!!!
neg: 0.00 neu: 0.29 pos: 0.71 compound: 0.72
This is not bad at all
neg: 0.00 neu: 0.63 pos: 0.37 compound: 0.43
Terrible. Completely useless.
neg: 0.77 neu: 0.23 pos: 0.00 compound: -0.68
It's okay, not great but not awful either
neg: 0.17 neu: 0.57 pos: 0.26 compound: 0.09
Kind of disappointed but the price was fair
neg: 0.25 neu: 0.56 pos: 0.19 compound: -0.13
The compound score is the overall sentiment: above 0.05 is positive, below -0.05 is negative, in between is neutral. Notice that "not bad at all" correctly scores positive (compound 0.43): VADER understands the double negative.
VADER's big advantage is that it requires no training data and no model. You can use it out of the box on any text. Its limitation is that it's a fixed lexicon. It can't adapt to domain-specific language, and it will struggle on text that uses unusual or evolving terminology.
Approach 3: Embeddings + Classifier (Using What We Learned)
A natural improvement over TF-IDF is to replace the sparse word count vectors with the dense sentence embeddings we covered in the Word Embeddings post. Sentence embeddings capture meaning rather than just word presence, so phrases like "not great" and "great" typically receive different representations that better reflect their semantic differences.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
texts = [
"I absolutely loved this product, it works perfectly",
"Terrible quality, broke after one day",
"Great value for money, highly recommend",
"Worst purchase I have ever made",
"Amazing customer service and fast delivery",
"Very disappointed, not what I expected at all",
"Solid product, does exactly what it says",
"Complete waste of money, avoid this",
"Really happy with this, will buy again",
"Poor quality and slow shipping"
]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# Encode with a sentence embedding model
model_embed = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model_embed.encode(texts)
# Train classifier on embeddings
X_train, X_test, y_train, y_test = train_test_split(
embeddings, labels, test_size=0.3, random_state=42
)
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
# Test on new examples
new_texts = [
"Not what I expected at all, very let down",
"Exceeded all my expectations, brilliant"
]
new_embeddings = model_embed.encode(new_texts)
print(classifier.predict(new_embeddings)) # [0, 1]
This pipeline uses everything we've built up in this series: sentence embeddings to represent meaning, cosine similarity operating implicitly inside the embedding space, and a simple classifier on top. Sentence embeddings often capture semantic relationships and negation better than TF-IDF because they encode contextual information rather than simple word counts. However, sentiment performance still depends on the quality of both the embedding model and the classifier trained on top of it.
Approach 4: Transformer Models (State of the Art)
Modern high-performance sentiment analysis systems are typically built on transformer architectures, either as fine-tuned classifiers or as larger instruction-tuned language models. The Hugging Face library makes this accessible with just a few lines of code.
from transformers import pipeline
# Load a pretrained sentiment analysis pipeline
# This uses a BERT-based model fine-tuned on sentiment data
sentiment_pipeline = pipeline(
"sentiment-analysis",
model="distilbert-base-uncased-finetuned-sst-2-english"
)
texts = [
"I absolutely loved this, highly recommended!",
"Terrible experience, never buying again.",
"It's decent, nothing special but does the job.",
"Not bad at all, pleasantly surprised.",
"Complete disaster from start to finish."
]
results = sentiment_pipeline(texts)
for text, result in zip(texts, results):
print(f"{result['label']} ({result['score']:.2f}) | {text}")
Output:
POSITIVE (1.00) | I absolutely loved this, highly recommended!
NEGATIVE (1.00) | Terrible experience, never buying again.
POSITIVE (0.73) | It's decent, nothing special but does the job.
POSITIVE (0.97) | Not bad at all, pleasantly surprised.
NEGATIVE (1.00) | Complete disaster from start to finish.
Notice that "Not bad at all, pleasantly surprised" correctly scores positive at 0.97. The model handles negation, hedging, and nuanced phrasing because it was trained on enormous amounts of labelled data using a transformer architecture that reads the full sentence context before deciding on a label.
This approach typically delivers the strongest accuracy among widely used sentiment analysis techniques, although it requires more compute than TF-IDF classifiers or VADER. For many production applications, transformer-based models provide the best balance of accuracy and practicality, while larger instruction-tuned language models are increasingly used when cost and latency permit.
Aspect-Based Sentiment Analysis
Standard sentiment analysis gives you one label for an entire piece of text. But real opinions are often more nuanced than that.
Consider this review: "The battery life is amazing but the camera is really disappointing."
A standard sentiment classifier might return neutral or struggle to decide. But the review clearly contains two distinct opinions: positive about battery life, negative about the camera. Those are two different aspects of the product, and they have opposite sentiments.
Aspect-based sentiment analysis (ABSA) breaks this down. Instead of assigning one sentiment to the whole text, it identifies specific aspects mentioned in the text and assigns a sentiment to each one separately.
The output for that review would be something like:
- battery life → POSITIVE
- camera → NEGATIVE
This is far more useful for product teams, customer service, and competitive analysis than a single aggregate score. A phone manufacturer that knows 85% of reviewers are positive about battery life but only 30% are positive about the camera knows exactly where to focus.
ABSA is more complex to implement than standard sentiment analysis because the model needs to both identify the aspects and classify the sentiment for each one. State-of-the-art approaches use fine-tuned transformer models, often with specialised training data where aspects and their sentiments have been manually labelled.
It's important to note that the following example is only an approximation of aspect-based sentiment analysis. The model is not automatically identifying aspects from the text. Instead, we manually provide candidate aspects and ask a zero-shot classifier to estimate the sentiment associated with each one. True ABSA systems are typically trained specifically to detect aspects and classify sentiment jointly.
# A simplified example using a pipeline approach
# In production you would fine-tune a model on aspect-labelled data
from transformers import pipeline
# Zero-shot classification can approximate aspect sentiment
# by framing each aspect as a hypothesis
classifier = pipeline("zero-shot-classification",
model="facebook/bart-large-mnli")
review = "The battery life is amazing but the camera is really disappointing."
aspects = ["battery life", "camera quality", "screen", "price"]
for aspect in aspects:
result = classifier(
review,
candidate_labels=[f"{aspect} is good", f"{aspect} is bad"],
)
top_label = result['labels'][0]
score = result['scores'][0]
sentiment = "POSITIVE" if "good" in top_label else "NEGATIVE"
print(f"{aspect}: {sentiment} ({score:.2f})")
Output:
battery life: POSITIVE (0.94)
camera quality: NEGATIVE (0.89)
screen: POSITIVE (0.61)
price: POSITIVE (0.54)
In this example, the zero-shot model correctly associates positive sentiment with battery life and negative sentiment with camera quality. Screen and price have lower confidence scores because the review doesn't mention them, so the model is less certain.
Which Approach Should You Use?
Here's a practical guide based on what you're building:
| Situation | Recommended approach |
|---|---|
| Quick prototype, no training data | VADER |
| Short social media text, no training data | VADER |
| You have labelled data, need interpretability | TF-IDF + Logistic Regression |
| You have labelled data, need better accuracy | Sentence embeddings + Classifier |
| Production system, accuracy is critical | Fine-tuned transformer (Hugging Face) |
| Need opinion per product feature, not per review | Aspect-based sentiment analysis |
The honest take: start simple. VADER or a TF-IDF classifier will get you surprisingly far and helps you understand the problem before you reach for a transformer. Move to transformer-based models when you have labelled data and the performance of simpler approaches isn't meeting your needs.
Challenges in Sentiment Analysis
Even modern sentiment systems face difficult cases:
- Sarcasm and irony. A sentence like "Fantastic. Another update that broke everything." contains positive words but expresses a negative opinion.
- Domain-specific language. The same word can carry different sentiment depending on context. "Killer feature" is often positive in technology discussions, while "killer disease" is obviously negative.
- Mixed sentiment. Many texts contain both positive and negative opinions, which is one reason aspect-based sentiment analysis is often useful.
- Class imbalance. Real-world datasets frequently contain far more neutral examples than strongly positive or strongly negative ones, making training and evaluation more challenging.
Where Sentiment Analysis Is Used Today
Sentiment analysis is one of the most deployed NLP tasks in the world. Here's where it shows up at scale:
E-commerce and product reviews. Amazon, Tripadvisor, and Yelp use sentiment analysis to aggregate opinions, surface the most useful reviews, flag concerning content, and give sellers structured feedback on what customers like and dislike.
Customer service. Support platforms automatically classify incoming tickets by sentiment to prioritise angry or urgent messages. A customer who writes "I am absolutely furious" gets routed differently than one who writes "I have a quick question."
Brand monitoring. Companies track mentions on social media and news in real time. If sentiment around a product suddenly drops, the team knows before the PR crisis escalates.
Financial markets. Hedge funds and trading firms run sentiment analysis on news articles, earnings call transcripts, and social media to detect early signals about company performance. Positive sentiment in an earnings call often precedes a stock price move.
Healthcare. Patient feedback forms, online health forums, and discharge surveys are analysed to identify patterns in patient experience without requiring a human to read every response.
What Comes Next
Sentiment analysis is one of the most concrete demonstrations of how the entire series connects. Tokenization prepares the text. TF-IDF or embeddings represent it. A classifier or transformer reads the representation and makes a prediction. Every post in this series contributed a piece.
The next post takes on a different kind of NLP task: Text Classification more broadly, where we'll look at how the same principles extend to categorising documents by topic, intent, or any label you define.
Learn This at Fondra Labs
This post is part of our NLP Foundations series, where we build up practical AI knowledge one concept at a time, from text processing basics all the way to the systems powering modern AI.
At Fondra Labs, we teach AI from production reality, not hype. Every topic in this series is here because it genuinely matters when you sit down to build something real.
If this was useful, explore the rest of the blog. We cover machine learning, deep learning, NLP, and the practical skills that turn understanding into building.
Frequently Asked Questions
Build your AI Foundations
Get the free checklist that production engineers use to avoid these common RAG pitfalls.
Get the Free Guide