NLP
Twitter Sentiment Analysis
Six-model NLP benchmark on roughly 74K tweets. Fine-tuned DistilBERT reaches 96.6% accuracy, compared with 85% for the LR+TF-IDF baseline and 87–88% for LSTM, Bi-LSTM, and CNN. Four classes: Positive, Negative, Neutral, Irrelevant.
- ▸DistilBERT accuracy: 96.6%
- ▸LR+TF-IDF accuracy: 85%
- ▸LSTM/CNN: 87–88%
- ▸Models tested: 6
Dataset
74,682 tweets, 4-class sentiment
Approach
Classical ML → deep learning → transformer fine-tuning on tweet sentiment
Tech Stack
Python, Scikit-learn, TensorFlow, HuggingFace DistilBERT, NLTK
Keywords
DistilBERT, LSTM, Bi-LSTM, TF-IDF, Sentiment, Twitter, Text Classification
Visualizations: 6 charts
Deep Dive
End-to-end NLP benchmark on the Twitter Entity Sentiment dataset.
Dataset
- ▸74,682 training tweets + 1,000 validation, 4 sentiment classes
- ▸Preprocessing: lowercasing, URL/mention/hashtag removal, stopword removal, lemmatization
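The preprocessing steps above can be sketched roughly as follows. This is a dependency-free illustration under stated assumptions, not the project's actual code: `STOPWORDS` is a tiny hypothetical stand-in for NLTK's English stopword list, `lemmatize` is a placeholder argument where something like NLTK's `WordNetLemmatizer().lemmatize` would be passed, and hashtags are assumed to be dropped whole rather than stripped of the `#` only.

```python
import re
from typing import Callable

# Tiny illustrative stopword set (assumption); NLTK's English list would
# normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and",
             "in", "on", "it", "this", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: the whole @mention / #hashtag token is removed.
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def clean_tweet(text: str,
                lemmatize: Callable[[str], str] = lambda tok: tok) -> str:
    """Lowercase, strip URLs/mentions/hashtags, drop stopwords, lemmatize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_HASHTAG_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)  # keeps alphabetic tokens only
    kept = [lemmatize(tok) for tok in tokens if tok not in STOPWORDS]
    return " ".join(kept)


cleaned = clean_tweet("I LOVE this game!! @Xbox #gaming https://t.co/abc")
```

The default `lemmatize` is the identity function, so the sketch runs without any NLTK corpus downloads; swapping in a real lemmatizer changes only that argument.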
All 6 Models Compared
| Model | Accuracy | Notes |
|---|---|---|
| LR + BoW | 83% | Count vectorizer baseline |
| LR + TF-IDF | 85% | sublinear_tf, 50K features, bigrams |
| LSTM | 87% | 128→64 units, SpatialDropout(0.2) |
| Bi-LSTM | 88% | Bidirectional, 128-d embeddings |
| CNN (text) | 88% | Conv1D 256→128 + GlobalMaxPooling |
| DistilBERT | 96.6% | 3 epochs, lr=2e-5, warmup scheduler |
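As a rough illustration of the LR + TF-IDF row, the settings in the Notes column map onto scikit-learn along these lines. This is a sketch, not the project's code: `build_tfidf_baseline` is a hypothetical helper name, `ngram_range=(1, 2)` is an assumed reading of "bigrams", `max_iter=1000` is an assumption, and the four toy sentences are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_tfidf_baseline():
    """LR + TF-IDF pipeline using the settings named in the table."""
    vectorizer = TfidfVectorizer(
        sublinear_tf=True,     # scale term frequency as 1 + log(tf)
        max_features=50_000,   # "50K features"
        ngram_range=(1, 2),    # assumed: unigrams plus bigrams
    )
    # max_iter is an assumption to give the solver room to converge.
    classifier = LogisticRegression(max_iter=1000)
    return make_pipeline(vectorizer, classifier)


# Toy data for illustration only (not taken from the dataset).
texts = ["love this game", "worst update ever",
         "great fun tonight", "terrible lag again"]
labels = ["Positive", "Negative", "Positive", "Negative"]

model = build_tfidf_baseline()
model.fit(texts, labels)
prediction = model.predict(["love this great game"])[0]
```

On the real data the same pipeline would be fit on the cleaned training tweets and scored on the 1,000 validation tweets.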
DistilBERT Fine-tuning Details
- ▸Model: DistilBERT-base-uncased
- ▸Batch size: 32, 3 epochs, linear warmup scheduler
- ▸AdamW with weight decay
- ▸Convergence: rapid, with most of the gain in epoch 1
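To make the schedule concrete, the step counts and learning-rate curve implied by these settings can be sketched in plain Python. Several things here are assumptions rather than facts from the write-up: the warmup length (`WARMUP_FRACTION`), the linear decay to zero after warmup, no gradient accumulation, and keeping the final partial batch.

```python
import math

TRAIN_SIZE = 74_682     # training tweets, from the dataset section
BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_FRACTION = 0.1   # assumption: warmup length is not stated in the source

steps_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_FRACTION * total_steps)


def lr_at(step: int) -> float:
    """Linear ramp from 0 to PEAK_LR, then linear decay to 0 (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


schedule = [lr_at(s) for s in range(total_steps + 1)]
```

Under these assumptions an epoch is 2,334 optimizer steps and the full run is 7,002, so a 10% warmup would finish well inside the first epoch.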
Key Findings
- ▸Classical ML (85%) comes within 2–3 points of LSTM/CNN (87–88%) at roughly 100× less compute
- ▸The recurrent and convolutional models plateau around 87–88%; only the fine-tuned transformer breaks through to 96.6%
- ▸DistilBERT's pre-trained contextual embeddings handle tweet slang/abbreviations that TF-IDF misses
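One way to read these gaps is as error rates instead of accuracies. The short calculation below uses only the accuracy figures reported in the comparison table; `relative_error_reduction` is a hypothetical helper name introduced for this example.

```python
# Accuracies as reported in the comparison table.
ACCURACY = {
    "LR + BoW": 0.83,
    "LR + TF-IDF": 0.85,
    "LSTM": 0.87,
    "Bi-LSTM": 0.88,
    "CNN (text)": 0.88,
    "DistilBERT": 0.966,
}


def relative_error_reduction(baseline: str, improved: str) -> float:
    """Fraction of the baseline's errors that the improved model removes."""
    baseline_err = 1.0 - ACCURACY[baseline]
    improved_err = 1.0 - ACCURACY[improved]
    return (baseline_err - improved_err) / baseline_err


tfidf_to_bilstm = relative_error_reduction("LR + TF-IDF", "Bi-LSTM")
tfidf_to_distilbert = relative_error_reduction("LR + TF-IDF", "DistilBERT")
bilstm_to_distilbert = relative_error_reduction("Bi-LSTM", "DistilBERT")
```

On the reported numbers, moving from LR + TF-IDF to Bi-LSTM removes about 20% of the errors, while moving to DistilBERT removes about 77%.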