Hybrid Models for S&P 500 Forecasting & News Sentiment
Explore how integrating FinBERT sentiment and BERTopic modeling improves S&P 500 forecasting during market crises in this financial machine learning study.
Honors Thesis Symposium | May 2026
THESIS PRESENTATION
Advancing Financial Time Series Forecasting: A Comparative Analysis of Hybrid Models Integrating Topic-Augmented Sentiment and Explainable AI
Prabin Bajgai
Honors College | The University of Southern Mississippi
Advisor: Dr. Zhaoxian Zhou
School of Computing Sciences and Computer Engineering
May 2026
Honors College Thesis Symposium
THE QUESTION
Can News Headlines Predict the Market?
Can structured news sentiment improve next-day predictions of S&P 500 direction (up or down)?
Headlines move markets in minutes — but existing evidence is mixed
Most studies report one overall accuracy number that hides where a model actually works
Problem: studies change the sentiment method AND the model simultaneously — impossible to isolate what helped
This thesis: hold the model constant, vary inputs one layer at a time, test across different market conditions
DATA OVERVIEW
16 Years of Markets and Headlines
Market Data
S&P 500 daily prices (2008–2023): 3,445 trading days
1-minute intraday bars for measuring market turbulence
VIX — the "fear gauge" — classifying calm vs. stressed markets
News Data
82,110 financial headlines from 3 sources (after cleaning)
Filtered to broad-market news only: Fed, inflation, GDP, jobs, index-level events
Individual stock headlines excluded
Coverage: 1–2 headlines/day early on, rising to 19/day by 2023
SENTIMENT SCORING
Reading the News with FinBERT
FinBERT: a language model trained specifically on financial text
Reads full sentences in context, not just individual words
Labels each headline: Positive (+1) | Neutral (0) | Negative (−1)
Why not simpler methods? Word-counting misses context: 'not good' scores as positive. General models trained on movie reviews misread financial language.
Limitation: no human-annotated ground truth for these specific headlines
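The label-to-score mapping above can be sketched in a few lines. This is a minimal illustration, not the thesis pipeline: it assumes each headline has already been classified, and the daily-mean aggregation, the function names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Numeric scores for the three FinBERT labels, as listed on the slide.
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}

def daily_sentiment(scored_headlines):
    """Average the signed label scores of all headlines on each date.

    scored_headlines: iterable of (date, label, confidence) tuples.
    Averaging per day is an assumption made for this sketch.
    """
    by_date = defaultdict(list)
    for date, label, _confidence in scored_headlines:
        by_date[date].append(LABEL_SCORE[label.lower()])
    return {date: mean(scores) for date, scores in by_date.items()}

def mean_confidence(scored_headlines):
    """Mean classifier confidence across all headlines."""
    return mean(conf for _date, _label, conf in scored_headlines)

# Made-up rows for illustration only, not thesis data.
example = [
    ("2020-03-16", "negative", 0.93),
    ("2020-03-16", "negative", 0.88),
    ("2020-03-16", "neutral", 0.71),
    ("2020-03-17", "positive", 0.80),
]
print(daily_sentiment(example))
print(mean_confidence(example))
```

Mean confidence measures how sure the classifier is, not whether it is right, which is why the missing ground truth remains a limitation.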
Validation Results
Headlines validated: 82,110
Mean confidence: 0.815 (81.5%)
Early period: 0.809
Later period: 0.817
Stable across 16 years
Labels: Positive, Neutral, Negative
TOPIC MODELING
What Is the News About? BERTopic
Knowing a headline is 'negative' is not enough. Negative about the Fed vs. negative about jobs has very different market implications.
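One way to combine topic and sentiment is to average sentiment separately for each topic on each day. The sketch below assumes every headline already carries a topic label and a signed sentiment score; the topic names, the zero fill for days without a topic, and the function name are hypothetical choices, not the thesis's feature definition.

```python
from collections import defaultdict

def topic_sentiment_features(headlines, topics):
    """Mean sentiment per (date, topic).

    headlines: iterable of (date, topic, score) tuples, score in {-1, 0, 1}.
    topics: the topic names to emit as feature columns.
    A topic with no headlines on a date gets 0.0 (an assumption).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    dates = set()
    for date, topic, score in headlines:
        sums[(date, topic)] += score
        counts[(date, topic)] += 1
        dates.add(date)
    features = {}
    for date in sorted(dates):
        row = {}
        for topic in topics:
            n = counts.get((date, topic), 0)
            row[topic] = sums[(date, topic)] / n if n else 0.0
        features[date] = row
    return features

# Made-up headlines: the same negative tone lands in different columns.
sample = [
    ("2022-06-15", "fed", -1),
    ("2022-06-15", "fed", 0),
    ("2022-06-15", "jobs", 1),
    ("2022-06-16", "jobs", -1),
]
print(topic_sentiment_features(sample, ["fed", "jobs"]))
```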
FEATURE SETS
Adding One Ingredient at a Time
METHODOLOGY
Models and How We Test Them
On Average, Nothing Beats a Coin Flip
All results hover near 0.50: random guessing territory
DeLong p > 0.35 for ALL pairs: no statistically significant differences
Bootstrap 95% CIs all straddle 0.50
But averages can hide the real story...
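A percentile bootstrap interval for AUC, of the kind referred to above, can be sketched as follows. This is an illustration under assumptions: it resamples days independently (a block bootstrap may suit serially dependent data better), and the resample count, seed, and function names are hypothetical rather than the thesis settings.

```python
import random

def auc(labels, scores):
    """Probability that a random up day outranks a random down day (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUC."""
    if len(set(labels)) < 2:
        raise ValueError("AUC needs both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples where AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data: scores unrelated to labels, so the interval should sit near 0.50.
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(200)]
scores = [rng.random() for _ in range(200)]
print(auc(labels, scores), bootstrap_auc_ci(labels, scores))
```

An interval that contains 0.50 means the model cannot be distinguished from random guessing at that confidence level.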
KEY FINDING
The Regime Analysis — Where Sentiment Actually Helps
🏆 Set D is the ONLY feature set with AUC above 0.50 during crises (0.568)
📉 Sets A & B collapse in high-VIX periods (0.430 / 0.407)
💡 Set D is the worst in calm markets (0.477): sentiment adds noise when markets are quiet
Volatility-weighted sentiment helps in stress, hurts in calm
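The regime split behind these numbers amounts to scoring the same predictions separately on high-VIX and calm days. A minimal sketch, assuming a single VIX cutoff: the threshold of 30, the function names, and the data are hypothetical, and a simple hit rate stands in for AUC to keep the example short (an AUC function could be passed as `metric` instead).

```python
def hit_rate(labels, scores):
    """Share of days where the thresholded score matches the direction label."""
    hits = sum((s >= 0.5) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)

def metric_by_regime(labels, scores, vix, threshold, metric=hit_rate):
    """Evaluate `metric` separately on high-VIX and calm days."""
    groups = {"high_vix": ([], []), "calm": ([], [])}
    for y, s, v in zip(labels, scores, vix):
        ys, ss = groups["high_vix" if v >= threshold else "calm"]
        ys.append(y)
        ss.append(s)
    # Skip a regime that has no days at all.
    return {name: metric(ys, ss) for name, (ys, ss) in groups.items() if ys}

# Made-up example: a mediocre pooled score hides a strong high-VIX result.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.7, 0.3, 0.6, 0.4, 0.4, 0.6]
vix = [35, 40, 32, 15, 14, 13]
print(hit_rate(labels, scores))
print(metric_by_regime(labels, scores, vix, threshold=30))
```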
CONVERGENT EVIDENCE
The Same Pattern Appears Everywhere
Consistent Findings
Rolling AUC: Set D outperforms Set A around 2011 debt crisis, 2015–16 volatility, 2020 COVID crash, 2022 bear market
Temporal holdout: Set D peaks in volatile 2022 (AUC 0.540), drops in calm 2021 (AUC 0.468)
Simple VIX trading rules all lose money (Sharpe −1.2 to −2.8)
The ML model captures more than just "volatility is high"
⚠ Honest Limitations of This Finding
High-VIX subsample is only 50 days (small sample)
Bootstrap 95% CI: [0.495, 0.714] — lower bound dips below 0.50
Permutation test p = 0.213 — not statistically significant
Exploratory finding — needs more crisis-period data to confirm
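A permutation test of the kind cited above asks how often shuffled labels produce an AUC at least as high as the observed one. The sketch below is one common one-sided form; whether the thesis used this exact variant is not stated, and the permutation count, seed, and function names are assumptions.

```python
import random

def auc(labels, scores):
    """Probability that a random up day outranks a random down day (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_pvalue(labels, scores, n_perm=2000, seed=0):
    """One-sided p-value: share of label shuffles with AUC >= observed AUC."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between labels and scores
        if auc(shuffled, scores) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic 50-day sample, matching only the size of the high-VIX subsample.
rng = random.Random(7)
labels = [rng.randint(0, 1) for _ in range(50)]
scores = [rng.random() for _ in range(50)]
print(permutation_pvalue(labels, scores))
```

With only 50 days, even a genuinely useful signal can fail to reach significance, which is consistent with treating the result as exploratory.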
EXPLAINABILITY
Feature Importance via SHAP
CONTRIBUTIONS & TAKEAWAYS
What This Thesis Contributes
Controlled Ablation Design
Isolates what each layer of sentiment structure adds — no confounded comparisons
7-Check Evaluation Framework
Guards against the shortcuts that weaken many financial ML studies: walk-forward validation, bootstrap confidence intervals, temporal holdout, statistical tests
Key Insight
A single accuracy score can hide where a model works. A model useless on average may be most valuable during crises.
Regime-aware evaluation — testing separately across market conditions — should be STANDARD PRACTICE in financial ML research.
Where to Go Next
Adaptive models that automatically adjust feature weights based on current market conditions
Test on other assets and longer crisis periods to validate regime-dependence findings
Retrain topic model per evaluation fold to eliminate global-fitting information leakage
APPENDIX A
METHODOLOGY DETAIL
Walk-Forward Cross-Validation Explained
[Diagram: Folds 1–5, each with a Train window followed by a Test window]
The model ALWAYS trains on past data and tests on future data — no look-ahead bias.
Training window expands with each fold.
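The expanding-window scheme can be sketched as an index generator. The five folds follow the diagram; the equal-sized blocks and the function name are assumptions made for illustration, not the thesis's exact fold boundaries.

```python
def walk_forward_splits(n_samples, n_folds=5):
    """Yield (train, test) index ranges with an expanding training window.

    The series is cut into n_folds + 1 blocks in time order. Fold k trains
    on the first k blocks and tests on the next one, so every test index
    comes after every training index.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any leftover days.
        test_end = n_samples if k == n_folds else train_end + block
        yield range(0, train_end), range(train_end, test_end)

for fold, (train, test) in enumerate(walk_forward_splits(3445), start=1):
    print(f"Fold {fold}: train days 0-{train[-1]}, test days {test[0]}-{test[-1]}")
```

Anything fitted to the data, such as scalers or a topic model, would need to be refit inside each training range to keep the no-look-ahead guarantee.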
STUDY LIMITATIONS
APPENDIX B
Limitations
Multiple Comparisons
Many comparisons tested without formal multiple-testing correction
Single Asset
S&P 500 at daily frequency only — generalizability unknown
BERTopic Leakage
Topic model trained on full timeline — minor information leakage
Artifact Topics
2 of 7 topics are narrow artifacts with limited interpretability
No Transformer Baselines
No Transformer-based forecasting models tested as baselines
Small Crisis Sample
High-VIX subsample is only 50 days — underpowered
No Ground Truth
No human-annotated sentiment ground truth for validation
Concept Drift
Performance drifts over time — deployed model needs periodic retraining
TECHNICAL DETAIL
End-to-End Data Pipeline
APPENDIX C
Text Processing
Quantitative Features
Dataset Construction
Model Evaluation
Presented at the University of Southern Mississippi Honors College Thesis Symposium | May 2026
Advisor: Dr. Zhaoxian Zhou | School of Computing Sciences and Computer Engineering
Prabin Bajgai
University of Southern Mississippi
- financial-forecasting
- machine-learning
- sentiment-analysis
- topic-modeling
- nlp
- sp500
- ai