Hybrid Models for S&P 500 Forecasting & News Sentiment

Explore how integrating FinBERT sentiment and BERTopic modeling improves S&P 500 forecasting during market crises in this financial machine learning study.

#financial-forecasting #machine-learning #sentiment-analysis #topic-modeling #nlp #sp500 #ai
Honors Thesis Symposium | May 2026
THESIS PRESENTATION
Advancing Financial Time Series Forecasting: A Comparative Analysis of Hybrid Models Integrating Topic-Augmented Sentiment and Explainable AI
Prabin Bajgai
Honors College | The University of Southern Mississippi
Advisor: Dr. Zhaoxian Zhou
School of Computing Sciences and Computer Engineering
May 2026
Honors College Thesis Symposium
02
THE QUESTION
Can News Headlines Predict the Market?
  • Can structured news sentiment improve next-day predictions of S&P 500 direction (up or down)?
  • Headlines move markets in minutes, but existing evidence is mixed
  • Most studies report one overall accuracy number that hides where a model actually works
  • Problem: studies change the sentiment method AND the model simultaneously, making it impossible to isolate what helped
  • This thesis: hold the model constant, vary inputs one layer at a time, test across different market conditions
03
DATA OVERVIEW
16 Years of Markets and Headlines
Market Data
  • S&P 500 daily prices (2008–2023) → 3,445 trading days
  • 1-minute intraday bars for measuring market turbulence
  • VIX (the "fear gauge") for classifying calm vs. stressed markets
News Data
  • 82,110 financial headlines from 3 sources (after cleaning)
  • Filtered to broad-market news only: Fed, inflation, GDP, jobs, index-level events; individual stock headlines excluded
  • Coverage: 1–2 headlines/day early on, rising to 19/day by 2023
Timeline
  • 2008: Financial Crisis
  • 2012: European Debt
  • 2016: Brexit / Elections
  • 2020: Pandemic Crash
  • 2023: Inflation / Rates
04
SENTIMENT SCORING
Reading the News with FinBERT
FinBERT: a language model trained specifically on financial text
Reads full sentences in context — not just individual words
Labels each headline: Positive (+1) | Neutral (0) | Negative (−1)
Why not simpler methods? Word-counting misses context — 'not good' scores as positive. General models trained on movie reviews misread financial language.
Limitation: no human-annotated ground truth for these specific headlines
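The +1 / 0 / −1 labeling above can be turned into a daily signal roughly as follows. This is a minimal sketch that assumes each headline already carries a FinBERT label; the function name `daily_sentiment`, the `(date, label)` tuple layout, and the use of a plain per-day mean are illustrative assumptions, not the thesis's confirmed aggregation, and the sample headlines are made up.

```python
from collections import defaultdict

# Mapping taken from the slide: Positive (+1) | Neutral (0) | Negative (-1)
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}

def daily_sentiment(headlines):
    """Average the headline scores for each day.

    `headlines` is an iterable of (date_string, label) pairs; this layout
    is a hypothetical one chosen for the sketch.
    """
    by_day = defaultdict(list)
    for day, label in headlines:
        by_day[day].append(LABEL_SCORE[label.lower()])
    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}

# Made-up labels, for illustration only
sample = [
    ("2020-03-16", "negative"),
    ("2020-03-16", "negative"),
    ("2020-03-16", "neutral"),
    ("2020-03-17", "positive"),
]
scores = daily_sentiment(sample)
```

A plain mean treats every headline equally; weighting by model confidence would be another reasonable choice.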
Validation Results
  • 82,110 headlines validated
  • Mean confidence: 0.815 (early period 0.809, later period 0.817), stable across 16 years
05
TOPIC MODELING
What Is the News About? BERTopic
"Knowing a headline is 'negative' is not enough. Negative about the Fed vs. negative about jobs has very different market implications."
  • 1. Numerical fingerprint: represent each headline as a vector
  • 2. Cluster: group similar headlines together
  • 3. Label: name each cluster with distinctive keywords
7 Topics Discovered
  • 1. General market movement (85.6%)
  • 2. Fed policy
  • 3. Employment
  • 4. Trade war
  • 5. Credit ratings
  • 6–7. Small artifact topics (<1% combined)
Topic Quality (NPMI Score)
  • BERTopic: 0.344
  • LDA (classic): 0.039
37% of outlier headlines reassigned to nearest topic
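For readers unfamiliar with the NPMI score used to compare BERTopic and LDA, the sketch below shows one common way to compute it: the mean normalized pointwise mutual information over all pairs of a topic's top words, with probabilities estimated from document-level co-occurrence. The function name, the whitespace tokenization, the handling of the boundary cases, and the toy headlines are assumptions of this sketch; the thesis's exact coherence settings are not stated on the slide.

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs in `topic_words`.

    NPMI(a, b) = ln(p(a, b) / (p(a) * p(b))) / -ln(p(a, b)),
    with probabilities taken as document frequencies.
    """
    docs = [set(d.lower().split()) for d in documents]
    n = len(docs)

    def p(*words):
        # Fraction of documents containing every word in `words`
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for a, b in combinations(topic_words, 2):
        p_ab = p(a, b)
        if p_ab == 0:
            scores.append(-1.0)  # convention: words never co-occur
        elif p_ab == 1:
            scores.append(1.0)   # convention: avoids 0/0 when both are in every document
        else:
            pmi = math.log(p_ab / (p(a) * p(b)))
            scores.append(pmi / -math.log(p_ab))
    return sum(scores) / len(scores)

# Toy headlines, made up for illustration
docs = ["fed raises rates", "fed holds rates", "jobs report strong", "jobs growth slows"]
coherent = npmi_coherence(["fed", "rates"], docs)
incoherent = npmi_coherence(["fed", "jobs"], docs)
```

Scores range from −1 (words never appear together) to +1 (words always appear together), so a higher value indicates a more coherent topic.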
06
FEATURE SETS
Adding One Ingredient at a Time
  • SET A: Price Data Only (23 features). Moving averages, momentum, volatility; no text
  • SET B: + Basic Sentiment (30 features). Lagged daily sentiment, sentiment moving averages, sentiment volatility
  • SET C: + Topic-Structured Sentiment (45 features). Per-topic sentiment scores, topic probabilities, topic diversity
  • SET D: + Volatility-Weighted Topic Sentiment (59 features). Same as C but amplified on turbulent days, dampened on calm days
Ablation Design — This controlled, layer-by-layer approach isolates each ingredient's contribution to forecasting performance.
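The slide describes Set D only as Set C "amplified on turbulent days, dampened on calm days" and does not give the weighting formula. One plausible reading, offered purely as an illustrative assumption, scales each day's sentiment by that day's realized volatility relative to a trailing average. The function name, the `window` parameter, and the sample numbers below are hypothetical.

```python
def volatility_weighted(sentiment, realized_vol, window=2):
    """Scale sentiment by volatility relative to its trailing average.

    The baseline uses only earlier days, so no future information leaks
    into the weight. This weighting scheme is an assumption of the sketch.
    """
    weighted = []
    for t, (s, v) in enumerate(zip(sentiment, realized_vol)):
        past = realized_vol[max(0, t - window):t]
        baseline = sum(past) / len(past) if past else 0.0
        if baseline == 0.0:
            weighted.append(s)  # no usable history: leave unweighted
        else:
            weighted.append(s * v / baseline)
    return weighted

# Made-up series: constant sentiment, one turbulent day, one calm day
sentiment = [0.2, 0.2, 0.2, 0.2]
realized_vol = [1.0, 1.0, 2.0, 0.5]
weighted = volatility_weighted(sentiment, realized_vol)
```

With constant sentiment, the weighted value doubles on the day volatility doubles and shrinks on the calm day that follows.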
07
METHODOLOGY
Models and How We Test Them
Models Tested
  • Gradient-Boosted Trees (LightGBM, XGBoost, CatBoost): "Hundreds of small decision trees, each correcting the last"
  • Recurrent Neural Networks (LSTM, GRU): "Process data as ordered sequences with memory of previous days"
  • Baselines: always-predict-up, logistic regression, random forest
7-Check Evaluation Framework
1. Walk-forward cross-validation: train past → test future (5 folds)
2. Timing robustness: shift sentiment +1 day (leakage check)
3. Nested CV: prevent tuning from inflating results
4. Statistical tests: DeLong & Diebold-Mariano
5. Rolling 1-year AUC: track when model works vs. fails
6. Block bootstrap: confidence intervals preserving time structure
7. Temporal holdout: freeze after 2020, predict 2021–2023
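The block bootstrap in check 6 can be sketched as below. Resampling contiguous blocks instead of single days keeps short-range time dependence inside each block. This is a minimal moving-block version with a percentile interval; the function name, the block length of 5, the hit-rate statistic, and the made-up hit series are all assumptions of the sketch, and the thesis's actual block length and statistic (AUC) are not reproduced here.

```python
import random

def moving_block_bootstrap_ci(values, stat, block_len=5, n_boot=1000,
                              alpha=0.05, seed=0):
    """Percentile confidence interval from a moving-block bootstrap."""
    rng = random.Random(seed)
    n = len(values)
    n_blocks = -(-n // block_len)  # ceiling division
    stats = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            # Pick a random contiguous block of consecutive observations
            start = rng.randrange(n - block_len + 1)
            sample.extend(values[start:start + block_len])
        stats.append(stat(sample[:n]))  # trim to the original length
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Made-up series: 1 = correct direction call, 0 = wrong
hits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1] * 6
lo, hi = moving_block_bootstrap_ci(hits, lambda xs: sum(xs) / len(xs))
```

Passing the statistic as a function means the same resampler could be reused for AUC by resampling (label, score) pairs instead of hit flags.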
08
AGGREGATE RESULTS
On Average, Nothing Beats a Coin Flip
Model      Set A   Set B   Set C   Set D
LightGBM   0.483   0.494   0.479   0.483
CatBoost   0.489   0.493   0.480   0.494
XGBoost    0.484   0.489   0.489   0.491
0.50 = Random Guessing
All results hover near 0.50, random guessing territory
DeLong p > 0.35 for ALL pairs: no statistically significant differences
Bootstrap 95% CIs all straddle 0.50
But averages can hide the real story...
09
KEY FINDING
The Regime Analysis — Where Sentiment Actually Helps
Dataset   Overall AUC   Low VIX (<20)   Medium VIX (20–30)   High VIX (≥30)
Set A     0.485         0.489           0.507                0.430
Set B     0.510         0.509           0.539                0.407
Set C     0.513         0.487           0.554                0.517
Set D     0.512         0.477           0.561                0.568 ⭐
🏆 Set D scores highest during crises (0.568); Set C (0.517) is the only other feature set above 0.50
📉 Sets A & B collapse in high-VIX periods (0.430 / 0.407)
💡 Set D worst in calm markets (0.477) — sentiment adds noise when markets are quiet
Volatility-weighted sentiment helps in stress, hurts in calm
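The regime breakdown can be reproduced in outline by bucketing each test day by its VIX level (thresholds from the table: below 20, 20 to 30, 30 and above) and computing AUC separately per bucket. The sketch below uses a plain pairwise AUC so it needs no external libraries; the function names and the toy predictions are illustrative assumptions, not thesis data.

```python
def vix_regime(vix):
    """Bucket a VIX level using the thresholds from the regime table."""
    if vix < 20:
        return "low"
    if vix < 30:
        return "medium"
    return "high"

def auc(y_true, y_score):
    """Probability that a random up-day outscores a random down-day (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_regime(y_true, y_score, vix):
    """Compute AUC separately within each VIX regime."""
    groups = {}
    for y, s, v in zip(y_true, y_score, vix):
        labels, scores = groups.setdefault(vix_regime(v), ([], []))
        labels.append(y)
        scores.append(s)
    return {regime: auc(ys, ss) for regime, (ys, ss) in groups.items()}

# Toy predictions, made up: poor in calm markets, good in stressed ones
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.4, 0.6, 0.3, 0.7, 0.9, 0.2, 0.8, 0.1]
vix = [15, 15, 18, 12, 35, 40, 31, 33]
overall = auc(y_true, y_score)
per_regime = auc_by_regime(y_true, y_score, vix)
```

In this toy case the overall AUC of 0.75 hides a split between 0.0 in the low-VIX bucket and 1.0 in the high-VIX bucket, which is the kind of masking the regime analysis is designed to expose.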
10
CONVERGENT EVIDENCE
The Same Pattern Appears Everywhere
Consistent Findings
Rolling AUC: Set D outperforms Set A around 2011 debt crisis, 2015–16 volatility, 2020 COVID crash, 2022 bear market
Temporal holdout: Set D peaks in volatile 2022 (AUC 0.540), drops in calm 2021 (AUC 0.468)
Simple VIX trading rules all lose money (Sharpe −1.2 to −2.8)
The ML model captures more than just "volatility is high"
⚠ Honest Limitations of This Finding
High-VIX subsample is only 50 days (small sample)
Bootstrap 95% CI: [0.495, 0.714] — lower bound dips below 0.50
Permutation test p = 0.213 — not statistically significant
Exploratory finding — needs more crisis-period data to confirm
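A permutation test of the kind cited above asks how often shuffled labels produce an AUC at least as high as the observed one. The sketch below is a one-sided version with an add-one correction; the function names, the number of permutations, and the toy data are assumptions, and a plain shuffle treats days as exchangeable, which may differ from the thesis's exact procedure.

```python
import random

def auc(y_true, y_score):
    """Pairwise AUC: share of (up, down) pairs ranked correctly, ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(y_true, y_score, n_perm=2000, seed=0):
    """One-sided p-value: P(AUC under shuffled labels >= observed AUC)."""
    rng = random.Random(seed)
    observed = auc(y_true, y_score)
    labels = list(y_true)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any link between labels and scores
        if auc(labels, y_score) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy example, made up: scores separate the classes perfectly
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
p_value = permutation_p_value(y_true, y_score)
```

With only 50 high-VIX days, even a sizeable AUC can be matched by shuffled labels fairly often, which is consistent with the non-significant p-value reported on the slide.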
11
EXPLAINABILITY
Feature Importance via SHAP
Top Features (Set D)
1. 5-day price change
2. 10-day price change
3. RSI
4. Lagged realized volatility
5. Credit-rating sentiment (★ top sentiment feature)
6. SMA-50
7. MACD
8. Fed policy sentiment
9. Trade war sentiment
10. Employment sentiment
Legend: Technical (Price/Vol), Technical (Trend), Sentiment / Topic
  • Technical features dominate overall, but the top sentiment feature (#5, credit-rating sentiment) ranks consistently ahead of SMA-50 and MACD.
  • Topic-derived features hold 4 of the top 10 positions, highlighting the value of context-aware sentiment analysis.
  • During high-VIX periods, sentiment features gain substantial importance relative to standard technical indicators.
  • Feature rankings shift over time (Kendall's τ = 0.29), consistent with the observed regime-dependence.
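Kendall's τ measures how similarly two rankings order the same items: +1 means identical order, −1 means fully reversed, and values near 0 mean little agreement. The sketch below computes the tie-free version (tau-a) for two feature-importance rankings; the feature names and ranks are hypothetical and are not taken from the thesis's SHAP output.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items (no ties assumed)."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order the pair the same way
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical importance ranks in two time windows (1 = most important)
window_1 = {"ret_5d": 1, "rsi": 2, "sma_50": 3, "fed_sentiment": 4}
window_2 = {"ret_5d": 1, "rsi": 3, "sma_50": 2, "fed_sentiment": 4}
tau = kendall_tau(window_1, window_2)
```

One swapped pair out of six here gives τ of about 0.67, so the reported 0.29 corresponds to considerably more reshuffling between periods.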
12
CONTRIBUTIONS & TAKEAWAYS
What This Thesis Contributes
Controlled Ablation Design
Isolates what each layer of sentiment structure adds, with no confounded comparisons
7-Check Evaluation Framework
Guards against shortcuts that weaken many financial ML studies: walk-forward, bootstrap, holdout, statistical tests
Key Insight
A single accuracy score can hide where a model works. A model useless on average may be most valuable during crises.
Regime-aware evaluation (testing separately across market conditions) should be STANDARD PRACTICE in financial ML research.
Where to Go Next
Adaptive models that automatically adjust feature weights based on current market conditions
Test on other assets and longer crisis periods to validate regime-dependence findings
Retrain topic model per evaluation fold to eliminate global-fitting information leakage
APPENDIX A
13
METHODOLOGY DETAIL
Walk-Forward Cross-Validation Explained
[Figure: walk-forward timeline with year ticks from 2008 to 2023; Folds 1–5 each show a training period followed by a test period]
KEY PRINCIPLE: The model ALWAYS trains on past data and tests on future data — no look-ahead bias. Training window expands with each fold.
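The expanding-window principle can be sketched as an index generator. This version splits the series into six equal chunks so that each of the five folds trains on everything before its test chunk; the equal-chunk sizing and the function name are assumptions of this sketch, since the slide does not give the exact fold boundaries. scikit-learn's `TimeSeriesSplit` provides a comparable expanding-window splitter.

```python
def walk_forward_splits(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) with an expanding training window."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover observations
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)

# 3,445 trading days, as in the data overview
splits = list(walk_forward_splits(3445, n_folds=5))
for train, test in splits:
    print(f"train [0, {train.stop}) -> test [{test.start}, {test.stop})")
```

Every test range starts exactly where its training range ends, so no fold ever sees observations from its own future.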
14
STUDY LIMITATIONS
APPENDIX B
Limitations
1. Multiple Comparisons: many comparisons tested without formal multiple-testing correction
2. Single Asset: S&P 500 at daily frequency only; generalizability unknown
3. BERTopic Leakage: topic model trained on full timeline; minor information leakage
4. Artifact Topics: 2 of 7 topics are narrow artifacts with limited interpretability
5. No Transformer Baselines: no Transformer-based forecasting models tested as baselines
6. Small Crisis Sample: high-VIX subsample is only 50 days; underpowered
7. No Ground Truth: no human-annotated sentiment ground truth for validation
8. Concept Drift: performance drifts over time; deployed model needs periodic retraining
15
APPENDIX C
TECHNICAL DETAIL
End-to-End Data Pipeline
  • Text Processing: Headlines → FinBERT Scoring → BERTopic Clustering → Daily Aggregation
  • Quantitative Features: OHLCV + VIX Data → Technical Indicators (SMA, RSI, MACD, Vol)
  • Dataset Construction: Assemble Feature Sets A, B, C, D
  • Model Evaluation: Walk-Forward 5-Fold CV → Per-Fold Train/Predict → Regime Analysis + SHAP
APPENDIX D
16
BIBLIOGRAPHY
Key References
1. Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.
2. Grootendorst, M. (2022). BERTopic: Neural Topic Modeling with Class-based TF-IDF.
3. Lundberg, S. & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP).
4. Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System.
5. Ke, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree.
6. Fischer, T. & Krauss, C. (2018). Deep Learning with LSTM Networks for Financial Market Predictions.
7. Lopez de Prado, M. (2018). Advances in Financial Machine Learning (evaluation standards).
8. Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory (LSTM).
Presented at the University of Southern Mississippi Honors College Thesis Symposium | May 2026
Advisor: Dr. Zhaoxian Zhou | School of Computing Sciences and Computer Engineering
Prabin Bajgai
University of Southern Mississippi
