Hybrid Models for S&P 500 Forecasting & News Sentiment

Name: Hybrid Models for S&P 500 Forecasting & News Sentiment
Uploaded: 2026-04-16T05:50:31.377Z
Description: Explore how integrating FinBERT sentiment and BERTopic modeling improves S&P 500 forecasting during market crises in this financial machine learning study.

#financial-forecasting #machine-learning #sentiment-analysis #topic-modeling #nlp #sp500 #ai

Honors Thesis Symposium | May 2026

THESIS PRESENTATION

Advancing Financial Time Series Forecasting: A Comparative Analysis of Hybrid Models Integrating Topic-Augmented Sentiment and Explainable AI

Prabin Bajgai

Honors College | The University of Southern Mississippi

Advisor: Dr. Zhaoxian Zhou

School of Computing Sciences and Computer Engineering

May 2026

Honors College Thesis Symposium

THE QUESTION

Can News Headlines Predict the Market?

Can structured news sentiment improve next-day predictions of S&P 500 direction (up or down)?

Headlines move markets in minutes — but existing evidence is mixed

Most studies report one overall accuracy number that hides where a model actually works

Problem: studies change the sentiment method AND the model simultaneously — impossible to isolate what helped

This thesis: hold the model constant, vary inputs one layer at a time, test across different market conditions

DATA OVERVIEW

16 Years of Markets and Headlines

Market Data

📈 S&P 500 daily prices (2008–2023) → 3,445 trading days

📊 1-minute intraday bars for measuring market turbulence

😨 VIX (the "fear gauge") for classifying calm vs. stressed markets

News Data

📰 82,110 financial headlines from 3 sources (after cleaning)

🔍 Filtered to broad-market news only: Fed, inflation, GDP, jobs, index-level events

❌ Individual stock headlines excluded

📅 Coverage: 1–2 headlines/day early on, rising to 19/day by 2023

Timeline: 2008 Financial Crisis → 2012 European Debt → 2016 Brexit / Elections → 2020 Pandemic Crash → 2023 Inflation / Rates

SENTIMENT SCORING

Reading the News with FinBERT

❯ FinBERT: a language model trained specifically on financial text

❯ Reads full sentences in context, not just individual words

❯ Labels each headline: Positive (+1) | Neutral (0) | Negative (−1)

❯ Why not simpler methods? Word-counting misses context: 'not good' scores as positive. General models trained on movie reviews misread financial language.

❯ Limitation: no human-annotated ground truth for these specific headlines
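The labeling scheme above can be sketched as a mapping from a classifier label to a signed score, followed by a per-day average. This is a minimal sketch, not the thesis code: the score table follows the slide, while `daily_sentiment` and the example tuples are hypothetical, and the labels are assumed to have been produced already by a FinBERT-style classifier.

```python
from collections import defaultdict
from statistics import mean

# Signed scores as stated on the slide.
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}

def daily_sentiment(labeled_headlines):
    """Average the signed scores of all headlines published on each date.

    labeled_headlines: iterable of (date_string, label) pairs; the labels
    are assumed to come from a FinBERT-style classifier (hypothetical input).
    """
    by_day = defaultdict(list)
    for date, label in labeled_headlines:
        by_day[date].append(LABEL_SCORE[label.lower()])
    return {date: mean(scores) for date, scores in sorted(by_day.items())}

# Illustrative data only, not taken from the thesis.
example = [
    ("2020-03-16", "negative"),
    ("2020-03-16", "negative"),
    ("2020-03-16", "neutral"),
    ("2020-03-17", "positive"),
]
scores = daily_sentiment(example)
```

A plain mean is only one possible daily aggregate; weighting by classifier confidence would be an equally plausible choice.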

Validation Results

82,110 headlines validated

Mean confidence: 0.815 (early period 0.809, later period 0.817), stable across 16 years

Sentiment classes: [POSITIVE] [NEUTRAL] [NEGATIVE]

TOPIC MODELING

What Is the News About? BERTopic

"Knowing a headline is 'negative' is not enough. Negative about the Fed vs. negative about jobs has very different market implications."

Numerical Fingerprint: represent each headline as a vector
→ Cluster: group similar headlines together
→ Label: name each cluster with distinctive keywords

✓ 7 Topics Discovered

1. General market movement (85.6%)
2. Fed policy
3. Employment
4. Trade war
5. Credit ratings
6–7. Small artifact topics (<1% combined)

★ Topic Quality (NPMI Score)

BERTopic: 0.344

LDA (classic): 0.039

37% of outlier headlines reassigned to nearest topic
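For readers unfamiliar with the NPMI score reported above, the sketch below computes it for one topic's top words from document-level co-occurrence. It assumes whitespace tokenization and a tiny illustrative corpus; the function name and data are hypothetical, and the thesis may have used a different reference corpus or windowing.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words.

    P(w) is estimated as the share of documents containing w, and
    NPMI(a, b) = log(P(a,b) / (P(a) P(b))) / -log P(a,b), ranging from -1 to 1.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n

    values = []
    for a, b in combinations(top_words, 2):
        p_a, p_b, p_ab = prob(a), prob(b), prob(a, b)
        if p_ab == 0:
            values.append(-1.0)  # pair never co-occurs: NPMI's lower limit
        elif p_ab == 1:
            values.append(1.0)   # degenerate case, treated as 1.0 in this sketch
        else:
            pmi = math.log(p_ab / (p_a * p_b))
            values.append(pmi / -math.log(p_ab))
    return sum(values) / len(values)

# Illustrative headlines only.
docs = ["fed raises rates", "fed holds rates", "jobs report beats", "jobs growth slows"]
coherent = npmi_coherence(["fed", "rates"], docs)
incoherent = npmi_coherence(["fed", "jobs"], docs)
```

Higher values mean a topic's keywords tend to appear together, which is the sense in which 0.344 indicates more coherent topics than 0.039.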

FEATURE SETS

Adding One Ingredient at a Time

SET A: Price Data Only (23 features)
Moving averages, momentum, volatility; no text

SET B: + Basic Sentiment (30 features)
Lagged daily sentiment, sentiment moving averages, sentiment volatility

SET C: + Topic-Structured Sentiment (45 features)
Per-topic sentiment scores, topic probabilities, topic diversity

SET D: + Volatility-Weighted Topic Sentiment (59 features)
Same as C but amplified on turbulent days, dampened on calm days

Ablation Design — This controlled, layer-by-layer approach isolates each ingredient's contribution to forecasting performance.
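The slides do not give the weighting formula behind Set D, so the following is only one plausible sketch of "amplified on turbulent days, dampened on calm days". It assumes the weight is the day's realized volatility divided by its trailing median; the function names, the window length, and the `fed_policy` key are all hypothetical.

```python
from statistics import median

def volatility_weights(realized_vol, window=5):
    """Ratio of each day's realized volatility to its trailing median.

    Values above 1 mark turbulent days and values below 1 mark calm days.
    Only the current and earlier days are used, so there is no look-ahead.
    Assumes realized volatility is strictly positive.
    """
    weights = []
    for t, vol in enumerate(realized_vol):
        history = realized_vol[max(0, t - window + 1): t + 1]
        weights.append(vol / median(history))
    return weights

def weight_sentiment(topic_sentiment, weights):
    """Scale each day's per-topic sentiment by that day's weight."""
    return [
        {topic: score * w for topic, score in day.items()}
        for day, w in zip(topic_sentiment, weights)
    ]

# Illustrative values only: a calm stretch, one turbulent day, one quiet day.
realized_vol = [0.010, 0.010, 0.010, 0.030, 0.005]
topic_sentiment = [{"fed_policy": -0.5}] * 5
weights = volatility_weights(realized_vol, window=3)
weighted = weight_sentiment(topic_sentiment, weights)
```

In this toy series the same negative Fed headline counts three times as much on the turbulent day and half as much on the quiet one.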

METHODOLOGY

Models and How We Test Them

Models Tested

Gradient-Boosted Trees

LightGBM, XGBoost, CatBoost

"Hundreds of small decision trees, each correcting the last"

Recurrent Neural Networks

LSTM, GRU

"Process data as ordered sequences with memory of previous days"

Baselines

Always-predict-up

Logistic regression

Random forest

7-Check Evaluation Framework

Walk-forward cross-validation: train past → test future (5 folds)

Timing robustness: shift sentiment +1 day (leakage check)

Nested CV: prevent tuning from inflating results

Statistical tests: DeLong & Diebold-Mariano

Rolling 1-year AUC: track when model works vs. fails

Block bootstrap: confidence intervals preserving time structure

Temporal holdout: freeze after 2020, predict 2021–2023
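One reading of the timing-robustness check listed above is sketched here: pair day-t sentiment with the direction of day t+1, then repeat with the sentiment delayed by one extra day to see whether results depend on suspiciously tight timing. The function names and numbers are hypothetical, and the thesis may define the shift differently.

```python
def next_day_labels(closes):
    """1 if the next close is higher than the current close, else 0."""
    return [int(b > a) for a, b in zip(closes, closes[1:])]

def align(sentiment, closes, extra_lag=0):
    """Pair sentiment from day t - extra_lag with the direction of day t + 1.

    extra_lag=0 is the baseline alignment; extra_lag=1 delays the sentiment
    by one more day, which is the robustness variant assumed in this sketch.
    """
    labels = next_day_labels(closes)
    return [
        (sentiment[t - extra_lag], labels[t])
        for t in range(extra_lag, len(labels))
    ]

# Illustrative values only.
closes = [100, 101, 99, 102, 103]
sentiment = [0.2, -0.4, 0.1, 0.3, 0.0]
baseline = align(sentiment, closes)
delayed = align(sentiment, closes, extra_lag=1)
```

If performance survives the extra delay, the signal is less likely to come from headlines that were timestamped after the move they appear to predict.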

AGGREGATE RESULTS

On Average, Nothing Beats a Coin Flip

Model	Set A	Set B	Set C	Set D
LightGBM	0.483	0.494	0.479	0.483
CatBoost	0.489	0.493	0.480	0.494
XGBoost	0.484	0.489	0.489	0.491

Reference line: 0.50 = random guessing

All results hover near 0.50: random guessing territory

DeLong p > 0.35 for ALL pairs: no statistically significant differences

Bootstrap 95% CIs all straddle 0.50

But averages can hide the real story...
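The bootstrap confidence intervals mentioned above can be illustrated with a moving-block bootstrap, which resamples contiguous runs of days so that short-range time dependence is preserved. This is a sketch under assumptions: the block length, resample count, and synthetic data are hypothetical, and the thesis may use a different block scheme.

```python
import random

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def block_bootstrap_auc_ci(labels, scores, block=10, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for AUC from a moving-block bootstrap."""
    rng = random.Random(seed)
    n = len(labels)
    starts = range(n - block + 1)
    stats = []
    while len(stats) < n_boot:
        idx = []
        while len(idx) < n:
            start = rng.choice(starts)
            idx.extend(range(start, start + block))
        idx = idx[:n]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # AUC needs both classes in the resample
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic, uninformative predictions for illustration only.
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(200)]
scores = [rng.random() for _ in range(200)]
lower, upper = block_bootstrap_auc_ci(labels, scores, n_boot=200)
```

An interval that contains 0.50 means the data cannot rule out chance-level performance.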

KEY FINDING

The Regime Analysis — Where Sentiment Actually Helps

Dataset	Overall AUC	Low VIX (<20)	Medium VIX (20–30)	High VIX (≥30)
Set A	0.485	0.489	0.507	0.430
Set B	0.510	0.509	0.539	0.407
Set C	0.513	0.487	0.554	0.517
Set D	0.512	0.477	0.561	⭐ 0.568

🏆 Set D has the highest AUC during crises (0.568); Set C (0.517) is the only other feature set above 0.50

📉 Sets A & B collapse in high-VIX periods (0.430 / 0.407)

💡 Set D worst in calm markets (0.477) — sentiment adds noise when markets are quiet

Volatility-weighted sentiment helps in stress, hurts in calm
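The regime breakdown in the table above amounts to computing AUC separately within each VIX bucket. A minimal sketch follows, using the thresholds from the table header (<20, 20–30, ≥30); the function names and toy data are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def vix_regime(vix):
    """Bucket a VIX level using the thresholds shown in the table."""
    if vix < 20:
        return "low"
    if vix < 30:
        return "medium"
    return "high"

def auc_by_regime(labels, scores, vix_levels):
    """AUC computed separately for the days falling in each VIX regime."""
    groups = {}
    for y, s, v in zip(labels, scores, vix_levels):
        ys, ss = groups.setdefault(vix_regime(v), ([], []))
        ys.append(y)
        ss.append(s)
    # Skip regimes that lack one of the two classes, where AUC is undefined.
    return {r: auc(ys, ss) for r, (ys, ss) in groups.items() if 0 < sum(ys) < len(ys)}

# Toy data for illustration only.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.4, 0.6, 0.7, 0.3, 0.9, 0.2]
vix = [15, 18, 25, 22, 35, 40]
result = auc_by_regime(labels, scores, vix)
```

With few days in a bucket, as in the high-VIX case, each per-regime AUC rests on very few positive and negative pairs and should be read with wide error bars.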

CONVERGENT EVIDENCE

The Same Pattern Appears Everywhere

Consistent Findings

Rolling AUC: Set D outperforms Set A around 2011 debt crisis, 2015–16 volatility, 2020 COVID crash, 2022 bear market

Temporal holdout: Set D peaks in volatile 2022 (AUC 0.540), drops in calm 2021 (AUC 0.468)

Simple VIX trading rules all lose money (Sharpe −1.2 to −2.8)

The ML model captures more than just "volatility is high"
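The rolling AUC comparison listed above can be sketched as AUC recomputed over a sliding window of trading days. The default window of 252 days is an assumption standing in for "one year", and the function names and toy data are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rolling_auc(labels, scores, window=252):
    """AUC over each trailing window; None where a window holds only one class."""
    out = []
    for end in range(window, len(labels) + 1):
        y = labels[end - window:end]
        s = scores[end - window:end]
        out.append(auc(y, s) if 0 < sum(y) < window else None)
    return out

# Toy series with a short window so the result is easy to check by hand.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.7]
series = rolling_auc(labels, scores, window=4)
```

Plotting such a series for two feature sets side by side is one way to see when one pulls ahead of the other.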

⚠ Honest Limitations of This Finding

High-VIX subsample is only 50 days (small sample)

Bootstrap 95% CI: [0.495, 0.714] — lower bound dips below 0.50

Permutation test p = 0.213 — not statistically significant

Exploratory finding — needs more crisis-period data to confirm
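A permutation test of the kind cited above asks how often shuffled labels produce an AUC at least as high as the observed one. The sketch below is a simple one-sided version with hypothetical names and toy data; it shuffles labels freely and so ignores any time dependence, which the thesis may handle differently.

```python
import random

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_perm=1000, seed=0):
    """One-sided p-value: share of label shuffles with AUC >= the observed AUC."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)

# Toy example: perfectly separated scores, for illustration only.
labels = [0] * 5 + [1] * 5
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]
p_value = permutation_p_value(labels, scores, n_perm=200)
```

With only 50 high-VIX days, even a sizeable AUC can be matched by many random shuffles, which is consistent with the non-significant p-value reported above.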

EXPLAINABILITY

Feature Importance via SHAP

Top Features (Set D)

1. 5-day price change
2. 10-day price change
3. RSI
4. Lagged realized volatility
5. Credit-rating sentiment ★ Top Sentiment Feature
6. SMA-50
7. MACD
8. Fed policy sentiment
9. Trade war sentiment
10. Employment sentiment

Legend: Technical (Price/Vol) | Technical (Trend) | Sentiment / Topic

📊 Technical features dominate overall, but the top sentiment feature (#5) ranks consistently ahead of SMA-50 and MACD.

💡 Topic-derived features hold 4 of the top 10 positions, highlighting the value of context-aware sentiment analysis.

⚠️ During high-VIX periods, sentiment features gain substantial importance relative to standard technical indicators.

⏱️ Feature rankings shift over time (Kendall's τ = 0.29), consistent with the observed regime-dependence.
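Kendall's τ, quoted above for ranking stability, compares how many feature pairs keep their relative order between two rankings. A minimal tie-free sketch follows; the feature names and ranks are hypothetical and do not reproduce the thesis value of 0.29.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (assumes no ties)."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1   # pair ordered the same way in both rankings
        elif sign < 0:
            discordant += 1   # pair ordered in opposite ways
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical importance ranks (1 = most important) in two periods.
early = {"ret_5d": 1, "ret_10d": 2, "rsi": 3, "credit_sent": 4}
late = {"ret_5d": 2, "ret_10d": 1, "rsi": 4, "credit_sent": 3}
tau = kendall_tau(early, late)
```

A value of 1 means identical ordering and 0 means no association, so 0.29 indicates rankings that agree only weakly from one period to the next.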

CONTRIBUTIONS & TAKEAWAYS

What This Thesis Contributes

Controlled Ablation Design

Isolates what each layer of sentiment structure adds — no confounded comparisons

7-Check Evaluation Framework

Guards against the shortcuts that weaken many financial ML studies: walk-forward validation, bootstrap, holdout, statistical tests

Key Insight

A single accuracy score can hide where a model works. A model useless on average may be most valuable during crises.

Regime-aware evaluation — testing separately across market conditions — should be STANDARD PRACTICE in financial ML research.

Where to Go Next

→ Adaptive models that automatically adjust feature weights based on current market conditions

→ Test on other assets and longer crisis periods to validate regime-dependence findings

→ Retrain topic model per evaluation fold to eliminate global-fitting information leakage

APPENDIX A

METHODOLOGY DETAIL

Walk-Forward Cross-Validation Explained

[Figure: timeline from 2008 to 2023 showing Folds 1–5; each fold has a training period followed by a test period]

KEY PRINCIPLE: The model ALWAYS trains on past data and tests on future data — no look-ahead bias. Training window expands with each fold.
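The expanding-window principle stated above can be sketched as an index generator. The equal-sized fold boundaries here are an assumption, since the slides do not give the exact dates of each fold; scikit-learn's `TimeSeriesSplit` offers a comparable expanding-window split.

```python
def walk_forward_splits(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) with an expanding training window.

    The data is cut into n_folds + 1 chronological chunks; fold k trains on
    the first k chunks and tests on the next one. The last test chunk absorbs
    any remainder.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)

# 3,445 trading days, as reported in the data overview.
splits = list(walk_forward_splits(3445, n_folds=5))
train_sizes = [len(train) for train, _ in splits]
```

Every test index is later than every training index in its fold, which is what rules out look-ahead bias.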

APPENDIX B

STUDY LIMITATIONS

Limitations

Multiple Comparisons

Many comparisons tested without formal multiple-testing correction

Single Asset

S&P 500 at daily frequency only — generalizability unknown

BERTopic Leakage

Topic model trained on full timeline — minor information leakage

Artifact Topics

2 of 7 topics are narrow artifacts with limited interpretability

No Transformer Baselines

No Transformer-based forecasting models tested as baselines

Small Crisis Sample

High-VIX subsample is only 50 days — underpowered

No Ground Truth

No human-annotated sentiment ground truth for validation

Concept Drift

Performance drifts over time — deployed model needs periodic retraining

APPENDIX C

TECHNICAL DETAIL

End-to-End Data Pipeline

Text Processing: Headlines → FinBERT Scoring → BERTopic Clustering → Daily Aggregation

Quantitative Features: OHLCV + VIX Data → Technical Indicators (SMA, RSI, MACD, Vol)

Dataset Construction: Assemble Feature Sets A, B, C, D

Model Evaluation: Walk-Forward 5-Fold CV → Per-Fold Train/Predict → Regime Analysis + SHAP

APPENDIX D

BIBLIOGRAPHY

Key References

Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.

Grootendorst, M. (2022). BERTopic: Neural Topic Modeling with Class-based TF-IDF.

Lundberg, S. & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP).

Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System.

Ke, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree.

Fischer, T. & Krauss, C. (2018). Deep Learning with LSTM Networks for Financial Market Predictions.

Lopez de Prado, M. (2018). Advances in Financial Machine Learning (evaluation standards).

Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory (LSTM).

Presented at the University of Southern Mississippi Honors College Thesis Symposium | May 2026

Advisor: Dr. Zhaoxian Zhou | School of Computing Sciences and Computer Engineering

Prabin Bajgai

University of Southern Mississippi
