# Amazon Review Rating Prediction with NLP and Machine Learning
> Discover how to predict Amazon star ratings using NLP models like XGBoost, Random Forest, and TF-IDF in this comprehensive machine learning case study.

Tags: nlp, machine-learning, xgboost, sentiment-analysis, python, streamlit, data-science
## Amazon Review Rating Prediction using NLP
*   **Goal:** Predict star ratings (1–5) from unstructured customer review text.
*   **Tech Stack:** Streamlit, Pandas, Scikit-Learn (TF-IDF), XGBoost, Random Forest, and Logistic Regression.

## Project Architecture & Pipeline
*   **Preprocessing:** Regex-based text cleaning, followed by TF-IDF vectorization capped at 800 features.
*   **Workflow:** Data loading (JSON Lines) -> Feature Extraction -> 80/20 Train-Test Split -> Multi-Model Training -> Evaluation.
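
The preprocessing and split steps above could look roughly like the sketch below. It is illustrative only: the cleaning rules, the toy reviews, and the `clean_text` helper are assumptions, not the project's actual code. It also splits the raw text before fitting TF-IDF so the vocabulary is learned from training data only, which is a design choice of this sketch rather than something the slides state.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split


def clean_text(text: str) -> str:
    """Lowercase, drop HTML tags and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # strip HTML remnants
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()


# Toy stand-in for the review data (hypothetical examples, 5 per star rating).
# Real data could be loaded with pandas.read_json(path, lines=True) for JSON Lines.
phrases = {
    1: "terrible phone broke after one day",
    2: "poor battery and weak camera",
    3: "average phone nothing special",
    4: "good screen and solid battery",
    5: "excellent phone love the camera",
}
reviews = [f"{text} #{i}!" for text in phrases.values() for i in range(5)]
ratings = [star for star in phrases for _ in range(5)]

cleaned = [clean_text(r) for r in reviews]

# 80/20 split, stratified so every star rating appears in both parts.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    cleaned, ratings, test_size=0.2, stratify=ratings, random_state=42
)

# Cap the vocabulary at 800 terms, as in the pipeline description.
vectorizer = TfidfVectorizer(max_features=800)
X_train = vectorizer.fit_transform(X_train_txt)  # fit on training text only
X_test = vectorizer.transform(X_test_txt)        # reuse the fitted vocabulary

print(X_train.shape, X_test.shape)
```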

## Model Implementation & Performance
*   **Logistic Regression:** Baseline model, fast and interpretable.
*   **Random Forest:** Ensemble method capturing non-linear patterns.
*   **XGBoost:** Top performer with a Macro ROC-AUC of approximately 0.82.
*   **Metrics:** Accuracy and macro-averaged ROC-AUC. The macro average weights each star class equally, which matters because review ratings are imbalanced.
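
To illustrate why macro ROC-AUC is reported alongside accuracy, here is a minimal sketch on synthetic, imbalanced five-class data. The data, class weights, and hyperparameters are assumptions chosen for the demonstration, and XGBoost is left out to keep dependencies small. An `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way, though recent versions expect labels starting at 0, so the 1–5 ratings would likely need shifting.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for TF-IDF features: 5 classes, heavily skewed to "5 stars".
X, y = make_classification(
    n_samples=2000,
    n_features=40,
    n_informative=15,
    n_classes=5,
    weights=[0.05, 0.05, 0.10, 0.20, 0.60],
    random_state=0,
)
y = y + 1  # relabel 0..4 as star ratings 1..5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)  # columns follow model.classes_ (sorted)
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        # One-vs-rest AUC per class, then an unweighted mean over classes.
        "roc_auc": roc_auc_score(y_test, proba, multi_class="ovr", average="macro"),
    }

for name, scores in results.items():
    print(f"{name:20s} accuracy={scores['accuracy']:.3f} macro_auc={scores['roc_auc']:.3f}")
```

The majority baseline reaches a high accuracy simply by always predicting the most common rating, yet its macro ROC-AUC stays at chance level, which is the gap the second metric is meant to expose.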

## Live Prediction & Deployment
*   **Streamlit App:** Features real-time inference, data visualization, and dynamic phone model selection based on review content.
*   **Key Finding:** TF-IDF effectively captures sentiment in short reviews, with XGBoost providing the best predictive reliability.
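
A real-time inference page of this kind might be wired up as in the sketch below. Everything here is hypothetical: the toy training data, the `train_demo_pipeline`, `predict_rating`, and `render_app` names, and the use of Logistic Regression in place of the project's trained model. Streamlit is imported inside `render_app` so the prediction helper can be used and tested without it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_demo_pipeline():
    """Fit a tiny TF-IDF + Logistic Regression pipeline on toy data (hypothetical)."""
    texts = [
        "terrible phone broke after one day", "awful screen total waste",
        "poor battery and weak camera", "disappointing sound slow charging",
        "average phone nothing special", "okay for the price",
        "good screen and solid battery", "nice camera works well",
        "excellent phone love the camera", "amazing battery perfect screen",
    ]
    stars = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
    pipeline = make_pipeline(
        TfidfVectorizer(max_features=800),
        LogisticRegression(max_iter=1000),
    )
    return pipeline.fit(texts, stars)


def predict_rating(pipeline, review: str):
    """Return the most likely star rating and the probability for each rating."""
    proba = pipeline.predict_proba([review])[0]
    by_star = {int(s): float(p) for s, p in zip(pipeline.classes_, proba)}
    return max(by_star, key=by_star.get), by_star


def render_app():
    """Minimal Streamlit page: text box in, predicted rating out."""
    import streamlit as st  # lazy import: only needed when serving the page

    st.title("Amazon Review Rating Prediction")
    pipeline = train_demo_pipeline()
    review = st.text_area("Review text")
    if st.button("Predict") and review.strip():
        star, by_star = predict_rating(pipeline, review)
        st.write(f"Predicted rating: {star} stars")
        st.write(by_star)  # per-rating probabilities
```

To try it, one would save the block as a script (for example `app.py`), call `render_app()` at the bottom, and start it with `streamlit run app.py`.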

## Conclusion & Future Roadmap
*   **Results:** The project moved from a notebook prototype to a deployed Streamlit application.
*   **Future Scope:** Implementation of Transformer models (BERT/RoBERTa) and aspect-based sentiment analysis for deeper insights.