IPL Match Winner Prediction using Orange Data Mining Tool
Learn how to predict IPL match winners using machine learning, Orange Data Mining, and Logistic Regression with historical data from seasons 2020-2025.
Data Mining Project
IPL Match Winner Prediction
Using Machine Learning & Orange Data Mining Tool
IPL Seasons 2020–2025
B.Sc. / BCA College Project | May 2026
OBJECTIVE
Objective of the Project
To predict the probable winner of IPL matches using machine learning
To analyze match data from IPL seasons 2020–2025
To compare three machine learning algorithms (Decision Tree, Random Forest, Logistic Regression)
To understand data preprocessing and prediction using the Orange Data Mining Tool
Why This Project
Why This Project Was Chosen
IPL is one of the most popular cricket leagues in the world
Match outcomes depend on multiple factors: Teams, Toss, Venue, City
Machine learning can identify patterns from historical IPL data
Helps understand practical applications of Data Mining
Real-world sports analytics use case for academic learning
Software & Tools Used
Orange Data Mining Tool
Main tool used for building the ML workflow
Microsoft Excel
Used for editing and cleaning the CSV dataset
Kaggle
Source for downloading IPL historical datasets
FINAL DATASET
Final Dataset – Columns Used
Team1
Team2
Toss_Winner
Toss_Decision
Stadium
City
Season
Match_Winner
One row = one match (clean structure)
Removed ball-by-ball complexity
Easier for ML models to process
Improved prediction stability
Years covered: IPL 2020 to 2025
Source: Kaggle IPL Datasets & IPL Historical Records
CHALLENGES
Problems Faced & Solutions
Problem: Dataset had 65+ complex columns (overs, balls, batter, bowler)
Solution: Reduced to match-level columns only
Problem: Error cross (X) on the model widget because no target column was selected
Solution: Moved Match_Winner to Target in the Select Columns widget
Problem: Wrong predictions / overfitting caused by post-match columns, which leak the result into the features
Solution: Removed player_of_match and other post-match data
Problem: ROC Analysis error: "Cannot get model output"
Solution: Connected Test & Score → ROC Analysis correctly, and used Confusion Matrix for evaluation instead
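The column-reduction and post-match-removal fixes above can be sketched outside Orange as a small script. This is only an illustration, assuming the raw CSV already uses the final column names; the `win_by_runs` column and the sample row are made up for the example.

```python
import csv
import io

# Pre-match features and target, as listed on the Final Dataset slide.
PRE_MATCH_COLUMNS = ["Team1", "Team2", "Toss_Winner", "Toss_Decision",
                     "Stadium", "City", "Season"]
TARGET = "Match_Winner"

def keep_pre_match_columns(raw_csv_text):
    """Return rows that contain only pre-match features plus the target."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    wanted = PRE_MATCH_COLUMNS + [TARGET]
    return [{name: row[name] for name in wanted} for row in reader]

# Made-up sample: player_of_match and win_by_runs (hypothetical name)
# are only known after the match, so they must not reach the model.
sample = (
    "Team1,Team2,Toss_Winner,Toss_Decision,Stadium,City,Season,"
    "Match_Winner,player_of_match,win_by_runs\n"
    "CSK,MI,CSK,Bat,Wankhede,Mumbai,2024,MI,Player A,20\n"
)
rows = keep_pre_match_columns(sample)
print(list(rows[0].keys()))
```

Dropping these columns matters because a value such as player_of_match is decided by the result itself, so a model trained on it looks accurate on old matches but cannot be used before a match is played.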
Data Preprocessing Steps
ML MODELS
Machine Learning Models Used
Decision Tree
Splits data based on features
Simple, visual, interpretable
Accuracy: 41.9%
Random Forest
Ensemble of multiple decision trees
Reduces overfitting
Accuracy: 47.5%
Logistic Regression
Works well with one-hot encoded categorical data
Strong generalization
Accuracy: 61.4% ← BEST
All three models were evaluated with the Test & Score widget in the Orange Data Mining Tool
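To make the Logistic Regression entry more concrete, here is a minimal sketch of how a multiclass logistic model turns one-hot features into win probabilities. It assumes a softmax (multinomial) formulation; the weights are invented for illustration and are not taken from the trained Orange model.

```python
import math

def softmax(scores):
    """Convert raw per-team scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {team: math.exp(s - highest) for team, s in scores.items()}
    total = sum(exps.values())
    return {team: e / total for team, e in exps.items()}

def predict_winner(active_features, weights, bias):
    """Score each team as bias + sum of weights of the active one-hot features."""
    scores = {
        team: bias[team] + sum(w.get(f, 0.0) for f in active_features)
        for team, w in weights.items()
    }
    probs = softmax(scores)
    return max(probs, key=probs.get), probs

# Hypothetical weights, chosen by hand for the example only.
weights = {
    "CSK": {"Toss_Winner=CSK": 0.4, "City=Mumbai": -0.3},
    "MI": {"Toss_Winner=CSK": -0.1, "City=Mumbai": 0.6},
}
bias = {"CSK": 0.0, "MI": 0.0}

winner, probs = predict_winner(["Toss_Winner=CSK", "City=Mumbai"], weights, bias)
print(winner, probs)
```

Training chooses the weights so that the predicted probabilities match the historical winners as closely as possible; the sketch covers only the prediction step.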
RESULTS
Results & Model Comparison
Decision Tree
41.9%
Random Forest
47.5%
Logistic Regression
61.4%
Logistic Regression selected as the final model
WORKFLOW
Final Prediction Workflow in Orange
File
Load CSV dataset
Select Columns
Features + Target
Impute
Handle Missing Values
Continuize
One-Hot Encoding
Logistic Regression
Model Training
Test & Score
Evaluate Performance
Predictions
Predict Winner
Confusion Matrix
Visualize Results
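The Impute and Continuize steps in the workflow above can be illustrated in plain Python. This is a sketch under assumptions: it fills missing values with the most frequent value (the slides do not say which Impute method was chosen), and the three sample rows are made up.

```python
from collections import Counter

def impute_most_frequent(rows, column):
    """Replace empty values in `column` with the most common non-empty value."""
    counts = Counter(r[column] for r in rows if r[column])
    fill = counts.most_common(1)[0][0]
    return [{**r, column: r[column] or fill} for r in rows]

def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

# Made-up rows; the second one has a missing Toss_Decision.
matches = [
    {"Toss_Winner": "CSK", "Toss_Decision": "Bat"},
    {"Toss_Winner": "MI", "Toss_Decision": ""},
    {"Toss_Winner": "CSK", "Toss_Decision": "Bat"},
]
matches = impute_most_frequent(matches, "Toss_Decision")
encoded = one_hot(matches, "Toss_Winner")
print(encoded[1])
```

Each team name becomes its own 0/1 column, so the model never treats team names as ordered numbers.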
PREDICTION
How Prediction Was Done
Add a new IPL match row in the CSV file
Enter pre-match details: Team1, Team2, Toss Winner, Toss Decision, Stadium, City, Season
Leave the Match_Winner column empty (this is the value to be predicted)
Reload the dataset in Orange Data Mining Tool
The Predictions widget predicts the probable match winner
CSK vs MI | Toss: CSK | Decision: Bat | Venue: Wankhede | City: Mumbai | Season: 2024
Predicted Winner: Mumbai Indians
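The first three steps above (add a row, fill in the pre-match details, leave Match_Winner empty) can also be scripted rather than done by hand in Excel. A minimal sketch, assuming the CSV uses the eight final column names and ends with a newline:

```python
import csv
import io

FIELDS = ["Team1", "Team2", "Toss_Winner", "Toss_Decision",
          "Stadium", "City", "Season", "Match_Winner"]

def append_unplayed_match(csv_text, match):
    """Return csv_text plus one extra row whose Match_Winner is left empty."""
    row = {**match, "Match_Winner": ""}  # empty target = value to predict
    out = io.StringIO()
    out.write(csv_text)
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writerow(row)
    return out.getvalue()

existing = ",".join(FIELDS) + "\n"  # header only, for the example
new_match = {
    "Team1": "CSK", "Team2": "MI", "Toss_Winner": "CSK",
    "Toss_Decision": "Bat", "Stadium": "Wankhede",
    "City": "Mumbai", "Season": "2024",
}
updated = append_unplayed_match(existing, new_match)
print(updated.splitlines()[-1])
```

The values must be spelled exactly as they are elsewhere in the dataset; otherwise the encoder treats a differently spelled team or venue as a new category.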
EVALUATION
Model Evaluation & Confusion Matrix
Confusion Matrix
Actual vs Predicted | Predicted: Win | Predicted: Loss
Actual: Win | True Positive (TP) ✓ | False Negative (FN) ✗
Actual: Loss | False Positive (FP) ✗ | True Negative (TN) ✓
Diagonal = Correct Predictions
Evaluation Metrics Used
Accuracy
Share of all predictions that are correct
AUC
Area under the ROC curve
Precision
Share of predicted positives that are actually positive
Recall
Share of actual positives that the model catches
F1 Score
Harmonic mean of precision and recall
Widget used in Orange
Test & Score
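The metrics listed above follow directly from the four confusion matrix cells. The sketch below computes them for a two-class Win/Loss matrix; the counts are made up for illustration and are not results from this project. AUC is left out because it needs predicted probabilities, not just the four counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With more than two classes, as with a Match_Winner column that holds team names, precision, recall and F1 are computed per class and then averaged.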
Future Scope
Future Improvements
Player Performance Analysis
Include individual player stats in prediction
Team Strength Calculation
Build composite team strength scores
Recent Form Analysis
Weight recent match results more heavily
Head-to-Head Records
Factor in historical matchup data between teams
Player Availability & Injuries
Real-time squad changes and injury news
These features could help improve prediction accuracy beyond the current 61.4%
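As one example of the improvements above, a head-to-head feature could be computed from earlier matches only. This is a hypothetical sketch rather than part of the current workflow; the match history is made up.

```python
def head_to_head_win_rate(history, team, opponent):
    """Share of earlier meetings between the two teams that `team` won."""
    meetings = [m for m in history
                if {m["Team1"], m["Team2"]} == {team, opponent}]
    if not meetings:
        return 0.5  # no past meetings: treat as even
    wins = sum(1 for m in meetings if m["Match_Winner"] == team)
    return wins / len(meetings)

# Made-up history; it should contain only matches played before
# the match being predicted, so the result cannot leak in.
history = [
    {"Team1": "CSK", "Team2": "MI", "Match_Winner": "MI"},
    {"Team1": "MI", "Team2": "CSK", "Match_Winner": "CSK"},
    {"Team1": "CSK", "Team2": "MI", "Match_Winner": "MI"},
    {"Team1": "CSK", "Team2": "RCB", "Match_Winner": "CSK"},
]
rate = head_to_head_win_rate(history, "MI", "CSK")
print(round(rate, 3))
```

The resulting number could be added as an extra numeric column next to the existing pre-match features.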
LEARNINGS
Learning Outcomes
Data Preprocessing
Data Cleaning
Handling Missing Values
Feature Selection
One-Hot Encoding
ML Model Comparison
Prediction with Orange Tool
Model Evaluation Techniques
Sports Analytics Application
This project provided hands-on experience with an end-to-end machine learning pipeline.
VIVA Q&A
Possible Viva Questions
Why was Logistic Regression selected?
It achieved the highest accuracy (61.4%) among all tested models.
Why was ball-by-ball data removed?
It made the dataset too complex; a one-row-per-match structure is easier for the models to process and gave more stable predictions.
What is One-Hot Encoding?
A technique to convert categorical text values (like team names) into numerical form for ML models.
What is a Confusion Matrix?
A table that compares actual vs predicted classes, showing how many predictions were correct and where the model went wrong.
Why is preprocessing important?
It improves data quality, handles missing values, and enhances prediction performance.
CONCLUSION
Conclusion
An IPL winner prediction workflow was built using the Orange Data Mining Tool
Three ML models were compared: Decision Tree, Random Forest, Logistic Regression
Logistic Regression achieved the best accuracy of 61.4%
Practical application of data mining in sports analytics demonstrated
Future improvements can further enhance prediction accuracy
Thank You!
Open for Questions & Discussion
IPL Match Winner Prediction | Orange Data Mining Tool | 2026
- ipl-prediction
- machine-learning
- data-mining
- orange-tool
- sports-analytics
- python-project
- logistic-regression
- college-project