Made by Bobr AI

IPL Match Winner Prediction using Orange Data Mining Tool

Learn how to predict IPL match winners using machine learning, Orange Data Mining, and Logistic Regression with historical data from seasons 2020-2025.

Data Mining Project

IPL Match
Winner Prediction

Using Machine Learning & Orange Data Mining Tool

IPL Seasons 2020–2025
B.Sc. / BCA College Project | May 2026
02
OBJECTIVE

Objective of the Project

To predict the probable winner of IPL matches using machine learning
To analyze match data from IPL seasons 2020–2025
To compare different machine learning algorithms (Decision Tree, Random Forest, Logistic Regression)
To understand data preprocessing and prediction using Orange Data Mining Tool
Why This Project
03

Why This Project Was Chosen

IPL is one of the most popular cricket leagues in the world

Match outcomes depend on multiple factors: Teams, Toss, Venue, City

Machine learning can identify patterns from historical IPL data

Helps understand practical applications of Data Mining

Real-world sports analytics use case for academic learning

04
TOOLS

Software & Tools Used

Orange Data Mining Tool

Main tool used for building the ML workflow

Microsoft Excel

Used for editing and cleaning the CSV dataset

Kaggle

Source for downloading IPL historical datasets

05
Datasets

Datasets Explored

Dataset | Contents | Status
matches.csv | Match-level information | Used
deliveries.csv | Ball-by-ball IPL data | Removed (too complex)
IPL.csv | Detailed IPL info with many unnecessary columns | Removed
Custom Dataset (Final) | Simplified match-level dataset | Final Choice
FINAL DATASET
06

Final Dataset – Columns Used

Dataset Columns

Team1
Team2
Toss_Winner
Toss_Decision
Stadium
City
Season
Match_Winner (Target)

Why This Dataset?

One row = one match (clean structure)
Removed ball-by-ball complexity
Easier for ML models to process
Improved prediction stability
Years covered: IPL 2020 to 2025
Source: Kaggle IPL Datasets & IPL Historical Records
07
CHALLENGES

Problems Faced & Solutions

Problem: Dataset had 65+ complex columns (overs, balls, batter, bowler)
Solution: Reduced to match-level columns only
Problem: Red cross (X) error on the model widget because the target column was not selected
Solution: Moved Match_Winner to Target in the Select Columns widget
Problem: Wrong predictions / overfitting caused by post-match columns leaking the result
Solution: Removed player_of_match and other post-match data
Problem: ROC Analysis error: "Cannot get model output"
Solution: Connected Test & Score → ROC Analysis (the widget takes evaluation results, not a model); used Confusion Matrix for evaluation instead
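
The post-match column fix can also be sketched outside Orange. This is a minimal illustration, not the project's actual procedure: the sample rows are placeholders rather than real match records, and the helper name keep_pre_match_columns is hypothetical. It keeps only the columns known before the match starts, plus the target, so a field such as player_of_match cannot leak the result into training.

```python
import csv
import io

# Hypothetical sample shaped like a match-level CSV; the rows are
# placeholders, not real IPL records.
RAW_CSV = """Team1,Team2,Toss_Winner,Toss_Decision,Stadium,City,Season,player_of_match,Match_Winner
Team A,Team B,Team A,bat,Stadium X,City X,2020,Player 1,Team A
Team B,Team C,Team C,field,Stadium Y,City Y,2021,Player 2,Team C
"""

# Columns known before the first ball is bowled, plus the target.
PRE_MATCH_COLUMNS = ["Team1", "Team2", "Toss_Winner", "Toss_Decision",
                     "Stadium", "City", "Season"]
TARGET = "Match_Winner"


def keep_pre_match_columns(rows):
    """Return rows restricted to pre-match features and the target."""
    wanted = PRE_MATCH_COLUMNS + [TARGET]
    return [{name: row[name] for name in wanted} for row in rows]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
clean_rows = keep_pre_match_columns(rows)
print(list(clean_rows[0].keys()))
```

Listing the allowed columns explicitly, rather than naming the ones to drop, means any new post-match field in a later download is excluded by default.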
Preprocessing
08

Data Preprocessing Steps

Step 1: Load Dataset
Widget: File
Method: Import CSV

Step 2: Select Columns
Features: Team1, Team2, Toss_Winner, Toss_Decision, Stadium, City, Season
Target: Match_Winner

Step 3: Handle Missing Values
Widget: Impute
Method: Average / Most Frequent

Step 4: Encode Text Data
Widget: Continuize
Method: One-Hot Encoding for categorical variables
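
To show what Steps 3 and 4 do to the data, here is a small plain-Python sketch of most-frequent imputation and one-hot encoding. It is an illustration under assumptions, not Orange's internal implementation: the rows are placeholders, the function names are hypothetical, and the "column=value" naming of the indicator columns is chosen only for readability.

```python
from collections import Counter

# Placeholder rows; None marks a missing value.
rows = [
    {"Toss_Decision": "bat", "City": "City X"},
    {"Toss_Decision": "field", "City": None},
    {"Toss_Decision": "field", "City": "City X"},
]


def impute_most_frequent(rows, column):
    """Fill missing values in `column` with its most common value."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    fill = counts.most_common(1)[0][0]
    return [{**r, column: fill if r[column] is None else r[column]}
            for r in rows]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded


filled = impute_most_frequent(rows, "City")
encoded = one_hot(filled, "Toss_Decision")
print(encoded[0])
```

Each category becomes its own 0/1 column, which is why a model such as Logistic Regression can work with team names and venues that have no natural numeric order.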
ML MODELS
09

Machine Learning Models Used

Decision Tree

Splits data based on features
Simple, visual, interpretable
Accuracy: 41.9%

Random Forest

Ensemble of multiple decision trees
Reduces overfitting
Accuracy: 47.5%

Logistic Regression

Works well with one-hot encoded categorical data
Generalized best of the three models tested
Accuracy: 61.4% (best)
All 3 models were tested using Test & Score widget in Orange Data Mining Tool
RESULTS
10

Results & Model Comparison

Model | Accuracy
Decision Tree | 41.9%
Random Forest | 47.5%
Logistic Regression | 61.4% (best)
Logistic Regression selected as the final model
11
WORKFLOW

Final Prediction Workflow in Orange

File: Load CSV dataset
Select Columns: Features + Target
Impute: Handle Missing Values
Continuize: One-Hot Encoding
Logistic Regression: Model Training
Test & Score: Evaluate Performance
Predictions: Predict Winner
Confusion Matrix: Visualize Results
12
PREDICTION

How Prediction Was Done

1. Add a new IPL match row in the CSV file
2. Enter pre-match details: Team1, Team2, Toss Winner, Toss Decision, Stadium, City, Season
3. Leave the Match_Winner column empty (this is what we predict)
4. Reload the dataset in Orange Data Mining Tool
5. The Predictions widget predicts the probable match winner
Example Input
CSK vs MI | Toss: CSK | Decision: Bat | Venue: Wankhede | City: Mumbai | Season: 2024
Predicted Winner: Mumbai Indians
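
The first three steps were done by hand in the CSV file; the same edit can be sketched in Python. This is only an illustration: the helper name add_prediction_row is hypothetical, and a header-only string stands in for the real dataset file. It appends the example match above with Match_Winner left blank, which is the value the Predictions widget is then asked to fill in.

```python
import csv
import io

COLUMNS = ["Team1", "Team2", "Toss_Winner", "Toss_Decision",
           "Stadium", "City", "Season", "Match_Winner"]


def add_prediction_row(csv_text, match):
    """Append `match` to the CSV text, leaving Match_Winner blank."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.append({**match, "Match_Winner": ""})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Header only, standing in for the real dataset file.
existing = ",".join(COLUMNS) + "\n"
new_match = {"Team1": "CSK", "Team2": "MI", "Toss_Winner": "CSK",
             "Toss_Decision": "Bat", "Stadium": "Wankhede",
             "City": "Mumbai", "Season": "2024"}
updated = add_prediction_row(existing, new_match)
print(updated)
```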
EVALUATION
13

Model Evaluation & Confusion Matrix

Confusion Matrix

Actual vs Predicted | Predicted: Win | Predicted: Loss
Actual: Win | True Positive (TP) ✓ | False Negative (FN) ✗
Actual: Loss | False Positive (FP) ✗ | True Negative (TN) ✓
Diagonal = Correct Predictions

Evaluation Metrics Used

Accuracy: Share of all predictions that are correct
AUC: Area under the ROC curve
Precision: Share of predicted positives that are actually positive
Recall: Share of actual positives that the model finds
F1 Score: Harmonic mean of precision and recall
Widget used in Orange: Test & Score
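
As a worked example of how these metrics follow from the confusion matrix, the sketch below computes accuracy, precision, recall and F1 from the four cell counts. The counts are made-up numbers for illustration, not results from this project, and the function name binary_metrics is hypothetical. AUC is left out because it needs the predicted probabilities, not just the four counts.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard metrics from 2x2 confusion matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted
    # or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is (40 + 30) / 100 = 0.700, precision is 40 / 60 ≈ 0.667, recall is 40 / 50 = 0.800, and F1 is about 0.727.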
Future Scope
14

Future Improvements

Player Performance Analysis: Include individual player stats in prediction
Team Strength Calculation: Build composite team strength scores
Recent Form Analysis: Weight recent match results more heavily
Head-to-Head Records: Factor in historical matchup data between teams
Player Availability & Injuries (highest priority): Real-time squad changes and injury news
These features could improve prediction accuracy beyond the current 61.4%
15
LEARNINGS

Learning Outcomes

Data Preprocessing
Data Cleaning
Handling Missing Values
Feature Selection
One-Hot Encoding
ML Model Comparison
Prediction with Orange Tool
Model Evaluation Techniques
Sports Analytics Application

This project provided hands-on experience with an end-to-end machine learning pipeline.

16
VIVA Q&A

Possible Viva Questions

Q: Why was Logistic Regression selected?
A: It achieved the highest accuracy (61.4%) among all tested models.
Q: Why was ball-by-ball data removed?
A: It increased dataset complexity and caused overfitting in predictions.
Q: What is One-Hot Encoding?
A: A technique that converts categorical text values (like team names) into 0/1 indicator columns that ML models can use.
Q: What is a Confusion Matrix?
A: A table that compares actual and predicted classes, showing where the model is right and where it is wrong.
Q: Why is preprocessing important?
A: It improves data quality, handles missing values, and enhances prediction performance.
17
CONCLUSION

Conclusion

IPL winner prediction successfully built using Orange Data Mining Tool
Three ML models compared: Decision Tree, Random Forest, Logistic Regression
Logistic Regression achieved best accuracy of 61.4%
Practical application of data mining in sports analytics demonstrated
Future improvements can further enhance prediction accuracy

Thank You!

Open for Questions & Discussion

IPL Match Winner Prediction | Orange Data Mining Tool | 2026