IPL Match Winner Prediction using Orange Data Mining Tool

Learn how to predict IPL match winners using machine learning, Orange Data Mining, and Logistic Regression with historical data from seasons 2020-2025.

#ipl-prediction #machine-learning #data-mining #orange-tool #sports-analytics #python-project #logistic-regression #college-project

01

Data Mining Project

IPL Match Winner Prediction

Using Machine Learning & Orange Data Mining Tool

IPL Seasons 2020–2025

B.Sc. / BCA College Project | May 2026

02

OBJECTIVE

Objective of the Project

To predict the probable winner of IPL matches using machine learning

To analyze match data from IPL seasons 2020–2025

To compare different machine learning algorithms (Decision Tree, Random Forest, Logistic Regression)

To understand data preprocessing and prediction using Orange Data Mining Tool

03

Why This Project

Why This Project Was Chosen

IPL is one of the most popular cricket leagues in the world

Match outcomes depend on multiple factors: Teams, Toss, Venue, City

Machine learning can identify patterns from historical IPL data

Helps understand practical applications of Data Mining

Real-world sports analytics use case for academic learning

04

TOOLS

Software & Tools Used

Orange Data Mining Tool

Main tool used for building the ML workflow

Microsoft Excel

Used for editing and cleaning the CSV dataset

Kaggle

Source for downloading IPL historical datasets

05

Datasets

Datasets Explored

Dataset	Description	Status
matches.csv	Match-level information	Used
deliveries.csv	Ball-by-ball IPL data	Removed (too complex)
IPL.csv	Detailed IPL info, had many unnecessary columns	Removed
Custom Dataset (Final)	Simplified match-level dataset	Final Choice

06

FINAL DATASET

Final Dataset – Columns Used

Dataset Columns

Team1

Team2

Toss_Winner

Toss_Decision

Stadium

City

Season

Match_Winner (Target)

Why This Dataset?

One row = one match (clean structure)

Removed ball-by-ball complexity

Easier for ML models to process

Improved prediction stability

Years covered: IPL 2020 to 2025

Source: Kaggle IPL Datasets & IPL Historical Records

07

CHALLENGES

Problems Faced & Solutions

Problem: The dataset had 65+ columns, many of them ball-level (overs, balls, batter, bowler)

Solution: Reduced it to match-level columns only

Problem: The model showed a cross (X) error because no target column was selected

Solution: Moved Match_Winner to Target in the Select Columns widget

Problem: Wrong predictions / overfitting because post-match columns leaked the match result

Solution: Removed player_of_match and other post-match columns

Problem: ROC Analysis error: "Cannot get model output"

Solution: Connected Test & Score → ROC Analysis correctly; used the Confusion Matrix for evaluation instead
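
The column reduction described above can be sketched in plain Python. This is only an illustration of the idea, not the Excel and Select Columns steps used in the project; the sample row is made up, and the helper name `select_prematch_columns` is hypothetical.

```python
import csv
import io

# Pre-match columns named on the slides; everything else is dropped.
FEATURES = ["Team1", "Team2", "Toss_Winner", "Toss_Decision", "Stadium", "City", "Season"]
TARGET = "Match_Winner"


def select_prematch_columns(rows):
    """Keep only the pre-match features and the target from each row."""
    keep = FEATURES + [TARGET]
    return [{name: row.get(name, "") for name in keep} for row in rows]


# Made-up sample standing in for the real CSV file (assumption for illustration).
SAMPLE_CSV = """Team1,Team2,Toss_Winner,Toss_Decision,Stadium,City,Season,Match_Winner,player_of_match
CSK,MI,CSK,Bat,Wankhede,Mumbai,2024,MI,Some Player
"""

raw_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean_rows = select_prematch_columns(raw_rows)
print(sorted(clean_rows[0].keys()))
```

Listing the allowed columns explicitly, rather than the ones to remove, means any post-match column that appears later in the source file is excluded by default.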

08

Preprocessing

Data Preprocessing Steps

Step 1

Load Dataset

Widget: File

Method: Import CSV

Step 2

Select Columns

Features: Team1, Team2, Toss_Winner, Toss_Decision, Stadium, City, Season

Target: Match_Winner

Step 3

Handle Missing Values

Widget: Impute

Method: Average / Most Frequent

Step 4

Encode Text Data

Widget: Continuize

Method: One-Hot Encoding for categorical variables
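
Steps 3 and 4 can be illustrated outside Orange with a small pure-Python sketch. It is not how the Impute or Continuize widgets are implemented; the function names and the sample rows are assumptions made up for this example.

```python
from collections import Counter


def impute_most_frequent(rows, column):
    """Replace empty values in `column` with the most common non-empty value."""
    counts = Counter(row[column] for row in rows if row[column])
    fill = counts.most_common(1)[0][0]
    return [{**row, column: row[column] or fill} for row in rows]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Made-up rows; the second one has a missing toss decision.
rows = [
    {"Toss_Decision": "Bat"},
    {"Toss_Decision": ""},
    {"Toss_Decision": "Field"},
    {"Toss_Decision": "Bat"},
]
filled = impute_most_frequent(rows, "Toss_Decision")
encoded = one_hot(filled, "Toss_Decision")
print(encoded[1])
```

Imputation runs before encoding so that a missing value is first replaced by a real category instead of becoming an indicator column of its own.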

09

ML MODELS

Machine Learning Models Used

Decision Tree

• Splits data based on features

• Simple, visual, interpretable

Accuracy: 41.9%

Random Forest

• Ensemble of multiple decision trees

• Reduces overfitting

Accuracy: 47.5%

Logistic Regression (Top Performer)

• Works well with one-hot encoded categorical features

• Generalized best of the three models on this dataset

Accuracy: 61.4% ← BEST

All three models were evaluated using the Test & Score widget in Orange Data Mining Tool
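
The slides do not say which sampling setting was used in Test & Score. As a rough sketch of the general idea of cross-validated accuracy, the code below scores a trivial majority-class predictor with k-fold splitting. That predictor is a stand-in chosen for brevity, not one of the three models, and the labels are made up.

```python
from collections import Counter


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; fold i tests every k-th row starting at i."""
    for fold in range(k):
        test_idx = [i for i in range(n_rows) if i % k == fold]
        train_idx = [i for i in range(n_rows) if i % k != fold]
        yield train_idx, test_idx


def majority_baseline_accuracy(labels, k=5):
    """Cross-validated accuracy of always predicting the most common training label."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        correct += sum(1 for i in test_idx if labels[i] == majority)
    return correct / len(labels)


# Made-up winner labels, only to exercise the functions.
labels = ["MI", "CSK", "MI", "RCB", "MI", "CSK", "MI", "MI", "KKR", "MI"]
baseline = majority_baseline_accuracy(labels, k=5)
print(baseline)
```

A baseline of this kind gives a reference point when reading accuracy figures such as the ones reported for the three models.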

10

RESULTS

Results & Model Comparison

Model	Accuracy
Decision Tree	41.9%
Random Forest	47.5%
Logistic Regression	61.4% ⭐ BEST

Logistic Regression selected as the final model

11

WORKFLOW

Final Prediction Workflow in Orange

File: Load CSV dataset

Select Columns: Features + Target

Impute: Handle Missing Values

Continuize: One-Hot Encoding

Logistic Regression: Model Training

Test & Score: Evaluate Performance

Predictions: Predict Winner

Confusion Matrix: Visualize Results

12

PREDICTION

How Prediction Was Done

1. Add a new IPL match row in the CSV file

2. Enter pre-match details: Team1, Team2, Toss Winner, Toss Decision, Stadium, City, Season

3. Leave the Match_Winner column empty (this is what we predict)

4. Reload the dataset in Orange Data Mining Tool

5. The Predictions widget predicts the probable match winner

Example Input

CSK vs MI | Toss: CSK | Decision: Bat | Venue: Wankhede | City: Mumbai | Season: 2024

→ Predicted Winner: Mumbai Indians
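
The idea of leaving Match_Winner empty can be sketched as follows: rows with a known winner are training data, and rows with an empty winner are the matches to predict. The CSV content and the helper name `split_labeled` are made up for illustration, and this is not how Orange handles such rows internally.

```python
import csv
import io

# Made-up CSV: the last row is the new match with an empty Match_Winner.
SAMPLE_CSV = """Team1,Team2,Toss_Winner,Toss_Decision,Stadium,City,Season,Match_Winner
CSK,MI,MI,Field,Wankhede,Mumbai,2023,CSK
CSK,MI,CSK,Bat,Wankhede,Mumbai,2024,
"""


def split_labeled(rows, target="Match_Winner"):
    """Separate rows with a known winner from rows still to be predicted."""
    labeled = [row for row in rows if row[target].strip()]
    unlabeled = [row for row in rows if not row[target].strip()]
    return labeled, unlabeled


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
train_rows, predict_rows = split_labeled(rows)
print(len(train_rows), len(predict_rows))
```

Keeping the two groups apart ensures the model is fitted only on matches whose result is already known.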

13

EVALUATION

Model Evaluation & Confusion Matrix

Confusion Matrix

	Predicted: Win	Predicted: Loss
Actual: Win	True Positive (TP) ✓	False Negative (FN) ✗
Actual: Loss	False Positive (FP) ✗	True Negative (TN) ✓

Diagonal = Correct Predictions

Evaluation Metrics Used

Accuracy: Share of all predictions that are correct

AUC: Area under the ROC curve

Precision: Share of predicted positives that are actually positive

Recall: Share of actual positives that are correctly identified

F1 Score: Harmonic mean of precision and recall

Widget used in Orange: Test & Score
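
As a worked illustration of how four of these metrics follow from the confusion-matrix cells, here is a minimal sketch. The counts are made up and are not results from this project; AUC is left out because it needs ranked prediction scores rather than the four counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, not results from the project.
metrics = classification_metrics(tp=40, fp=20, fn=10, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```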

14

Future Scope

Future Improvements

Player Performance Analysis: Include individual player stats in prediction

Team Strength Calculation: Build composite team strength scores

Recent Form Analysis: Weight recent match results more heavily

Head-to-Head Records: Factor in historical matchup data between teams

Player Availability & Injuries (Highest Priority): Real-time squad changes and injury news

These features could improve prediction accuracy beyond the current 61.4%
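
One of these ideas, head-to-head records, could be derived from the existing match-level columns. The sketch below is a possible starting point under that assumption; the function name and the match rows are made up, and nothing like it was part of the project.

```python
from collections import defaultdict


def head_to_head_win_rate(matches):
    """Return {(team_a, team_b): share of meetings won by team_a} for alphabetically sorted pairs."""
    meetings = defaultdict(int)
    wins = defaultdict(int)
    for match in matches:
        pair = tuple(sorted((match["Team1"], match["Team2"])))
        meetings[pair] += 1
        if match["Match_Winner"] == pair[0]:
            wins[pair] += 1
    return {pair: wins[pair] / meetings[pair] for pair in meetings}


# Made-up past results, only to exercise the function.
past_matches = [
    {"Team1": "CSK", "Team2": "MI", "Match_Winner": "CSK"},
    {"Team1": "MI", "Team2": "CSK", "Match_Winner": "MI"},
    {"Team1": "CSK", "Team2": "MI", "Match_Winner": "CSK"},
    {"Team1": "RCB", "Team2": "KKR", "Match_Winner": "KKR"},
]
rates = head_to_head_win_rate(past_matches)
print(rates)
```

To avoid leaking the result, such a feature should be computed only from matches played before the one being predicted.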

15

LEARNINGS

Learning Outcomes

Data Preprocessing

Data Cleaning

Handling Missing Values

Feature Selection

One-Hot Encoding

ML Model Comparison

Prediction with Orange Tool

Model Evaluation Techniques

Sports Analytics Application

This project provided hands-on experience with an end-to-end machine learning pipeline.

16

VIVA Q&A

Possible Viva Questions

Q: Why was Logistic Regression selected?

A: It achieved the highest accuracy (61.4%) among the three tested models.

Q: Why was ball-by-ball data removed?

A: It increased dataset complexity and caused overfitting in predictions.

Q: What is One-Hot Encoding?

A: A technique that converts categorical text values (like team names) into binary indicator columns, one per category, so that ML models can use them.

Q: What is a Confusion Matrix?

A: A table that compares actual vs. predicted classes; correct predictions lie on its diagonal.

Q: Why is preprocessing important?

A: It improves data quality, handles missing values, and enhances prediction performance.

17

CONCLUSION

Conclusion

IPL winner prediction successfully built using Orange Data Mining Tool

Three ML models compared: Decision Tree, Random Forest, Logistic Regression

Logistic Regression achieved best accuracy of 61.4%

Practical application of data mining in sports analytics demonstrated

Future improvements can further enhance prediction accuracy

Thank You!

Open for Questions & Discussion

IPL Match Winner Prediction | Orange Data Mining Tool | 2026
