Airbnb Business Analytics: NYC Insights with Orange Data Mining
Explore NYC Airbnb listing insights using clustering, regression, and data mining. Learn key pricing factors and business strategies for hosts and investors.
Academic Project
Business Analytics Using Orange Data Mining Tools on the Airbnb Dataset
Global Short-Term Accommodation Marketplace for Tourist Services
Business Analytics | Orange Data Mining | Airbnb Dataset
Introduction
The purpose of this project is to apply business analytics methods to a real-world dataset and to generate business intelligence using the Orange Data Mining tool.
To carry out the analysis, we used the New York City Airbnb Open Data. This dataset covers the city's listings on the platform, along with their prices, locations, accommodation types, and engagement levels.
To maximize return on investment for Airbnb listings in New York City, it is valuable to analyze existing listings. We applied clustering, regression, and association rule mining to gain better insight into the large number of listings.
Dataset Understanding
This dataset contains tens of thousands of Airbnb listings across New York City. It provides information about pricing, availability, and guest behavior.
Tens of Thousands of Listings
New York City
Pricing, Availability & Behavior
Business Analytics Capstone
Key Variables
Price
Cost per night
Neighbourhood Group
Location category
Room Type
Entire home/Apartment, Private room, Shared room
Availability (365)
Number of days per year the listing is available for booking
Number of Reviews
Customer engagement
Reviews per Month
Listing popularity
Minimum Nights
Booking requirement
Business Relevance
This data helps Airbnb hosts, property investors, and hospitality businesses gauge demand and price levels across different areas and set their prices accordingly.
Airbnb Hosts
Property Investors
Hospitality Businesses
Objectives
Analyze Pricing Patterns
Explore price distribution across neighborhoods and property types
Identify Key Influencing Factors
Determine which variables most affect listing prices
Segment Listings into Meaningful Groups
Apply clustering to discover natural groupings in the data
Generate Actionable Business Insights
Provide recommendations for hosts, investors, and hospitality businesses
Data Cleaning & Preprocessing
Data was preprocessed using the Orange Data Mining suite.
We removed unnecessary variables (ID, listing name, host name) and replaced missing values in reviews_per_month with 0.
We removed all duplicate records and extreme price values greater than 1000, and standardized the formatting of the room type and neighbourhood group fields.
Numerical features were normalised, and a new feature was created that categorises each listing as low, medium, or high price.
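The preprocessing steps above were carried out in Orange; as a rough illustration only, they could be sketched in plain Python as follows. The sample rows, the min-max scaling choice, and the low/medium/high cutoffs (75 and 200) are assumptions made for this sketch and do not come from the project.

```python
# Toy rows standing in for the dataset; all values are made up for illustration.
raw_rows = [
    {"id": 1, "name": "Cozy room", "host_name": "A", "neighbourhood_group": "Brooklyn",
     "room_type": "Private room", "price": 80, "reviews_per_month": None},
    {"id": 2, "name": "Loft", "host_name": "B", "neighbourhood_group": "Manhattan",
     "room_type": "Entire home/apt", "price": 250, "reviews_per_month": 1.5},
    # Same listing content as the row above, so it is a duplicate once ids are dropped.
    {"id": 3, "name": "Loft", "host_name": "B", "neighbourhood_group": "Manhattan",
     "room_type": "Entire home/apt", "price": 250, "reviews_per_month": 1.5},
    # Extreme price, removed by the cutoff below.
    {"id": 4, "name": "Penthouse", "host_name": "C", "neighbourhood_group": "Manhattan",
     "room_type": "Entire home/apt", "price": 5000, "reviews_per_month": 0.2},
    {"id": 5, "name": "Bunk", "host_name": "D", "neighbourhood_group": "Queens",
     "room_type": "Shared room", "price": 40, "reviews_per_month": 3.0},
]

DROP_COLUMNS = {"id", "name", "host_name"}
MAX_PRICE = 1000  # assumption: the "extreme value" cutoff applies to price


def clean(rows):
    """Drop unused columns, fill missing reviews_per_month, remove outliers and duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        row = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        if row["reviews_per_month"] is None:
            row["reviews_per_month"] = 0.0
        if row["price"] > MAX_PRICE:
            continue
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def min_max(values):
    """Scale values to the range [0, 1] (one common form of normalisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def price_category(price, low_cutoff=75, high_cutoff=200):
    """Bin a price into low / medium / high; both cutoffs are hypothetical."""
    if price < low_cutoff:
        return "low"
    if price <= high_cutoff:
        return "medium"
    return "high"


rows = clean(raw_rows)
normalised_prices = min_max([r["price"] for r in rows])
for row, scaled in zip(rows, normalised_prices):
    row["price_scaled"] = scaled
    row["price_category"] = price_category(row["price"])
```

Deduplication here compares rows after the identifier columns are dropped, which is one possible reading of "duplicate records"; comparing on the raw rows would keep listings that differ only by ID.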
Data Analysis & Descriptive Statistics
Listings in Manhattan are significantly more expensive than those in the other neighbourhood groups.
Private rooms account for a large share of listings, likely because hosts rent out spare rooms to offset housing costs.
The market supports both affordable options and luxury pricing. There is very little correlation between number of reviews and price.
The results link location and property type to both listing value and market positioning.
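Two of the statistics behind these observations, the average price per neighbourhood group and the correlation between price and number of reviews, can be sketched as below. The six listings are made-up values chosen for illustration, and the use of the Pearson coefficient is an assumption; the project does not state which correlation measure Orange reported.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# (neighbourhood_group, price, number_of_reviews); made-up illustrative values.
listings = [
    ("Manhattan", 220, 12),
    ("Manhattan", 180, 40),
    ("Brooklyn", 110, 35),
    ("Brooklyn", 90, 5),
    ("Queens", 70, 20),
    ("Queens", 80, 8),
]


def mean_price_by_group(rows):
    """Average price for each neighbourhood group."""
    prices = defaultdict(list)
    for group, price, _ in rows:
        prices[group].append(price)
    return {group: mean(values) for group, values in prices.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


averages = mean_price_by_group(listings)
r = pearson([p for _, p, _ in listings], [n for _, _, n in listings])

print(averages)
print(f"price vs. reviews correlation: {r:.2f}")
```

A coefficient close to 0 indicates that review counts carry little linear information about price, which is the pattern the project reports for the real data.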
Clustering (K-Means)
We applied K-Means clustering to segment the listings by price, availability and number of reviews.
Cluster A — Premium
High prices and low availability for booking. Exclusive, high-demand listings.
Cluster B — Moderate
Moderate prices and moderate availability. Second largest segment.
Cluster C — Budget
Low price and high availability. Highly competitive segment.
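To show how K-Means arrives at segments like these, here is a minimal sketch of the algorithm in plain Python. It is not the Orange widget's implementation: the six points are made-up listings with price, availability, and number of reviews already scaled to [0, 1], and the naive "first k points" initialisation is an assumption chosen to keep the example deterministic.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, k, iterations=100):
    """Basic K-Means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    labels = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        labels = []
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            labels.append(nearest)
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return labels, centroids


# (price, availability, number_of_reviews), each scaled to [0, 1]; made-up values.
points = [
    (0.90, 0.10, 0.30),  # premium-like
    (0.50, 0.50, 0.50),  # moderate-like
    (0.10, 0.90, 0.60),  # budget-like
    (0.85, 0.15, 0.20),  # premium-like
    (0.45, 0.55, 0.40),  # moderate-like
    (0.15, 0.85, 0.70),  # budget-like
]

labels, centroids = kmeans(points, k=3)
print(labels)
print(centroids)
```

Scaling the features first matters because K-Means relies on Euclidean distance; without it, the variable with the largest numeric range (here, price) would dominate the grouping.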
Regression Analysis
A regression analysis was performed to examine which variables affect price.
The strongest predictors of a listing's price are whether it is located in Manhattan and whether it is an entire home; their impact far exceeds that of any other variable.
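One way to read this result is as a linear regression with two 0/1 indicator variables, one for Manhattan and one for entire homes. The sketch below fits such a model by ordinary least squares using the normal equations. The choice of OLS is an assumption (the project does not say which regression learner was used in Orange), and the four prices are made up so that the fit is exact.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(features, targets):
    """Ordinary least squares with an intercept: solve (X^T X) beta = X^T y."""
    X = [[1.0] + list(row) for row in features]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(p)]
    return solve(xtx, xty)


# (is_manhattan, is_entire_home) -> nightly price; made-up values.
features = [(0, 0), (1, 0), (0, 1), (1, 1)]
prices = [60, 130, 110, 180]

intercept, manhattan_effect, entire_home_effect = ols(features, prices)
print(intercept, manhattan_effect, entire_home_effect)
```

Each coefficient is the estimated price difference associated with that indicator while the other is held fixed, which is how statements such as "Manhattan listings cost more" can be quantified.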
Association Rule Mining
Association rule mining was applied to discover co-occurrences of attributes in the dataset.
Entire Home → Higher Price
Entire home listings consistently commanded higher prices across all neighborhood groups.
Higher Availability → Lower Price
Listings with higher availability tended to command lower prices, indicating lower demand.
We will investigate these factors further to gain more insight into customer preferences and supply and demand.
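Rules such as "Entire Home → Higher Price" are typically judged by support, confidence, and lift. The sketch below computes these three measures for a single rule. The eight transactions and the item labels (for example "room=entire") are made up for illustration and are not taken from the project's data or from Orange's output.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n                          # share of listings matching the whole rule
    confidence = both / ante if ante else 0.0   # P(consequent | antecedent)
    lift = confidence / (cons / n) if cons else 0.0  # confidence relative to the base rate
    return support, confidence, lift


# Each listing is a set of discretised attributes; made-up values.
transactions = [
    {"room=entire", "price=high", "avail=low"},
    {"room=entire", "price=high", "avail=low"},
    {"room=entire", "price=high", "avail=high"},
    {"room=entire", "price=low", "avail=high"},
    {"room=private", "price=low", "avail=high"},
    {"room=private", "price=low", "avail=high"},
    {"room=private", "price=high", "avail=low"},
    {"room=shared", "price=low", "avail=high"},
]

entire_to_high = rule_metrics(transactions, {"room=entire"}, {"price=high"})
avail_to_low = rule_metrics(transactions, {"avail=high"}, {"price=low"})
print(entire_to_high)
print(avail_to_low)
```

A lift above 1 means the consequent appears more often among listings matching the antecedent than in the data overall, so the rule reflects more than the base rate.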
Data Visualization
Bars indicate average price per neighborhood, with Manhattan the priciest.
Distribution of listings by room type: entire home/apartment, private room, or shared room.
Very weak correlation between price and reviews with many outliers.
Price varies considerably within many of the neighborhoods.
Business Recommendations
After reviewing the analysis, several improvements can be implemented.
Incorporate dynamic pricing based on demand, location and property type.
Offer entire-home listings where possible and focus on areas with higher demand.
Improve listing descriptions, images, and customer service to gain more visibility.
Offer a lower nightly rate for longer stays and price down during off-peak periods to encourage more bookings.
Limitations
The calculations are based on figures for one city only.
No allowance has been made for factors such as time of year, market conditions and regulations.
Assumptions
Customer reviews are a good indicator for engagement.
Pricing trends are assumed to be constant.
Conclusion
This project applied business analytics methods using Orange Data Mining to extract insights from real-world data.
The analysis shows that pricing is mainly affected by location and property type.
Business analytics of this kind helps hosts, investors, and hospitality businesses make better-informed pricing and investment decisions.
- airbnb-analytics
- data-mining
- orange-software
- business-intelligence
- market-analysis
- nyc-data
- k-means-clustering
- predictive-modeling