Udacity Data Science Project: Understand the Seattle Airbnb Price
Airbnb is a global online marketplace for vacation rentals that has arranged lodging, primarily homestays, and tourism experiences since 2008. Many of us like to use Airbnb when we travel, since it is easy to book, usually less expensive than traditional hotels, and gives travelers a chance to connect with good local hosts. I had a great experience staying with an Airbnb host for 5 weeks in Seattle when I joined a data science program there last year.
From the customer's perspective, it is important to understand Airbnb prices. Rather than just looking at the price on the Airbnb website, you can let data science help you find the best rates! As the first project of the Udacity Data Science Nanodegree, I used the Seattle Airbnb data to explore Airbnb prices in Seattle and addressed three business questions:
● Does the Seattle Airbnb price show seasonal or daily variation?
● What are the most important ingredients/features for the Airbnb price?
● How should we understand the most important features for the price?
About the Dataset
I used the Seattle Airbnb open-source data from Kaggle, which includes a listings table with 3818 Airbnb listings and a calendar table with about 1.4 million calendar entries associated with those listings.
Seasonality
The calendar table gives the price and availability of 3723 listing ids for individual days, and 99% of the entries are from 2016. First, I looked at the distribution of the average price for each listing id. The figure below shows that most prices fall between about $100 and $150, with a median of $108 and a mean of $136.
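A minimal sketch of the per-listing averaging, using only the Python standard library. The column names (listing_id, date, available, price), the "$1,200.00" price format and the sample rows are assumptions made for illustration; they are not taken from the real table.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Tiny made-up sample in the assumed calendar layout (not real data).
SAMPLE_CSV = '''listing_id,date,available,price
1,2016-01-04,t,$100.00
1,2016-01-05,t,$120.00
1,2016-01-06,f,
2,2016-01-04,t,"$1,200.00"
2,2016-01-05,t,"$1,000.00"
3,2016-01-04,t,$90.00
'''


def parse_price(text):
    """Turn a string such as '$1,200.00' into a float; return None if empty."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def average_price_per_listing(csv_file):
    """Return {listing_id: mean price} over the rows that carry a price."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        price = parse_price(row["price"])
        if price is not None:  # skip days without a price
            prices[row["listing_id"]].append(price)
    return {listing: mean(values) for listing, values in prices.items()}


per_listing = average_price_per_listing(io.StringIO(SAMPLE_CSV))
averages = list(per_listing.values())
print(per_listing)
print("median:", median(averages), "mean:", mean(averages))
```

With a single expensive listing in the sample, the mean is pulled well above the median, which is the same kind of skew as the $136 mean versus $108 median reported above.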
I averaged the Airbnb price by month to see whether it shows any seasonality. The figure below shows that the price peaks in summer (June, July and August) and bottoms out in January and February. This seasonal variation is consistent with the travel season.
I also looked at the price as a function of the day of the week. The price peaks on Fridays and Saturdays and, surprisingly, not on Sundays. The price on weekdays other than Friday is similar from day to day.
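The monthly and day-of-week averages are both group-and-average operations on the calendar dates. Here is a small sketch with the standard library; the (date, price) pairs are made up, and average_price_by is a hypothetical helper name, not a function from my project code.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up (ISO date, price) pairs standing in for priced calendar rows.
SAMPLE_ROWS = [
    ("2016-01-08", 100.0),  # Friday
    ("2016-01-10", 90.0),   # Sunday
    ("2016-07-01", 150.0),  # Friday
    ("2016-07-02", 160.0),  # Saturday
    ("2016-07-04", 130.0),  # Monday
]


def average_price_by(rows, key):
    """Group prices by key(date) and return {group: mean price}."""
    groups = defaultdict(list)
    for iso_day, price in rows:
        groups[key(date.fromisoformat(iso_day))].append(price)
    return {group: mean(values) for group, values in groups.items()}


by_month = average_price_by(SAMPLE_ROWS, key=lambda d: d.month)
by_weekday = average_price_by(SAMPLE_ROWS, key=lambda d: WEEKDAYS[d.weekday()])
print(by_month)
print(by_weekday)
```

Passing the grouping rule as a function keeps the two analyses identical except for the key, so the same helper could also group by week of year or by quarter.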
Important Features for the Price
Next, I explored the listings table, which includes full descriptions and average review scores. This table has 3818 data points and 91 features along with the price information. To better understand the correlation between the features and the price, it is important to reduce the dimensions of the features, and to understand which features are most important for the Airbnb price. Therefore, I built a price prediction model based on the features from the listings table and selected the top ranked features.
Some details about the feature engineering: I first checked the missing values in each feature and dropped any feature with 80% or more nulls. To simplify the price prediction model, I dropped categorical features that have more than 1500 distinct values or only one value. I filled the missing values of the integer and float features with the mean of the corresponding feature. I also removed collinear variables to improve the performance of the regression models, and rescaled all feature values to the range 0 to 1.
After feature engineering, the number of features in the listings table was reduced from 91 to 50. To build a price prediction model, I split the data into 80% for model training and 20% for testing. I trained LightGBM and Random Forest regressors on the training data. R squared (R2), mean squared error (MSE) and median absolute error (MAE) were used to evaluate the model accuracy. The LightGBM model shows a better R2 score, but the Random Forest model gives better MSE and MAE scores.
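Three of these steps (dropping sparse features, mean imputation and 0-to-1 rescaling) can be sketched in a few lines. This is only an illustration on made-up columns, with None marking a missing value; the function names are hypothetical, and the categorical and collinearity filters are left out.

```python
from statistics import mean

# Made-up numeric columns; None marks a missing value.
SAMPLE_COLUMNS = {
    "bedrooms": [1, 2, None, 3, 1],
    "bathrooms": [1.0, 1.5, 1.0, None, 2.0],
    "square_feet": [None, None, None, None, 800],  # 80% missing
}


def drop_sparse_columns(columns, max_null_fraction=0.8):
    """Keep only columns whose share of missing values is below the threshold."""
    kept = {}
    for name, values in columns.items():
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction < max_null_fraction:
            kept[name] = values
    return kept


def fill_with_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    column_mean = mean([v for v in values if v is not None])
    return [column_mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


prepared = {
    name: min_max_scale(fill_with_mean(values))
    for name, values in drop_sparse_columns(SAMPLE_COLUMNS).items()
}
print(prepared)
```

In this sketch the imputation runs before the scaling, so the filled-in mean lands inside the 0-to-1 range like any other value.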
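To make the split and the three scores concrete, here is a standard-library sketch of an 80/20 split and of R2, MSE and median absolute error. The models themselves are not reproduced; the true and predicted prices are made up, and the seed and function names are illustrative assumptions.

```python
import random
from statistics import mean, median


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def median_absolute_error(y_true, y_pred):
    """Median of the absolute prediction errors (robust to outliers)."""
    return median([abs(t - p) for t, p in zip(y_true, y_pred)])


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


train, test = train_test_split(range(10))
print(len(train), len(test))

# Made-up true and predicted prices for four test listings.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 190.0, 270.0]
print(r_squared(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(median_absolute_error(y_true, y_pred))
```

Because MSE squares every error while the median absolute error ignores the largest ones, a model can rank better on one score and worse on another, which is one possible reason the two regressors do not agree across all three scores.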
Both the LightGBM and Random Forest models provide a ranked list of feature importances. I averaged the two lists to obtain an averaged feature importance ranking. The top 20 features are visualized as follows, with the x-axis being the normalized importance weight:
Weekly price and bedrooms are among the most important features, followed by reviews per month, bathrooms, cleaning fee, room type and neighborhood group cleansed. If you want to find a Seattle Airbnb at the best rate within your budget, I suggest looking at these ingredients. Now let us dive deeper into these features and dig out more insights.
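For reference, one plausible way to average two importance lists is to normalize each list so it sums to 1 and then take the mean per feature. The sketch below assumes exactly that; the scores are made up, and the normalization is an assumption rather than a record of the exact steps behind the figure.

```python
def normalize(importances):
    """Scale importance scores so that they sum to 1."""
    total = sum(importances.values())
    return {name: score / total for name, score in importances.items()}


def average_importance(first, second):
    """Average two normalized importance dicts, sorted from high to low."""
    a, b = normalize(first), normalize(second)
    averaged = {
        name: (a.get(name, 0.0) + b.get(name, 0.0)) / 2
        for name in set(a) | set(b)
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Made-up scores on two different scales: raw counts for one model,
# fractions that already sum to 1 for the other.
lightgbm_scores = {"weekly_price": 60, "bedrooms": 30, "bathrooms": 10}
forest_scores = {"weekly_price": 0.5, "bedrooms": 0.2, "bathrooms": 0.3}

ranking = average_importance(lightgbm_scores, forest_scores)
for name, weight in ranking:
    print(f"{name}: {weight:.2f}")
```

Normalizing first matters because the two models can report importance on very different scales; without it, the model with the larger raw numbers would dominate the average.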
Feature Analysis
At the top of the important features are weekly price and number of bedrooms. I looked at the relation between the weekly price and the price, and the two are strongly correlated. The weekly price is a very good indicator of the price. For example, if you have a budget of about $150, you should look for a weekly price of less than $1000.
Next I looked at the correlation between the number of bedrooms and the price. It is no surprise that more bedrooms mean a higher price. For example, if you need 3 bedrooms, the expected price is about $200–300. For someone with a budget of $100, a single bedroom is the best choice.
The relations between the price and some other top features are given as follows. Reviews per month is anti-correlated with the price, but the relation is not very linear. The cleaning fee and the number of bathrooms correlate linearly with the price. There are three room types, and entire homes/apts are much more expensive than shared rooms. The shared rooms are really cheap!
The last feature I recommend looking at is the neighborhood. The following figure gives the average price by neighborhood. Magnolia shows the highest average price, followed by Queen Anne and the Downtown area. Northgate and Delridge have the lowest Airbnb rates.
Similar methods can be used to analyze other features, but the ones above are the most important features from the price prediction. I believe these features are enough for you to estimate the price and find the best rates :)
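A quick way to put a number on "correlated" and "anti-correlated" is the Pearson correlation coefficient, which is close to +1 for a rising linear relation and negative for a falling one. Here is a standard-library sketch; the values are made up to mimic the trends described above and are not taken from the listings table.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_mean, y_mean = mean(xs), mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sum((x - x_mean) ** 2 for x in xs)
    y_spread = sum((y - y_mean) ** 2 for y in ys)
    return covariance / sqrt(x_spread * y_spread)


# Made-up values for five listings.
price = [80.0, 100.0, 150.0, 220.0, 300.0]
cleaning_fee = [20.0, 30.0, 50.0, 80.0, 100.0]
reviews_per_month = [6.0, 1.0, 4.0, 0.5, 2.0]

print("cleaning fee vs price:", pearson(price, cleaning_fee))
print("reviews per month vs price:", pearson(price, reviews_per_month))
```

A coefficient that is negative but far from -1, as for the made-up reviews column here, is what a noisy, not very linear anti-correlation looks like.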
Conclusion
In this blog I explored the Seattle Airbnb price using the Kaggle dataset to help customers better understand what drives the price and get better rates. I found that the Airbnb price shows a seasonal variation. I built a price predictor based on features from the listings table. Among the many features, I identified the most important ones for the price prediction and studied the correlation between these features and the price: weekly price, number of bedrooms, bathrooms, reviews per month, cleaning fee, room type and neighborhood. These features can be considered the most important indicators for evaluating the Airbnb price. I hope this blog will be helpful for your next experience with Airbnb.
The code can be found on my GitHub.
Find me on LinkedIn.