A GIS-based machine learning model to predict the price of Airbnb hotel rooms in New York City using OpenStreetMap, GeoPandas, and XGBoost

Himanshu Bhardwaj
3 min read, Jul 19, 2020

Location plays a very important role in the hotel business. Alongside the room type, reputation, and availability of the room, the location of the hotel is a major determinant of the room price. One problem in building a price prediction model based on location is that we do not know exactly what kind of relationship exists between the latitude and longitude of a hotel and its price. One thing is certain, though: the relationship is not linear, so a linear regression of price on the raw coordinates is not appropriate. On further investigation, the hotel price is found to vary with proximity to the tourist attractions of the area. The closer a hotel is to a tourist attraction, the higher its room price tends to be. Therefore, in this article, a price prediction model based on the location of the hotels is built using OpenStreetMap (OSM), GeoPandas, and XGBoost.

A map showing the price variation of Airbnb hotel rooms is shown in Figure 1. It is clear from the map that room prices are higher in Manhattan than in the Queens and Brooklyn areas of New York City.

Figure 1: Price variation of Airbnb hotel rooms in New York City, USA

The dataset used to build the price prediction model contains the latitude and longitude of each hotel, room availability, reviews, and the neighborhood of the hotel. The dataset information is shown below:

Step 1: Determining the tourist attractions in the area

First, the locations of the tourist attractions in New York City are extracted from OpenStreetMap (OSM). The locations of the tourist attractions in Manhattan are marked on the following map.

Figure 2: Map showing the tourist attractions of Manhattan Island, New York City
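The article does not show how the attractions were pulled from OSM, so the following is only a minimal sketch of one way to do it with the osmnx package. The tag filter `tourism=attraction`, the place string, and the helper name `fetch_attractions` are assumptions made for illustration, not the author's actual choices.

```python
# Assumed OSM tag filter: the article does not say which tags were used.
ATTRACTION_TAGS = {"tourism": "attraction"}


def fetch_attractions(place="Manhattan, New York, USA", tags=None):
    """Return a GeoDataFrame of OSM features matching `tags` inside `place`."""
    # Imported lazily so this module loads even where osmnx is not installed.
    import osmnx as ox

    tags = ATTRACTION_TAGS if tags is None else tags
    # features_from_place is the name in osmnx >= 1.3; older releases
    # exposed the same query as geometries_from_place.
    features = ox.features_from_place(place, tags).copy()
    # Attractions may be mapped as points or polygons; reduce each to one point.
    features["geometry"] = features.geometry.representative_point()
    return features


# Usage (needs network access to query OSM):
# attractions = fetch_attractions()
```

The result is a GeoPandas GeoDataFrame, so it can be plotted or joined with the hotel locations directly.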

Step 2: Determining the distance between each hotel and the nearest tourist attraction

Next, the distance between each hotel and its nearest tourist attraction is determined. Here the distance is not the straight-line distance between the two points but the distance along the roads connecting them. To compute it, the road network must first be extracted as a graph of nodes and edges. The road network of a portion of Manhattan is shown in the following figure.

Figure 3: Network of highways in Manhattan, New York City, represented as nodes and edges
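To make the idea of distance along the road network concrete, here is a small self-contained sketch. It runs Dijkstra's algorithm from all attraction nodes at once, which gives every node its network distance to the nearest attraction. The toy graph, its node names, and its edge lengths are hypothetical; a real road graph would come from OSM.

```python
import heapq


def nearest_attraction_distance(graph, attraction_nodes):
    """Multi-source Dijkstra over a road graph.

    graph: dict mapping node -> list of (neighbor, edge_length_m)
    attraction_nodes: nodes where a tourist attraction is located
    Returns a dict mapping node -> distance to its nearest attraction.
    """
    dist = {node: float("inf") for node in graph}
    heap = []
    for node in attraction_nodes:
        dist[node] = 0.0
        heapq.heappush(heap, (0.0, node))

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry, a shorter route was already found
        for neighbor, length in graph[node]:
            candidate = d + length
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical toy network (two-way streets), lengths in metres.
toy_graph = {
    "A": [("B", 100.0), ("C", 400.0)],
    "B": [("A", 100.0), ("C", 150.0), ("D", 300.0)],
    "C": [("A", 400.0), ("B", 150.0), ("D", 100.0)],
    "D": [("B", 300.0), ("C", 100.0)],
}
distances = nearest_attraction_distance(toy_graph, ["D"])
print(distances)  # {'A': 350.0, 'B': 250.0, 'C': 100.0, 'D': 0.0}
```

Note that A reaches the attraction at D through B and C (350 m) rather than over the direct A to C edge (500 m in total). With one-way streets the graph is directed, and the search would have to follow edges in the hotel-to-attraction direction.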

After extracting the highway network of New York City, the shortest distance between two points can be determined, as shown in the following figure.

Figure 4: Shortest distance between two points on the highway graph, computed with the osmnx package
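A rough sketch of how such a shortest road distance could be computed with osmnx and networkx follows. The helper names, the place string, and the `network_type="drive"` setting are assumptions for illustration; the article does not state which settings were used.

```python
def load_drive_graph(place="Manhattan, New York, USA"):
    """Download the drivable road network of `place` as a networkx graph."""
    # Imported lazily so this module loads even where osmnx is not installed.
    import osmnx as ox

    return ox.graph_from_place(place, network_type="drive")


def road_distance_m(graph, origin, destination):
    """Shortest road distance in metres between two (lat, lon) points."""
    import networkx as nx
    import osmnx as ox

    # nearest_nodes expects X = longitude and Y = latitude.
    orig_node = ox.distance.nearest_nodes(graph, X=origin[1], Y=origin[0])
    dest_node = ox.distance.nearest_nodes(graph, X=destination[1], Y=destination[0])
    # osmnx stores each edge's length in metres under the "length" attribute.
    return nx.shortest_path_length(graph, orig_node, dest_node, weight="length")


# Usage (needs network access to download the graph):
# G = load_drive_graph()
# metres = road_distance_m(G, hotel_latlon, attraction_latlon)
```

Repeating this for every hotel and every attraction is slow on a large dataset, so in practice it helps to snap all points to graph nodes once and reuse the graph.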

Step 3: Building a regression model using the XGBoost package

A Jupyter notebook that builds an XGBoost regression model is given below. The predictors I have used to predict price are the distance of the hotel from the nearest tourist attraction, room type, number of nights of booking, number of reviews, host listing count, and room availability.
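For readers who want a starting point, the sketch below shows roughly what such a model could look like. It is not the author's notebook. The column names are assumptions based on the public NYC Airbnb listings dataset (only `dist` is named in the article), and the hyperparameters are illustrative defaults.

```python
# Assumed column names; "dist" is the engineered distance feature.
NUMERIC_FEATURES = [
    "dist",
    "minimum_nights",
    "number_of_reviews",
    "calculated_host_listings_count",
    "availability_365",
]
CATEGORICAL_FEATURES = ["room_type"]
TARGET = "price"


def fit_price_model(listings, random_state=0):
    """Fit an XGBoost regressor on a pandas DataFrame of listings.

    Returns (model, feature_names, test_rmse).
    """
    # Imported lazily so this module loads even where the packages are missing.
    import pandas as pd
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    # One-hot encode the room type; numeric columns pass through unchanged.
    X = pd.get_dummies(
        listings[NUMERIC_FEATURES + CATEGORICAL_FEATURES],
        columns=CATEGORICAL_FEATURES,
        dtype=float,
    )
    y = listings[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    # Illustrative hyperparameters, not the article's settings.
    model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    return model, list(X.columns), rmse
```

Because gradient-boosted trees split on thresholds, they can pick up a non-linear relationship between `dist` and price that a linear regression would miss.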

As shown in the following figure, the distance of the hotel from the nearest tourist attraction, named 'dist', is the most important feature for predicting the price of the hotel room.

Figure 5: Importance of the features used for prediction
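An importance table like this can be produced by pairing each feature name with its importance score and sorting. The helper below is a small sketch of that step; `rank_features` is a hypothetical name, and the scores in the example are made up purely to show the output shape, not taken from the article.

```python
def rank_features(feature_names, importances):
    """Pair each feature with its importance score, highest first."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances differ in length")
    pairs = zip(feature_names, (float(v) for v in importances))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up names and scores, only to illustrate the output.
example = rank_features(["a", "b", "c"], [0.3, 0.5, 0.2])
print(example)  # [('b', 0.5), ('a', 0.3), ('c', 0.2)]
```

With a fitted scikit-learn style XGBoost model, the scores would come from `model.feature_importances_`, passed in alongside the training column names.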

Conclusion

Blindly entering the latitude and longitude of the hotel into a linear regression model is not useful. Using the distance of the hotel from the nearest tourist attraction makes more sense, and this is corroborated by the result above.
