Real Estate Segmentation: Part 1

The purpose of this project was to build a segmentation process for housing and a way to predict the resulting segments. I split the write-up into two parts on my website, but the full code is stored on my GitHub.

https://github.com/PawelTokarski95/Real-Estate-Segmentation

My dataset consists of a set of property features to which clusters are assigned. The clusters, together with these features, define real estate categories, which makes the process potentially useful for both analytical teams and developers. It makes offers easier to segment, for example by pricing, and to tailor to specific target groups.

Below is the correlation plot for the features. Most correlations are moderate or low. This matters because KMeans relies on Euclidean distance, so strongly correlated features effectively count the same information more than once and distort the clusters.
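As a minimal sketch of this check, the snippet below computes a correlation matrix with pandas and lists feature pairs whose absolute correlation exceeds a cutoff. The data is synthetic, and the column names (area_m2, rooms, distance_to_center_km) and the 0.7 cutoff are my own assumptions for illustration, not the columns or threshold used in the project.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
area = rng.normal(70, 20, n)
df = pd.DataFrame({
    "area_m2": area,
    "rooms": np.round(area / 25 + rng.normal(0, 0.5, n)),
    "distance_to_center_km": rng.uniform(0, 15, n),
})

# Pearson correlation between every pair of features.
corr = df.corr()

# Keep only the upper triangle so each pair is listed once,
# then filter pairs above an (assumed) cutoff.
threshold = 0.7
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()
strong_pairs = pairs[pairs.abs() > threshold]

print(corr.round(2))
print(strong_pairs)
```

If strong_pairs is not empty, one option is to drop or combine one feature from each pair before clustering.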

I scaled the data with StandardScaler, so that every feature has zero mean and unit variance, and fed it into a KMeans model. KMeans assigns each observation to the nearest of k centroids and works best when the groups of observations are clearly separated. Below, I present the elbow method plot along with the results of the Davies-Bouldin index and the Silhouette score. Lower Davies-Bouldin values and higher Silhouette values indicate better-separated clusters. I aimed to balance these metrics and, as a result, chose 17 clusters, which offered the best compromise across these plots.
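The selection procedure described here can be sketched roughly as follows. This is not the project's exact code: the data comes from make_blobs as a placeholder, and the range of k, n_init and random_state are assumptions chosen only to keep the example small and reproducible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Placeholder data; the real project uses the housing features instead.
X, _ = make_blobs(n_samples=500, centers=5, n_features=4, random_state=42)

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Fit KMeans for a range of k and collect the three diagnostics.
results = []
for k in range(2, 11):  # assumed range, widen it for real data
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    results.append({
        "k": k,
        "inertia": model.inertia_,  # plotted against k for the elbow method
        "davies_bouldin": davies_bouldin_score(X_scaled, labels),  # lower is better
        "silhouette": silhouette_score(X_scaled, labels),  # higher is better
    })

for row in results:
    print(
        f"k={row['k']:2d}  inertia={row['inertia']:9.2f}  "
        f"DB={row['davies_bouldin']:.3f}  silhouette={row['silhouette']:.3f}"
    )
```

Plotting inertia against k gives the elbow curve, while the other two columns can be compared directly to find a k where the Davies-Bouldin index is low and the Silhouette score is high.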

In the second part, I will present the results of further analyses. I will compare the classification performance of three models and select the best one among them.