
The kaggle.com website hosts a dataset from which I will build a model that predicts the price range of mobile phones for a company run by Bob.

https://www.kaggle.com/iabhishekofficial/mobile-price-classification

The goal is to help Bob determine which price ranges are profitable and efficient in the mobile phone market. I will use a Linear Regression model for this task.


More detailed results are available here:

https://github.com/PawelTokarski95/Mobile-Phones-Company-in-Python-


After loading the data, some cleaning was needed: I noticed that several columns contain missing observations.



MPhones.info()
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 battery_power 2000 non-null int64
1 blue 2000 non-null int64
2 clock_speed 2000 non-null float64
3 dual_sim 2000 non-null int64
4 fc 2000 non-null int64
5 four_g 2000 non-null int64
6 int_memory 2000 non-null int64
7 m_dep 1324 non-null float64
8 mobile_wt 2000 non-null int64
9 n_cores 2000 non-null int64
10 pc 2000 non-null int64
11 px_height 1368 non-null float64
12 px_width 1326 non-null float64
13 ram 1821 non-null float64
14 sc_h 1781 non-null float64
15 sc_w 2000 non-null int64
16 talk_time 2000 non-null int64
17 three_g 1671 non-null float64
18 touch_screen 2000 non-null int64
19 wifi 2000 non-null int64
20 price_range 2000 non-null int64
dtypes: float64(7), int64(14)
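Besides .info(), the gaps can be counted directly with isna(); a minimal sketch on a tiny made-up frame (illustrative values, not the real dataset):

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame standing in for the Kaggle data (made-up values)
toy = pd.DataFrame({
    "ram": [2549.0, np.nan, 2631.0, np.nan],
    "px_height": [20.0, 905.0, np.nan, 1263.0],
    "battery_power": [842, 1021, 563, 615],
})

# isna().sum() counts the missing observations in each column
missing = toy.isna().sum()
print(missing)
```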


Therefore, I filled the missing observations with '0' in place of NA and cast the columns to integers:


RangeIndex: 2000 entries, 0 to 1999
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 battery_power 2000 non-null int32
1 blue 2000 non-null int32
2 clock_speed 2000 non-null int32
3 dual_sim 2000 non-null int32
4 fc 2000 non-null int32
5 four_g 2000 non-null int32
6 int_memory 2000 non-null int32
7 m_dep 2000 non-null int32
8 mobile_wt 2000 non-null int32
9 n_cores 2000 non-null int32
10 pc 2000 non-null int32
11 px_height 2000 non-null int32
12 px_width 2000 non-null int32
13 ram 2000 non-null int32
14 sc_h 2000 non-null int32
15 sc_w 2000 non-null int32
16 talk_time 2000 non-null int32
17 three_g 2000 non-null int32
18 touch_screen 2000 non-null int32
19 wifi 2000 non-null int32
20 price_range 2000 non-null int32
dtypes: int32(21)
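The fill-and-cast step can be sketched like this (toy values, not the real frame):

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (made-up values, not the real data)
toy = pd.DataFrame({
    "ram": [2549.0, np.nan, 2631.0],
    "m_dep": [0.6, 0.7, np.nan],
})

# Replace every NA with 0, then cast the frame to int32 --
# this reproduces the all-int32 dtypes in the summary above
toy = toy.fillna(0).astype("int32")
```

Note that the cast truncates genuinely fractional columns such as m_dep, which is a side effect worth keeping in mind.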


Ultimately I fixed the missing data with scikit-learn's KNNImputer, which estimates each missing value from the k most similar rows rather than substituting a constant.
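A minimal sketch of that imputation step, again on made-up values:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy frame with gaps (made-up values)
toy = pd.DataFrame({
    "ram": [2549.0, np.nan, 2631.0, 2769.0],
    "px_width": [756.0, 1988.0, np.nan, 1716.0],
})

# Each NaN is estimated from the k nearest complete rows
# (n_neighbors=5 is scikit-learn's default; 2 suits this tiny frame)
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns)
```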

 

After that, I plotted the correlation matrix to see how strongly each variable correlates with the target feature, and how correlated the predictors are with one another. Predictors that relate strongly to the target but only weakly to each other are the most useful: they carry signal without introducing multicollinearity. That is why I used the matrix.


import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(MPhones.corr())
plt.show()
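Beyond the heatmap, the same matrix can be reduced to a ranked list of correlations with the target; a sketch on synthetic stand-in data (made-up columns, not the Kaggle frame):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 'ram' drives the target, 'clock_speed' is noise
rng = np.random.default_rng(0)
ram = rng.uniform(256, 4000, 200)
toy = pd.DataFrame({
    "ram": ram,
    "clock_speed": rng.uniform(0.5, 3.0, 200),
    "price_range": (ram > 2000).astype(int),
})

# Rank features by their correlation with the target
corr_with_target = toy.corr()["price_range"].drop("price_range")
print(corr_with_target.sort_values(ascending=False))
```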

[Figure: correlation heatmap of the features]
After selecting the features, I applied linear regression, which achieved an R² score of 63%. Switching to polynomial regression improved the score to 70%, a clear gain.
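The gap between the two models can be reproduced with scikit-learn; a sketch on synthetic data with a mild quadratic term (made-up values, not the phone dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data whose target has a quadratic component
rng = np.random.default_rng(42)
X = rng.uniform(0, 4, (300, 1))
y = 1.5 * X[:, 0] + 0.4 * X[:, 0] ** 2 + rng.normal(0, 0.5, 300)

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# The polynomial model nests the linear one, so its training R2
# can only match or exceed the linear fit
r2_linear = r2_score(y, linear.predict(X))
r2_poly = r2_score(y, poly.predict(X))
print(f"linear R2: {r2_linear:.2f}, polynomial R2: {r2_poly:.2f}")
```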


SIMPLE LINEAR REGRESSION VS POLYNOMIAL REGRESSION
