Table 1 shows that the average house sale price is $804.88 thousand and the median house sale price is $798.95 thousand. The spread of the sale prices can be assessed from the standard deviation, which is $137.13 thousand, or about 17% of the mean, so the prices are fairly widely dispersed. The skewness of the data is very low at 0.09, which indicates that the distribution is approximately symmetric; this is consistent with a normal distribution, although low skewness alone does not prove normality. The symmetry can also be seen from the mean and the median, which are nearly the same relative to the scale of the data. In short, the spread of the sale prices is high, whereas their distribution is nearly symmetric.
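The two judgements above (spread and symmetry) can be put on a relative scale with a little arithmetic. The sketch below uses only the three figures reported from Table 1; the variable names and the choice of the coefficient of variation as the yardstick are illustrative assumptions, not part of the original analysis.

```python
# Summary statistics as reported from Table 1 (thousands of dollars).
mean_price = 804.88
median_price = 798.95
std_price = 137.13

# Coefficient of variation: standard deviation relative to the mean.
cv = std_price / mean_price

# Gap between mean and median, expressed in standard deviations.
# A value close to zero is what a symmetric distribution would give.
gap_in_sd = (mean_price - median_price) / std_price

print(f"coefficient of variation: {cv:.3f}")
print(f"(mean - median) / sd:     {gap_in_sd:.3f}")
```

The standard deviation is roughly 17% of the mean, while the mean and median differ by only about 0.04 standard deviations.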
Figure 1 Scatter plot between price and size
The figure shows a positive relationship between house sale price and the size of the house: as the size of the house in square meters increases, the sale price also tends to increase.
Q.4 The linear regression of the model stated below is performed using Excel:
$$price_{i} = \beta_{0} + \beta_{1}\,size_{i} + \beta_{2}\,age_{i} + \beta_{3}\,proximity_{i} + u_{i}$$
SUMMARY OUTPUT | ||||
---|---|---|---|---|
Regression Statistics | ||||
Multiple R | 0.93093204 | |||
R Square | 0.866634463 | |||
Adjusted R Square | 0.86623276 | |||
Standard Error | 50.15287793 | |||
Observations | 1000 | |||
ANOVA | ||||
Source | df | SS | MS | F |
Regression | 3 | 16279587.45 | 5426529 | 2157.399 |
Residual | 996 | 2505249.92 | 2515.311 | |
Total | 999 | 18784837.37 | ||
Variable | Coefficients | Standard Error | t Stat | P-value |
Intercept | 25.59282925 | 14.00464211 | 1.827453 | 0.067931 |
size | 3.890134007 | 0.078081854 | 49.82123 | 1E-272 |
age | -0.601560515 | 0.168471364 | -3.5707 | 0.000373 |
proximity | 195.8438204 | 3.176667127 | 61.65072 | 0 |
$$\widehat{price}_{i} = 25.5928 + 3.8901\,size_{i} - 0.6016\,age_{i} + 195.8438\,proximity_{i}$$
The adjusted R² of the model is 0.866, which is a measure of goodness of fit: the model explains about 87% of the variation in sale prices after adjusting for the number of regressors. The intercept of our model is the base level of the sale price, that is, the predicted price when size, age and proximity are all zero. The estimates of the slope parameters of size, age and proximity are 3.8901, -0.6016 and 195.8438 respectively. The slope coefficient of size means that the sale price increases by $3.89 thousand on average if size increases by 1 square meter, keeping the other variables constant. Similarly, if the age of the house increases by 1 year, the sale price decreases by $0.60 thousand on average, keeping size and proximity constant. The proximity variable takes the value one if the house is near a major business district, so houses near a business district sell for $195.84 thousand more on average than houses that are not, keeping other factors constant.
These results are as expected: larger houses and houses near business districts cost more, and the older the house, the lower its price. It is also important to note that at the 5% level the estimates for size, age and proximity are significant, whereas the intercept (p-value 0.068) is not.
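The interpretation of the coefficients can be illustrated by plugging values into the fitted equation. The following is a minimal sketch: the coefficients are copied from the Excel output, while the function name `predict_price` and the example house (150 square meters, 10 years old) are hypothetical and chosen only for illustration.

```python
# Coefficient estimates copied from the Excel regression output.
COEF = {
    "intercept": 25.59282925,
    "size": 3.890134007,
    "age": -0.601560515,
    "proximity": 195.8438204,
}


def predict_price(size, age, proximity):
    """Predicted sale price in thousands of dollars.

    size: square meters, age: years, proximity: 1 if near a major
    business district, 0 otherwise.
    """
    return (
        COEF["intercept"]
        + COEF["size"] * size
        + COEF["age"] * age
        + COEF["proximity"] * proximity
    )


# Hypothetical house: 150 square meters, 10 years old.
near = predict_price(150, 10, 1)
far = predict_price(150, 10, 0)

print(f"near business district: {near:.2f}")
print(f"not near:               {far:.2f}")
print(f"proximity premium:      {near - far:.2f}")
```

Because proximity is a dummy variable, the difference between the two predictions equals the proximity coefficient regardless of the size and age chosen.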
SUMMARY OUTPUT | |||||
---|---|---|---|---|---|
Regression Statistics | |||||
Multiple R | 0.928131596 | ||||
R Square | 0.86142826 | ||||
Adjusted R Square | 0.860731219 | ||||
Standard Error | 0.064934719 | ||||
Observations | 1000 | ||||
ANOVA | |||||
Source | df | SS | MS | F | Significance F
Regression | 5 | 26.05462137 | 5.210924 | 1235.836 | 0 |
Residual | 994 | 4.191218689 | 0.004217 | ||
Total | 999 | 30.24584006 | |||
Variable | Coefficients | Standard Error | t Stat | P-value |
Intercept | 2.175009803 | 0.090650516 | 23.99335 | 1E-100 | |
log(size) | 0.847233288 | 0.017580409 | 48.1919 | 2.5E-262 | |
age | -0.000873122 | 0.000218377 | -3.99823 | 6.85E-05 | |
proximity | 0.247529504 | 0.004113783 | 60.17077 | 0 | |
pool | 0.018570706 | 0.00510377 | 3.638625 | 0.000288 | |
fireplace | 0.007021951 | 0.004136736 | 1.697462 | 0.089922 |
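The regression statistics in the two summary outputs above can be cross-checked against their ANOVA blocks. The sketch below recomputes R², adjusted R² and the F statistic from the reported sums of squares using the standard formulas; the function name `fit_statistics` is an illustrative assumption, and the inputs are the values printed by Excel.

```python
def fit_statistics(ss_reg, ss_res, n, k):
    """Recompute R^2, adjusted R^2 and F from ANOVA sums of squares.

    ss_reg: regression sum of squares, ss_res: residual sum of squares,
    n: number of observations, k: number of slope coefficients.
    """
    ss_total = ss_reg + ss_res
    r2 = ss_reg / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (ss_reg / k) / (ss_res / (n - k - 1))
    return r2, adj_r2, f_stat


# First output: size, age, proximity (k = 3).
first = fit_statistics(16279587.45, 2505249.92, n=1000, k=3)

# Second output: log(size), age, proximity, pool, fireplace (k = 5).
second = fit_statistics(26.05462137, 4.191218689, n=1000, k=5)

for label, (r2, adj_r2, f_stat) in (("first", first), ("second", second)):
    print(f"{label}: R2={r2:.6f}  adj R2={adj_r2:.6f}  F={f_stat:.3f}")
```

The recomputed values agree with the R Square, Adjusted R Square and F entries reported in both outputs.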
Q.8 Null hypothesis: all slope coefficients in the model are jointly equal to zero, that is, none of the explanatory variables helps to explain the sale price.
Alternative hypothesis: at least one slope coefficient is different from zero.
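Stated formally, for a model with $k$ slope coefficients estimated on $n$ observations, the test of overall significance can be written as follows. The symbols $k$, $n$, $SSR$ (regression sum of squares) and $SSE$ (residual sum of squares) are notation assumed here to match the ANOVA layout of the Excel output.

$$H_{0}:\ \beta_{1} = \beta_{2} = \dots = \beta_{k} = 0 \qquad H_{1}:\ \beta_{j} \neq 0 \ \text{for at least one } j$$

$$F = \frac{SSR/k}{SSE/(n-k-1)} \sim F_{k,\ n-k-1} \ \text{under } H_{0}$$

The null hypothesis is rejected at the 5% level when the Significance F value reported by Excel is below 0.05.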