12.1 Introduction

Recall the bivariate regression model with one independent and one dependent variable: \[y=\beta_0+\beta_1 \cdot x_1+\epsilon\] The multivariate linear regression model includes more than one independent variable and is a straightforward extension of the bivariate regression model: \[y=\beta_0+\beta_1 \cdot x_1+\beta_2 \cdot x_2 + \dots + \beta_k \cdot x_k + \epsilon\] Whether we consider the bivariate or the multivariate model, the objective is always to minimize the sum of squared errors, which gives the estimator its name: ordinary least squares (OLS). In the bivariate case, the fitted line is determined by the intercept (\(\beta_0\)) and the slope (\(\beta_1\)), i.e.: \[E(y|x_1)=\beta_0+\beta_1 \cdot x_1\] The case of a regression model with two independent variables can still be represented in a 3-dimensional graph as depicted below.
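To see that minimizing the sum of squared errors is what lm() does under the hood, here is a minimal sketch using simulated data (all variable names and coefficient values are made up for illustration). The OLS solution can be computed directly from the normal equations \(\hat{\beta}=(X'X)^{-1}X'y\) and matches the lm() output:

```r
set.seed(123)
n  = 100
x1 = rnorm(n)
x2 = rnorm(n)
y  = 2+3*x1-1.5*x2+rnorm(n)                  # true coefficients: 2, 3, -1.5
X  = cbind(1,x1,x2)                          # design matrix with intercept column
b  = solve(t(X)%*%X)%*%t(X)%*%y             # normal equations: (X'X)^(-1) X'y
round(cbind(b,coef(lm(y~x1+x2))),4)         # both columns are identical
```

The closed-form solution and lm() agree to numerical precision, which is why the coefficient estimates reported by lm() are called the least-squares estimates.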

The purpose of the multivariate regression model is to measure the effect of each independent variable on the dependent variable. It is crucial to control for everything else that could influence the dependent variable. For example, regressing the weekly grocery bill on years of education might give you a statistically significant effect for education, but once income is included, the effect of education will most likely shrink or disappear.
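This omitted-variable problem can be illustrated with simulated data (the variables and coefficient values below are hypothetical, constructed so that income drives both education and the grocery bill while education has no true effect):

```r
set.seed(456)
n         = 500
income    = rnorm(n,mean=60,sd=15)            # income in $1000
education = 8+0.1*income+rnorm(n)             # education correlated with income
grocery   = 50+2*income+rnorm(n,sd=10)        # income, not education, matters
coef(lm(grocery~education))                   # education appears important
coef(lm(grocery~education+income))            # education effect (nearly) vanishes
```

In the bivariate regression, the education coefficient picks up the income effect because the two variables are correlated; controlling for income removes that spurious effect.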

The first example involves estimating home values based on square footage and number of garage spots of a house in the 46268 ZIP code in Indianapolis. The data is contained in indyhomes.

indyhomes46268 = subset(indyhomes,zip==46268)
bhat = lm(price~sqft+garage,data=indyhomes46268)
summary(bhat)
## Call:
## lm(formula = price ~ sqft + garage, data = indyhomes46268)
## Residuals:
##    Min     1Q Median     3Q    Max 
## -58780  -7817   1582   7886  51803 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 81733.141  15896.004   5.142 5.20e-06 ***
## sqft           40.897      4.383   9.331 2.85e-12 ***
## garage      16580.964   7136.866   2.323   0.0245 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 20710 on 47 degrees of freedom
## Multiple R-squared:  0.675,  Adjusted R-squared:  0.6611 
## F-statistic:  48.8 on 2 and 47 DF,  p-value: 3.388e-12

Depending on the nature of the variables, it may be necessary to rescale them for ease of interpretation, e.g., when coefficients are very large or very small. Rescaling a variable, e.g., dividing income by 1000, changes its coefficient and standard error but has no effect on the t-statistic.
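A quick sketch with simulated data (the income and outcome variables here are made up) confirms this: dividing the regressor by 1000 multiplies its coefficient and standard error by 1000, while the t-statistic is unchanged.

```r
set.seed(789)
income = rnorm(200,mean=50000,sd=10000)       # income in dollars
y      = 10+0.002*income+rnorm(200,sd=5)
m1     = summary(lm(y~income))$coefficients
m2     = summary(lm(y~I(income/1000)))$coefficients
m1["income",]                                 # tiny coefficient and std. error
m2["I(income/1000)",]                         # both scaled by 1000; same t value
```

Because the coefficient and standard error scale by the same factor, their ratio, the t-statistic, and hence the p-value are unaffected by rescaling.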