15.1 Binary Choice Estimation in R

There are (at least) two ways to obtain the coefficient estimates in R. The first uses the built-in function glm() with a binomial family and a logit link:

bhat_glm_logit = glm(buying~income,family=binomial(link="logit"),data=organic)
summary(bhat_glm_logit)
## 
## Call:
## glm(formula = buying ~ income, family = binomial(link = "logit"), 
##     data = organic)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.87557    1.13842  -5.161 2.45e-07 ***
## income       0.11709    0.02247   5.211 1.87e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 138.469  on 99  degrees of freedom
## Residual deviance:  70.931  on 98  degrees of freedom
## AIC: 74.931
## 
## Number of Fisher Scoring iterations: 6
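As the last line of the output indicates, glm() maximizes the logit log-likelihood with Fisher scoring. The sketch below implements that iteration by hand to show where the Estimate and Std. Error columns come from. It is only an illustration under stated assumptions. The organic data are not reproduced here, so the sketch uses simulated data whose sample size, income range, and true coefficients are hypothetical. The function fisher_scoring_logit() is a helper written for this example and is not part of any package.

```r
# Hypothetical simulated data standing in for the organic data set;
# n, the income range and the true coefficients (-6, 0.12) are assumptions.
set.seed(1)
n <- 500
income <- runif(n, min = 20, max = 90)  # income in $1,000
buying <- rbinom(n, size = 1, prob = plogis(-6 + 0.12 * income))
sim <- data.frame(buying = buying, income = income)

# Hypothetical helper: Fisher scoring (identical to Newton-Raphson for the logit link)
fisher_scoring_logit <- function(X, y, tol = 1e-10, max_iter = 25) {
  b <- rep(0, ncol(X))                        # start at all-zero coefficients
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% b))                # fitted probabilities Lambda(Xb)
    score <- crossprod(X, y - p)              # gradient of the log-likelihood
    info <- crossprod(X, X * (p * (1 - p)))   # Fisher information X'WX
    step <- drop(solve(info, score))
    b <- b + step
    if (max(abs(step)) < tol) break
  }
  p <- plogis(drop(X %*% b))
  info <- crossprod(X, X * (p * (1 - p)))     # information at the final estimate
  list(coef = b, se = sqrt(diag(solve(info))), iterations = iter)
}

X <- model.matrix(buying ~ income, data = sim)  # columns: (Intercept), income
est <- fisher_scoring_logit(X, sim$buying)
fit <- glm(buying ~ income, family = binomial(link = "logit"), data = sim)

# Side-by-side comparison of the hand-rolled estimates with glm()
cbind(estimate = est$coef, glm = coef(fit),
      se = est$se, glm_se = sqrt(diag(vcov(fit))))
```

The standard errors are the square roots of the diagonal of the inverse Fisher information. The iteration count need not match the one glm() reports, because glm() uses its own starting values and a deviance-based convergence criterion.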

Note that the coefficients are interpreted differently than in the linear regression model. The sign of a coefficient estimate gives the direction in which the probability changes. Here the coefficient on income is positive, so an increase (decrease) in income leads to an increase (decrease) in the probability of purchasing organic food. Both coefficients are also statistically significant at the 0.1% level. As mentioned earlier, however, the coefficients are not the marginal effects, and calculating those requires a slightly different approach. Let us first look at the second way of obtaining the coefficient estimates, which requires the R package mfx.

library(mfx)  # provides logitmfx()
bhat = logitmfx(buying~income,data=organic)
summary(bhat$fit)
## 
## Call:
## glm(formula = formula, family = binomial(link = "logit"), data = data, 
##     start = start, control = control, x = T)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.87557    1.13842  -5.161 2.45e-07 ***
## income       0.11709    0.02247   5.211 1.87e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 138.469  on 99  degrees of freedom
## Residual deviance:  70.931  on 98  degrees of freedom
## AIC: 74.931
## 
## Number of Fisher Scoring iterations: 6

The results are identical to those obtained with glm(), but logitmfx() also calculates the marginal effects. These are stored in bhat$mfxest:

bhat$mfxest
##             dF/dx   Std. Err.        z        P>|z|
## income 0.02919553 0.005634262 5.181785 2.197728e-07
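The dF/dx column can be traced to the chain rule applied to the logit probability. Writing Λ(z) = e^z/(1 + e^z) for the logistic distribution function, the model and its marginal effect are as follows. This is the standard textbook derivation and is not taken from the mfx documentation:

```latex
\begin{aligned}
P(\text{buying}_i = 1 \mid \text{income}_i) &= \Lambda(z_i),
\qquad z_i = \beta_0 + \beta_1\,\text{income}_i,
\qquad \Lambda(z) = \frac{e^{z}}{1 + e^{z}}, \\
\frac{\partial P(\text{buying}_i = 1 \mid \text{income}_i)}{\partial\,\text{income}_i}
  &= \Lambda(z_i)\,\bigl[1 - \Lambda(z_i)\bigr]\,\beta_1, \\
\left.\frac{\partial P}{\partial\,\text{income}}\right|_{\overline{\text{income}}}
  &= \Lambda(\bar z)\,\bigl[1 - \Lambda(\bar z)\bigr]\,\beta_1,
\qquad \bar z = \beta_0 + \beta_1\,\overline{\text{income}}, \\
\left|\frac{\partial P}{\partial\,\text{income}}\right| &\le \frac{|\beta_1|}{4}.
\end{aligned}
```

Because Λ(z)[1 − Λ(z)] is always positive, the marginal effect has the same sign as β1. This is why the sign of a logit coefficient can be interpreted directly while its size cannot. Because this factor is at most 1/4, the bound here is 0.11709/4 ≈ 0.0293, just above the reported 0.0292. The predicted probability at mean income must therefore be close to one half.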

It is important to note that logitmfx() evaluates the marginal effects at the mean of the independent variables by default. Marginal effects at other values of income can be obtained, for example, with the function margins() from the margins package.

The glm() object estimated earlier serves a different purpose: it can be passed to predict() to calculate predicted probabilities. Returning to the purchase of organic food, assume that there are three new respondents with incomes of $25,000, $50,000, and $75,000. Income is measured in thousands of dollars in the data, so these respondents enter as 25, 50, and 75. Setting type="response" makes predict() return probabilities rather than values of the linear index (log-odds):

datablock = data.frame(income=c(25,50,75))
predict(bhat_glm_logit,newdata=datablock,type="response")
##         1         2         3 
## 0.0498116 0.4946870 0.9481377
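These predictions can be reproduced by hand, and the same calculation also gives the marginal effects at these three incomes rather than at the mean. The sketch below is a base-R check, not the margins() approach. It plugs the rounded coefficient estimates from the glm() output into the logistic distribution function plogis() and its density dlogis(), using the fact that the logistic density equals Λ(z)[1 − Λ(z)].

```r
# Coefficient estimates copied from the glm() output above (rounded to 5 decimals)
b0 <- -5.87557
b1 <-  0.11709
income <- c(25, 50, 75)   # income in $1,000

z    <- b0 + b1 * income  # linear index (log-odds)
prob <- plogis(z)         # predicted probabilities Lambda(z)
me   <- dlogis(z) * b1    # marginal effects Lambda(z) * (1 - Lambda(z)) * b1

data.frame(income, prob, me)
```

The probabilities match the predict() output up to the rounding of the coefficients. The marginal effect is largest at an income of $50,000, where the predicted probability is near one half (about 0.029). At $25,000 and $75,000 it is only about a fifth as large. A single marginal effect evaluated at the mean therefore does not describe every respondent.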