17.4 Hurdle and Zero-Inflation Models

Count data often contain more zeros than a Poisson or negative binomial regression model predicts, which leads to a poor fit. Hurdle and zero-inflation models are designed for this situation. Both are first illustrated with the NMES1988 data from the package AER and then with the BLM protest data.

The NMES1988 data contain 4406 observations on Medicare recipients aged 66 or older. The outcome of interest is the number of physician office \(visits\), modeled as a function of \(hospital\) (number of hospital stays), \(health\) (self-perceived health status), \(chronic\) (number of chronic conditions), \(gender\), \(school\) (years of education), and \(insurance\) (private insurance coverage).

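# Load the NMES1988 data shipped with the AER package
data("NMES1988",package="AER")
# Model formula shared by all models in this section
eq = visits~hospital+health+chronic+gender+school+insurance
# Poisson regression (log link)
bhat_pois = glm(eq,data=NMES1988,family=poisson)
summary(bhat_pois)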
## 
## Call:
## glm(formula = eq, family = poisson, data = NMES1988)
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      1.028874   0.023785  43.258   <2e-16 ***
## hospital         0.164797   0.005997  27.478   <2e-16 ***
## healthpoor       0.248307   0.017845  13.915   <2e-16 ***
## healthexcellent -0.361993   0.030304 -11.945   <2e-16 ***
## chronic          0.146639   0.004580  32.020   <2e-16 ***
## gendermale      -0.112320   0.012945  -8.677   <2e-16 ***
## school           0.026143   0.001843  14.182   <2e-16 ***
## insuranceyes     0.201687   0.016860  11.963   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26943  on 4405  degrees of freedom
## Residual deviance: 23168  on 4398  degrees of freedom
## AIC: 35959
## 
## Number of Fisher Scoring iterations: 5
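Whether excess zeros are actually a problem for the Poisson fit can be checked directly. The following is a minimal sketch that refits the Poisson model and compares the observed share of zero visits with the share implied by the fitted Poisson means. It also computes the Pearson dispersion statistic, where values well above 1 point to overdispersion. The object names `fit_pois`, `obs_zero`, `exp_zero`, and `disp` are illustrative and do not appear in the original text.

```r
# Assumes the AER package is installed (it ships the NMES1988 data)
data("NMES1988", package = "AER")
eq <- visits ~ hospital + health + chronic + gender + school + insurance
fit_pois <- glm(eq, data = NMES1988, family = poisson)

# Observed share of zeros in the outcome
obs_zero <- mean(NMES1988$visits == 0)

# Share of zeros implied by the Poisson fit:
# average of P(Y_i = 0) = exp(-mu_i) over the fitted means mu_i
exp_zero <- mean(dpois(0, lambda = fitted(fit_pois)))

# Pearson dispersion statistic: sum of squared Pearson residuals
# divided by the residual degrees of freedom
disp <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

c(observed_zero = obs_zero, poisson_zero = exp_zero, dispersion = disp)
```

If the observed share of zeros is clearly larger than the Poisson-implied share, a model with a separate process for zeros is worth considering.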
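# Caution: with no family argument, glm() defaults to gaussian, so this is a
# linear model, not a negative binomial fit (MASS::glm.nb() would give the latter).
bhat_nb = glm(eq,data=NMES1988)
summary(bhat_nb)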
## 
## Call:
## glm(formula = eq, data = NMES1988)
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.63203    0.33480   4.875 1.13e-06 ***
## hospital         1.61976    0.13264  12.211  < 2e-16 ***
## healthpoor       1.84532    0.31234   5.908 3.72e-09 ***
## healthexcellent -1.33140    0.36257  -3.672 0.000243 ***
## chronic          0.94440    0.07693  12.276  < 2e-16 ***
## gendermale      -0.63185    0.19454  -3.248 0.001171 ** 
## school           0.14345    0.02726   5.262 1.49e-07 ***
## insuranceyes     1.10397    0.24362   4.532 6.01e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 40.02228)
## 
##     Null deviance: 201252  on 4405  degrees of freedom
## Residual deviance: 176018  on 4398  degrees of freedom
## AIC: 28769
## 
## Number of Fisher Scoring iterations: 2
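The call above omits the `family` argument, so the output reports a Gaussian fit. A negative binomial regression with the same formula could be sketched as follows, assuming the MASS package and its `glm.nb()` function are used. The names `fit_pois` and `fit_nb` are illustrative, and no output is shown because it was not part of the original text.

```r
library(MASS)  # provides glm.nb()

# Assumes the AER package is installed (it ships the NMES1988 data)
data("NMES1988", package = "AER")
eq <- visits ~ hospital + health + chronic + gender + school + insurance

# Poisson fit for reference
fit_pois <- glm(eq, data = NMES1988, family = poisson)

# Negative binomial regression with log link; the dispersion
# parameter theta is estimated alongside the coefficients
fit_nb <- glm.nb(eq, data = NMES1988)
summary(fit_nb)

# Estimated theta (smaller values mean stronger overdispersion)
fit_nb$theta

# Compare the two fits by AIC (lower is better)
AIC(fit_pois, fit_nb)
```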
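# hurdle() is assumed to come from the pscl package; dist="negbin" uses a truncated negative binomial for positive counts
library(pscl)
bhat_hurdle = hurdle(eq,data=NMES1988,dist="negbin")
summary(bhat_hurdle)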
## 
## Call:
## hurdle(formula = eq, data = NMES1988, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1718 -0.7080 -0.2737  0.3196 18.0092 
## 
## Count model coefficients (truncated negbin with log link):
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      1.197699   0.058973  20.309  < 2e-16 ***
## hospital         0.211898   0.021396   9.904  < 2e-16 ***
## healthpoor       0.315958   0.048056   6.575 4.87e-11 ***
## healthexcellent -0.331861   0.066093  -5.021 5.14e-07 ***
## chronic          0.126421   0.012452  10.152  < 2e-16 ***
## gendermale      -0.068317   0.032416  -2.108   0.0351 *  
## school           0.020693   0.004535   4.563 5.04e-06 ***
## insuranceyes     0.100172   0.042619   2.350   0.0188 *  
## Log(theta)       0.333255   0.042754   7.795 6.46e-15 ***
## Zero hurdle model coefficients (binomial with logit link):
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      0.043147   0.139852   0.309 0.757688    
## hospital         0.312449   0.091437   3.417 0.000633 ***
## healthpoor      -0.008716   0.161024  -0.054 0.956833    
## healthexcellent -0.289570   0.142682  -2.029 0.042409 *  
## chronic          0.535213   0.045378  11.794  < 2e-16 ***
## gendermale      -0.415658   0.087608  -4.745 2.09e-06 ***
## school           0.058541   0.011989   4.883 1.05e-06 ***
## insuranceyes     0.747120   0.100880   7.406 1.30e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta: count = 1.3955
## Number of iterations in BFGS optimization: 16 
## Log-likelihood: -1.209e+04 on 17 Df
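The hurdle output has two parts: a binomial logit for whether a person has any visit at all, and a truncated negative binomial for the number of visits among those with at least one. One way to read both parts is to exponentiate the coefficients. The following is a minimal sketch, assuming `hurdle()` comes from the pscl package. The names `fit_hurdle`, `or_zero`, `rr_count`, and `p0_hat` are illustrative.

```r
library(pscl)  # assumed source of hurdle()

# Assumes the AER package is installed (it ships the NMES1988 data)
data("NMES1988", package = "AER")
eq <- visits ~ hospital + health + chronic + gender + school + insurance
fit_hurdle <- hurdle(eq, data = NMES1988, dist = "negbin")

# Zero hurdle part: odds ratios for having at least one visit
or_zero <- exp(coef(fit_hurdle, model = "zero"))
round(or_zero, 3)

# Count part: multiplicative effects on the mean of the
# untruncated negative binomial distribution
rr_count <- exp(coef(fit_hurdle, model = "count"))
round(rr_count, 3)

# Average predicted probability of a zero count, to compare with
# the observed share of zeros
p0_hat <- mean(predict(fit_hurdle, type = "prob")[, 1])
c(predicted_zero = p0_hat, observed_zero = mean(NMES1988$visits == 0))
```

In the zero hurdle part, an odds ratio above 1 means the covariate raises the odds of crossing the hurdle, that is, of having at least one visit.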