17.4 Hurdle and Zero-Inflation Models
Count data often contain many zeros, more than a Poisson or negative binomial regression model can reproduce given the covariates. Such models then fit the zeros poorly. Hurdle and zero-inflation models address this by modeling the zeros with a separate binary process. Both are first illustrated with the NMES1988 data from the package AER and then with the BLM protest data.
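One standard way to write the two models combines a count distribution \(f(y)\) (Poisson or negative binomial) with a binary process. The notation \(p\) and \(\pi\) is chosen here for illustration. In a hurdle model, a binary model with probability \(p\) decides whether the count is positive, and positive counts follow \(f\) truncated at zero. In a zero-inflation model, structural zeros occur with probability \(\pi\) and all other observations follow the untruncated \(f\), so a zero can come from either process:

\[
\text{Hurdle:}\quad P(Y=0) = 1-p, \qquad P(Y=y) = p\,\frac{f(y)}{1-f(0)}, \quad y = 1, 2, \dots
\]

\[
\text{Zero inflation:}\quad P(Y=0) = \pi + (1-\pi)\,f(0), \qquad P(Y=y) = (1-\pi)\,f(y), \quad y = 1, 2, \dots
\]

Covariates typically enter \(p\) and \(\pi\) through a logit link and the mean of \(f\) through a log link.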
The NMES1988 data contain 4406 observations on people aged 66 years or older who are covered by Medicare. The outcome of interest is the number of physician office visits, \(visits\), as a function of \(hospital\) (number of hospital stays), \(health\) (self-perceived health status: poor, average, or excellent), \(chronic\) (number of chronic conditions), \(gender\), \(school\) (number of years of education), and \(insurance\) (whether the person has private insurance).
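A quick diagnostic for excess zeros compares the observed share of zeros with the share implied by a fitted Poisson model, i.e. the average of \(P(Y_i=0)=e^{-\hat\mu_i}\). The sketch below uses hypothetical simulated data (not NMES1988) in which 40% of observations are structural zeros, so the Poisson model should clearly underpredict zeros; the data-generating values are assumptions chosen for illustration.

```r
# Hypothetical simulated data (not NMES1988): zeros occur with probability 0.4
# regardless of x; otherwise y is Poisson with mean exp(1 + 0.3 * x)
set.seed(1)
n = 5000
x = rnorm(n)
structural_zero = rbinom(n, size = 1, prob = 0.4) == 1
y = ifelse(structural_zero, 0, rpois(n, lambda = exp(1 + 0.3 * x)))

# Ordinary Poisson regression that ignores the extra zeros
fit_pois = glm(y ~ x, family = poisson)

# Observed share of zeros versus the share implied by the fitted model,
# i.e. the average over observations of P(Y_i = 0) = exp(-mu_i)
share_observed = mean(y == 0)
share_expected = mean(dpois(0, lambda = fitted(fit_pois)))
round(c(observed = share_observed, poisson = share_expected), 3)
```

For the NMES1988 data, the analogous comparison is mean(NMES1988$visits == 0) against mean(dpois(0, fitted(bhat_pois))), with bhat_pois the Poisson model estimated below.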
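data("NMES1988",package="AER")  # load NMES1988 from the AER package
# doctor visits explained by hospital stays, health, chronic conditions, gender, schooling, insurance
eq = visits~hospital+health+chronic+gender+school+insurance
bhat_pois = glm(eq,data=NMES1988,family=poisson)  # Poisson regression (log link)
summary(bhat_pois)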
##
## Call:
## glm(formula = eq, family = poisson, data = NMES1988)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.028874 0.023785 43.258 <2e-16 ***
## hospital 0.164797 0.005997 27.478 <2e-16 ***
## healthpoor 0.248307 0.017845 13.915 <2e-16 ***
## healthexcellent -0.361993 0.030304 -11.945 <2e-16 ***
## chronic 0.146639 0.004580 32.020 <2e-16 ***
## gendermale -0.112320 0.012945 -8.677 <2e-16 ***
## school 0.026143 0.001843 14.182 <2e-16 ***
## insuranceyes 0.201687 0.016860 11.963 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 26943 on 4405 degrees of freedom
## Residual deviance: 23168 on 4398 degrees of freedom
## AIC: 35959
##
## Number of Fisher Scoring iterations: 5
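Because the Poisson model uses a log link, each coefficient is a change in log expected visits, and exp(coefficient) is the factor by which expected visits are multiplied for a one-unit increase in the regressor (an incidence rate ratio). A small sketch that converts the estimates printed above; the coefficients are copied by hand from the output, and the names b and irr are illustrative.

```r
# Poisson coefficients copied from the summary(bhat_pois) output above
b = c(hospital = 0.164797, healthpoor = 0.248307, healthexcellent = -0.361993,
      chronic = 0.146639, gendermale = -0.112320, school = 0.026143,
      insuranceyes = 0.201687)

# Under the log link, exp(b) multiplies the expected number of visits
# when the regressor increases by one unit (incidence rate ratio)
irr = exp(b)

# Implied percentage change in expected visits
round(100 * (irr - 1), 1)
```

For instance, a one-unit increase in \(hospital\) multiplies expected visits by about 1.18, roughly 18% more visits holding the other regressors fixed, while excellent health is associated with about 30% fewer expected visits than the reference category (average health). In practice, exp(coef(bhat_pois)) gives the same numbers without copying.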
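For comparison, the same specification estimated by OLS, i.e. glm() with its default gaussian family:
summary(glm(eq,data=NMES1988))
##
## Call: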
## glm(formula = eq, data = NMES1988)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.63203 0.33480 4.875 1.13e-06 ***
## hospital 1.61976 0.13264 12.211 < 2e-16 ***
## healthpoor 1.84532 0.31234 5.908 3.72e-09 ***
## healthexcellent -1.33140 0.36257 -3.672 0.000243 ***
## chronic 0.94440 0.07693 12.276 < 2e-16 ***
## gendermale -0.63185 0.19454 -3.248 0.001171 **
## school 0.14345 0.02726 5.262 1.49e-07 ***
## insuranceyes 1.10397 0.24362 4.532 6.01e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 40.02228)
##
## Null deviance: 201252 on 4405 degrees of freedom
## Residual deviance: 176018 on 4398 degrees of freedom
## AIC: 28769
##
## Number of Fisher Scoring iterations: 2
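The OLS coefficients are measured in visits, whereas the Poisson coefficients are changes in log expected visits, so the two sets are not directly comparable. For a continuous regressor \(x_j\), the Poisson marginal effect is \(\partial\mu_i/\partial x_j = \hat\beta_j\hat\mu_i\), and its sample average \(\hat\beta_j\,\overline{\hat\mu}\) is on the same scale as OLS. With a log link and an intercept, the Poisson score equations force \(\overline{\hat\mu}=\bar y\). A minimal sketch on hypothetical simulated data (all parameter values assumed):

```r
# Hypothetical simulated count data with one continuous regressor x
set.seed(2)
n = 2000
x = runif(n, 0, 5)
y = rpois(n, lambda = exp(0.2 + 0.3 * x))

fit_pois = glm(y ~ x, family = poisson)
fit_ols = glm(y ~ x)  # gaussian family, i.e. OLS

# d mu_i / d x = beta * mu_i under the log link; averaging over the sample
# gives the average marginal effect (AME), measured in counts like OLS
ame_pois = coef(fit_pois)[["x"]] * mean(fitted(fit_pois))

# With an intercept, the Poisson MLE reproduces the sample mean,
# so beta * mean(y) gives the same AME
c(poisson_ame = ame_pois, ols = coef(fit_ols)[["x"]],
  check = coef(fit_pois)[["x"]] * mean(y))
```

Applied to the models above, coef(bhat_pois)[-1] * mean(NMES1988$visits) gives average marginal effects comparable to the OLS coefficients; for the factor regressors (health, gender, insurance) this derivative-based formula is only an approximation of a discrete change.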
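The hurdle model with a negative binomial count component is estimated with hurdle() from the package pscl. It combines a logit model for whether a person has any doctor visits with a zero-truncated negative binomial model for the number of visits among those with at least one:
library(pscl)
summary(hurdle(eq,data=NMES1988,dist="negbin"))
##
## Call: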
## hurdle(formula = eq, data = NMES1988, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -1.1718 -0.7080 -0.2737 0.3196 18.0092
##
## Count model coefficients (truncated negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.197699 0.058973 20.309 < 2e-16 ***
## hospital 0.211898 0.021396 9.904 < 2e-16 ***
## healthpoor 0.315958 0.048056 6.575 4.87e-11 ***
## healthexcellent -0.331861 0.066093 -5.021 5.14e-07 ***
## chronic 0.126421 0.012452 10.152 < 2e-16 ***
## gendermale -0.068317 0.032416 -2.108 0.0351 *
## school 0.020693 0.004535 4.563 5.04e-06 ***
## insuranceyes 0.100172 0.042619 2.350 0.0188 *
## Log(theta) 0.333255 0.042754 7.795 6.46e-15 ***
## Zero hurdle model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.043147 0.139852 0.309 0.757688
## hospital 0.312449 0.091437 3.417 0.000633 ***
## healthpoor -0.008716 0.161024 -0.054 0.956833
## healthexcellent -0.289570 0.142682 -2.029 0.042409 *
## chronic 0.535213 0.045378 11.794 < 2e-16 ***
## gendermale -0.415658 0.087608 -4.745 2.09e-06 ***
## school 0.058541 0.011989 4.883 1.05e-06 ***
## insuranceyes 0.747120 0.100880 7.406 1.30e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta: count = 1.3955
## Number of iterations in BFGS optimization: 16
## Log-likelihood: -1.209e+04 on 17 Df
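The hurdle log-likelihood splits into a binary part (zero versus positive, all observations) and a zero-truncated count part (positive observations only), so the two sets of coefficients above can be estimated separately. The sketch below illustrates this separation for a Poisson hurdle rather than the negative binomial hurdle used above, on hypothetical simulated data, using only base R: glm() for the logit part and optim() for the truncated Poisson part. The helper names rztpois and negll_ztp and all parameter values are assumptions for illustration.

```r
# Hypothetical simulated data from a Poisson hurdle process:
#   P(Y > 0) = plogis(0.5 + 1.0 * x)                 (logit hurdle)
#   Y | Y > 0 ~ zero-truncated Poisson, lambda = exp(0.8 + 0.4 * x)
set.seed(3)
n = 4000
x = rnorm(n)
positive = rbinom(n, 1, plogis(0.5 + 1.0 * x)) == 1
lambda = exp(0.8 + 0.4 * x)

# Draw zero-truncated Poisson values by rejection: redraw any zeros
rztpois = function(lam) {
  y = rpois(length(lam), lam)
  while (any(y == 0)) {
    idx = which(y == 0)
    y[idx] = rpois(length(idx), lam[idx])
  }
  y
}
y = numeric(n)
y[positive] = rztpois(lambda[positive])

# Part 1: logit model for crossing the hurdle (Y > 0), all observations
crossed = as.numeric(y > 0)
fit_zero = glm(crossed ~ x, family = binomial)

# Part 2: zero-truncated Poisson for the positive counts only,
# log f(y | y > 0) = log dpois(y, lam) - log(1 - exp(-lam))
negll_ztp = function(beta, y, x) {
  lam = exp(beta[1] + beta[2] * x)
  -sum(dpois(y, lam, log = TRUE) - log1p(-exp(-lam)))
}
pos = y > 0
fit_count = optim(c(0, 0), negll_ztp, y = y[pos], x = x[pos], method = "BFGS")

# The hurdle log-likelihood is the sum of the two separately maximized parts
loglik_hurdle = as.numeric(logLik(fit_zero)) - fit_count$value
list(zero = coef(fit_zero), count = fit_count$par, loglik = loglik_hurdle)
```

With pscl installed, hurdle(y ~ x), whose defaults are a Poisson count part and a binomial logit zero hurdle, should reproduce these estimates up to optimization tolerance; dist = "negbin", as in the NMES1988 example, replaces the truncated Poisson with a truncated negative binomial.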