17.6 Exercises

  1. Aptitude Tobit Model (***): Consider the data set aptitude, in which the aptitude score is right-censored at 800. In a first step, estimate a regular OLS model with \(apt\) as the dependent variable and \(read\), \(math\), and \(factor(program)\) as the independent variables. In a second step, estimate a model that takes the censored nature of the data into account. Do the coefficient estimates differ noticeably between the two models?

  2. Chicago Grocery Stores (***): Subdivisions of Chicago are called Chicago Community Areas (CCAs). The data set chicagogrocery contains the number of grocery stores (\(stores\)) in each CCA as well as demographic information. Estimate a Poisson and a Negative Binomial regression model with \(stores\) as the dependent variable and the following independent variables: \(income\), \(pop\), the unemployment rate (\(unemployed/laborforce\)), and the share of Black residents (\(black/pop\)). What do you conclude? Are the results what you would expect?

  3. Extramarital Affairs (***): Consider the data set fair. The independent variables are \(male\), \(yearsmarried\) (number of years married), \(children\), \(religious\) (religiousness on a scale of 1-5, with 1 being essentially an atheist), and \(marriagehappiness\) (self-rating of the marriage from 1 = very unhappy to 5 = very happy). Estimate five models: (1) regular OLS, (2) Probit, (3) Poisson, (4) Negative Binomial, and (5) Hurdle. For the Probit model, use a binary dependent variable equal to 0 (no affair) or 1 (at least one affair). Compare the models in terms of statistical significance. What changes from one model to the next? Which model is the most appropriate and why?

  4. Biochemistry Articles (***): "Publish or perish" summarizes life in academia. The dependent variable of interest is \(articles\) and the independent variables are \(female\), \(married\), \(kidsbelow6\), and \(mentorarticles\) (number of articles by the Ph.D. mentor). Estimate a quasi-Poisson model and a hurdle model. According to the models, what matters in terms of graduate student productivity? Why do those findings matter?

  5. BLM II (***): In a previous exercise, a regular OLS regression model was used to explain the positive number of protests. In this exercise, estimate a zero-inflated model and a hurdle model. The dependent variable is protest frequency (\(totprotests\)) and the independent variables are city population (\(pop\)), population density (\(popdensity\)), percent Black (\(percentblack\)), Black poverty rate (\(blackpovertyrate\)), percent of the population with at least a bachelor's degree (\(percentbachelor\)), college enrollment (\(collegeenrollpc\)), share of Democrats (\(demshare\)), and Black police-caused deaths per 10,000 people (\(deathsblackpc\)). Interpret the output of the two models.

  6. Lung Cancer (**): Consider the data in lung. Plot the survival curve differentiating by sex. Estimate a survival model with \(age\) and \(female\) as the independent variables.

  7. Henning (***): Plot the survival curve differentiating by \(personal\). Estimate three survival models with the following independent variables: (1) \(personal\), (2) \(personal\) and \(property\), and (3) \(personal\), \(property\), and \(cage\). Interpret the output. Is there a big difference in coefficients?
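
Hint for Exercise 1: the following is a sketch of the standard Tobit log-likelihood for data right-censored at 800, not necessarily the exact parameterization used by your software. The symbols \(y_i^*\), \(x_i\), \(\beta\), and \(\sigma\) are notation assumed here for illustration: \(y_i^* = x_i'\beta + \varepsilon_i\) with \(\varepsilon_i \sim N(0, \sigma^2)\) is the latent score, and the observed score is \(y_i = \min(y_i^*, 800)\). With \(\phi\) and \(\Phi\) denoting the standard normal density and distribution function,

\[
\ln L(\beta, \sigma) = \sum_{i:\, y_i < 800} \ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right] + \sum_{i:\, y_i = 800} \ln\left[1 - \Phi\!\left(\frac{800 - x_i'\beta}{\sigma}\right)\right]
\]

The first sum treats uncensored scores exactly as in a normal linear model. The second sum uses only the information that the latent score is at least 800, which is the part OLS ignores.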