17.1 Truncation
With truncation, part of the data is not observed: observations whose outcome falls beyond a threshold are not reported. In the graph below, the true parameters are \(\beta_0=-2\) and \(\beta_1=0.5\), and values \(y<0\) are not reported in the data. The green regression line is the “correct” one, whereas the red line is obtained from a regression model that ignores the truncation.
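The setup described above can be sketched in R as follows. This is a hypothetical reconstruction, not the original data: the seed, the distribution of `x` and the unit error variance are assumptions. The names `truncation`, `yreal` and `yobs` and the sample size of 50 are taken from the output shown in this section.

```r
# Hypothetical reconstruction of the example data. The seed, the range of x
# and the error standard deviation are assumptions, not the original values.
set.seed(123)
n <- 50
x <- runif(n, min = 0, max = 10)

# Latent outcome with beta0 = -2 and beta1 = 0.5
yreal <- -2 + 0.5 * x + rnorm(n)

# Values below 0 are not reported, so they are set to missing
yobs <- ifelse(yreal < 0, NA, yreal)

truncation <- data.frame(x = x, yreal = yreal, yobs = yobs)
head(truncation)
```

Because the draws are random, results from this sketch will not match the numbers reported below exactly.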
If all the data were observed, an ordinary linear regression would recover the true parameters closely:
##
## Call:
## lm(formula = yreal ~ x, data = truncation)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.0300 -0.6778 -0.1484  0.7101  2.0034
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.95643    0.27902  -7.012 7.05e-09 ***
## x            0.51658    0.05071  10.188 1.37e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.048 on 48 degrees of freedom
## Multiple R-squared: 0.6838, Adjusted R-squared: 0.6772
## F-statistic: 103.8 on 1 and 48 DF, p-value: 1.372e-13
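If the truncation is ignored, the estimates are biased. Here the intercept moves from about -1.96 to -0.84 and the slope shrinks from about 0.52 to 0.39: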
##
## Call:
## lm(formula = yobs ~ x, data = truncation)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.1092 -0.5793 -0.2110  0.5564  1.6747
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.84279    0.62042  -1.358 0.185998
## x            0.38663    0.08905   4.342 0.000191 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9732 on 26 degrees of freedom
## (22 observations deleted due to missingness)
## Multiple R-squared: 0.4203, Adjusted R-squared: 0.398
## F-statistic: 18.85 on 1 and 26 DF, p-value: 0.0001909
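The direction of this bias can be seen from the conditional mean of the observed outcome. The following is a sketch under the assumption, not stated explicitly in the text, that the errors are normal with constant variance \(\sigma^2\). Here \(\phi\) and \(\Phi\) denote the standard normal density and distribution function, and \(\mu(x)\) is shorthand introduced only for this illustration.

\[
y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \qquad \mu(x) = \beta_0 + \beta_1 x
\]

\[
E[\,y \mid x,\; y \ge 0\,] = \mu(x) + \sigma \, \frac{\phi\big(\mu(x)/\sigma\big)}{\Phi\big(\mu(x)/\sigma\big)}
\]

The second term is positive and becomes larger as \(\mu(x)\) becomes smaller. With \(\beta_1 > 0\), the observed mean is therefore pushed up most at small values of \(x\), which raises the fitted intercept and flattens the fitted slope, in line with the estimates above.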
To correct for the truncation, use the truncreg() function from the truncreg package, which fits the model by maximum likelihood and reduces the bias of the coefficients:
##
## Call:
## truncreg(formula = yobs ~ x, data = truncation)
##
## BFGS maximization method
## 31 iterations, 0h:0m:0s
## g'(-H)^-1g = 2.7E-12
##
##
##
## Coefficients :
##             Estimate Std. Error t-value  Pr(>|t|)
## (Intercept) -3.03940    1.51806 -2.0022  0.045267 *
## x            0.64446    0.18585  3.4676  0.000525 ***
## sigma        1.13419    0.22654  5.0066 5.541e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -32.986 on 3 Df
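A call along the following lines could produce output of this kind. It is a minimal sketch and not necessarily the exact code behind the results above: the data are a hypothetical stand-in for `truncation`, with an assumed seed and an assumed distribution of `x`. The arguments `point` and `direction` are written out to make the truncation rule explicit, and they should match the package defaults.

```r
# Sketch only: hypothetical stand-in for the `truncation` data
# (seed, sample size and range of x are assumptions).
set.seed(123)
x <- runif(50, min = 0, max = 10)
yreal <- -2 + 0.5 * x + rnorm(50)
truncation <- data.frame(x = x, yobs = ifelse(yreal < 0, NA, yreal))

# Naive OLS on the observed values; lm() drops rows with missing yobs
ols <- lm(yobs ~ x, data = truncation)
print(coef(ols))

# Truncated regression: outcomes below point = 0 are missing (left truncation).
# The package check keeps the sketch runnable when truncreg is not installed.
if (requireNamespace("truncreg", quietly = TRUE)) {
  trunc_fit <- truncreg::truncreg(yobs ~ x, data = truncation,
                                  point = 0, direction = "left")
  print(summary(trunc_fit))
}
```

Besides the regression coefficients, the model also estimates the error standard deviation, which appears as `sigma` in the summary.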