17.2 Censoring
With censoring, the dependent variable is not observed exactly beyond a threshold. Values below (left-censoring) or above (right-censoring) a certain limit are recorded at that limit rather than at their true value.
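Left-censoring at a threshold c can be written as a latent-variable model. The symbols below are standard notation introduced here. In the output that follows, yreal plays the role of the latent $y_i^*$ and y plays the role of the recorded $y_i$. Estimators that account for censoring, such as the Tobit model fitted by censReg, maximise the log-likelihood $\ell$:

$$
\begin{aligned}
y_i^* &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),\\
y_i &= \max(c,\, y_i^*),\\
\ell(\beta_0, \beta_1, \sigma) &= \sum_{i:\, y_i = c} \log \Phi\!\left(\frac{c - \beta_0 - \beta_1 x_i}{\sigma}\right) + \sum_{i:\, y_i > c} \left[\log \phi\!\left(\frac{y_i - \beta_0 - \beta_1 x_i}{\sigma}\right) - \log \sigma\right].
\end{aligned}
$$

Here $\Phi$ and $\phi$ are the standard normal distribution function and density. For right-censoring, $\max$ becomes $\min$ and the censored term uses $1 - \Phi(\cdot)$. OLS on $y_i$ treats the recorded value c as if it were $y_i^*$, which is the source of the bias.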
If all values of the dependent variable had been recorded correctly (variable yreal), the following regression model could be estimated:
##
## Call:
## lm(formula = yreal ~ x, data = censoring)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.2959 -0.7400 -0.3396  0.5394  2.4274
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.09074    0.31283  -6.683 2.25e-08 ***
## x            0.50987    0.05614   9.082 5.38e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.139 on 48 degrees of freedom
## Multiple R-squared: 0.6321, Adjusted R-squared: 0.6245
## F-statistic: 82.48 on 1 and 48 DF, p-value: 5.375e-12
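The data set censoring is not shown in this section. The sketch below shows one hypothetical way to simulate comparable data in R. It assumes a linear latent outcome yreal with normal errors that is censored from below at 0 to give the recorded y. The coefficients, the threshold and the distribution of x are assumptions chosen for illustration, not the book's actual values:

```r
# Hypothetical data-generating process (assumption, not the book's data):
# a linear latent outcome that is left-censored at 0.
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
yreal <- -2 + 0.5 * x + rnorm(n, sd = 1)  # latent (true) values
y <- pmax(yreal, 0)                       # values below 0 are recorded as 0
censoring <- data.frame(x = x, yreal = yreal, y = y)

# OLS on the correctly recorded outcome
fit_real <- lm(yreal ~ x, data = censoring)
coef(fit_real)
sum(censoring$y == 0)  # number of left-censored observations
```

The estimates vary with the seed. The point is only that y equals yreal wherever yreal is above 0 and equals 0 otherwise.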
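Ignoring the censoring and estimating the same model by OLS on the recorded values y yields biased results. Compared with the regression on yreal, the slope shrinks from 0.510 to 0.309 and the intercept rises from -2.091 to -0.535: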
##
## Call:
## lm(formula = y ~ x, data = censoring)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.74069 -0.52897 -0.07453  0.45831  2.07293
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.5346     0.2413  -2.215   0.0315 *
## x             0.3090     0.0433   7.136 4.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8782 on 48 degrees of freedom
## Multiple R-squared: 0.5148, Adjusted R-squared: 0.5047
## F-statistic: 50.92 on 1 and 48 DF, p-value: 4.552e-09
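The bias is systematic rather than an accident of one sample. The following is a small Monte Carlo sketch under a hypothetical data-generating process (true slope 0.5, censoring at 0; all values are assumptions). It compares the average OLS slope on the latent outcome with the average OLS slope on the censored outcome:

```r
# Monte Carlo sketch (hypothetical setup): OLS slope on the latent
# outcome vs. OLS slope on the same outcome left-censored at `cens`.
set.seed(42)
simulate_slopes <- function(n = 50, b0 = -2, b1 = 0.5, sigma = 1, cens = 0) {
  x <- runif(n, 0, 10)
  yreal <- b0 + b1 * x + rnorm(n, sd = sigma)
  y <- pmax(yreal, cens)  # censoring: values below `cens` recorded as `cens`
  c(latent   = unname(coef(lm(yreal ~ x))[2]),
    censored = unname(coef(lm(y ~ x))[2]))
}

# 2 x 1000 matrix: one row per estimator, one column per replication
slopes <- replicate(1000, simulate_slopes())
rowMeans(slopes)
```

The first average should lie close to the true slope of 0.5. The second is pulled toward zero because the observations piled up at the threshold flatten the fitted line.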
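Estimating a censored regression (Tobit) model with the R package censReg, which takes the censoring into account, reduces the bias considerably. Its estimates (intercept -2.228, slope 0.524) are close to those obtained from the uncensored values yreal. The row logSigma is the logarithm of the estimated error standard deviation: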
##
## Call:
## censReg(formula = y ~ x, data = censoring)
##
## Observations:
##          Total  Left-censored     Uncensored Right-censored
##             50             23             27              0
##
## Coefficients:
##             Estimate Std. error t value  Pr(> t)
## (Intercept) -2.22754    0.53483  -4.165 3.11e-05 ***
## x            0.52449    0.08179   6.412 1.43e-10 ***
## logSigma     0.20741    0.14219   1.459    0.145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Newton-Raphson maximisation, 6 iterations
## Return code 1: gradient close to zero (gradtol)
## Log-likelihood: -54.35225 on 3 Df
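The model that censReg fits can be sketched in a few lines of base R by maximising the Tobit log-likelihood with optim. This is a minimal illustration, not censReg's implementation, which uses Newton-Raphson maximisation as the output above reports. The censoring threshold 0, the simulated data and the helper names tobit_loglik and fit_tobit are all hypothetical. Like censReg, the sketch reports the error standard deviation on the log scale as logSigma:

```r
# Minimal Tobit sketch: linear model for a latent outcome that is
# left-censored at `left` (assumed to be 0). sigma is estimated on the
# log scale, mirroring the logSigma row in the censReg output.
tobit_loglik <- function(par, y, X, left = 0) {
  k <- length(par)
  beta <- par[-k]
  sigma <- exp(par[k])
  mu <- drop(X %*% beta)
  cens <- y <= left
  # censored observations contribute log P(y* <= left);
  # uncensored ones contribute the normal log-density of the observed value
  sum(pnorm((left - mu[cens]) / sigma, log.p = TRUE)) +
    sum(dnorm(y[!cens], mean = mu[!cens], sd = sigma, log = TRUE))
}

# Hypothetical helper: fits the Tobit model by numerical maximisation
fit_tobit <- function(formula, data, left = 0) {
  mf <- model.frame(formula, data)
  X <- model.matrix(formula, mf)
  y <- model.response(mf)
  # OLS coefficients and log(sd(y)) as starting values
  start <- c(lm.fit(X, y)$coefficients, log(sd(y)))
  opt <- optim(start, tobit_loglik, y = y, X = X, left = left,
               method = "BFGS", control = list(fnscale = -1))  # maximise
  est <- opt$par
  names(est) <- c(colnames(X), "logSigma")
  list(coefficients = est, logLik = opt$value,
       convergence = opt$convergence)
}

# Hypothetical data: intercept -2, slope 0.5, sigma 1, censored at 0.
# A larger n than in the text keeps sampling noise from dominating.
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
censoring <- data.frame(x = x, y = pmax(-2 + 0.5 * x + rnorm(n), 0))

tobit <- fit_tobit(y ~ x, censoring)
ols <- coef(lm(y ~ x, data = censoring))
round(tobit$coefficients, 3)
round(ols, 3)
```

With censReg installed, summary(censReg(y ~ x, data = censoring)) on the same data should give essentially the same estimates up to optimiser tolerance, since both maximise the same likelihood.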