17.2 Censoring
With censoring, the dependent variable is only observed within a limited range: values above or below a certain threshold are recorded at the threshold itself rather than at their true value.
If all observations were reported at their true values, the following regression model could be estimated:
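To make the mechanism concrete, here is a minimal sketch of left-censoring at zero on simulated data. The data set used in this section is not shown, so the sample size, coefficients, error standard deviation, range of x, and the seed below are all assumptions chosen for illustration; only the variable names (x, yreal, y) and the data frame name censoring follow the model calls in this section.

```r
# Hypothetical data: every parameter here is an assumption, not the
# section's actual data-generating process.
set.seed(1)
n <- 50
x <- runif(n, min = 0, max = 10)

# Latent outcome, i.e. the value that would be reported without censoring
yreal <- -2 + 0.45 * x + rnorm(n, sd = 1.2)

# Reported outcome: anything below the threshold 0 is recorded as 0
y <- pmax(yreal, 0)

censoring <- data.frame(x = x, yreal = yreal, y = y)

# Number of observations at the threshold versus above it
table(censored = censoring$y == 0)
```

Observations with yreal above zero are unchanged, while all others pile up at the threshold, which is what distorts an ordinary regression of y on x.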
##
## Call:
## lm(formula = yreal ~ x, data = censoring)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3707 -0.8230 -0.1525 0.7057 3.1032
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.79575 0.31045 -5.784 5.35e-07 ***
## x 0.44893 0.05551 8.088 1.62e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.176 on 48 degrees of freedom
## Multiple R-squared: 0.5768, Adjusted R-squared: 0.568
## F-statistic: 65.42 on 1 and 48 DF, p-value: 1.622e-10
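Ignoring the censoring and fitting the same model to the reported values y leads to biased results; here the slope estimate falls from 0.449 to 0.265: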
##
## Call:
## lm(formula = y ~ x, data = censoring)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.50895 -0.55421 -0.09853 0.28767 2.43058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.35947 0.23417 -1.535 0.131
## x 0.26499 0.04187 6.329 7.85e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8869 on 48 degrees of freedom
## Multiple R-squared: 0.4549, Adjusted R-squared: 0.4436
## F-statistic: 40.06 on 1 and 48 DF, p-value: 7.853e-08
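The comparison of the two fits can be sketched as follows. This is an illustration on simulated data rather than the section's own data set, so the numbers will differ from the output above; the parameters and the seed are assumptions, and fit_real and fit_naive are hypothetical object names.

```r
# Hypothetical data (assumed parameters), left-censored at 0
set.seed(1)
n <- 50
x <- runif(n, min = 0, max = 10)
yreal <- -2 + 0.45 * x + rnorm(n, sd = 1.2)
censoring <- data.frame(x = x, yreal = yreal, y = pmax(yreal, 0))

# Benchmark: regression on the true (normally unobservable) values
fit_real <- lm(yreal ~ x, data = censoring)

# Naive fit: treats the censored values as if they were true values
fit_naive <- lm(y ~ x, data = censoring)

# Side-by-side coefficients
rbind(uncensored = coef(fit_real), naive = coef(fit_naive))
```

Because the censored observations are raised to the threshold mainly where x is small, the naive fit tends to show a higher intercept and a flatter slope than the benchmark.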
The R package censReg, which fits a censored (Tobit) regression model, reduces this bias:
##
## Call:
## censReg(formula = y ~ x, data = censoring)
##
## Observations:
## Total Left-censored Uncensored Right-censored
## 50 23 27 0
##
## Coefficients:
## Estimate Std. error t value Pr(> t)
## (Intercept) -2.01919 0.53095 -3.803 0.000143 ***
## x 0.47381 0.08049 5.887 3.94e-09 ***
## logSigma 0.23227 0.14264 1.628 0.103432
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Newton-Raphson maximisation, 6 iterations
## Return code 1: gradient close to zero (gradtol)
## Log-likelihood: -54.88586 on 3 Df
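A call of the following form produces this kind of output. The sketch again uses simulated data with assumed parameters, and fit_tobit is a hypothetical object name. The arguments left = 0 and right = Inf are the censReg defaults, written out here to make the censoring limits explicit; they are consistent with the output above, which reports left-censored but no right-censored observations.

```r
# install.packages("censReg")  # if the package is not yet installed
library(censReg)

# Hypothetical data (assumed parameters), left-censored at 0
set.seed(1)
n <- 50
x <- runif(n, min = 0, max = 10)
yreal <- -2 + 0.45 * x + rnorm(n, sd = 1.2)
censoring <- data.frame(x = x, yreal = yreal, y = pmax(yreal, 0))

# Censored regression with left-censoring at 0 and no upper limit
fit_tobit <- censReg(y ~ x, left = 0, right = Inf, data = censoring)
summary(fit_tobit)

# The model reports the log of the error standard deviation;
# exponentiate it to get sigma on the original scale
exp(coef(fit_tobit)[["logSigma"]])
```

Applied to the output above, the logSigma estimate of 0.23227 corresponds to an error standard deviation of about exp(0.23227) ≈ 1.26, which is close to the residual standard error of 1.176 from the regression on the uncensored values.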