17.2 Censoring

In the case of censoring, the values of the dependent variable are reported at a certain point if they are above or below a certain value.

If all data was reported at the correct value, the following following regression model could be executed:

summary(bhat_real)
## 
## Call:
## lm(formula = yreal ~ x, data = censoring)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.60012 -0.59843  0.00981  0.78105  2.05397 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.25290    0.27739  -8.122 1.44e-10 ***
## x            0.52603    0.04888  10.762 2.17e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.951 on 48 degrees of freedom
## Multiple R-squared:  0.707,  Adjusted R-squared:  0.7009 
## F-statistic: 115.8 on 1 and 48 DF,  p-value: 2.169e-14

Ignoring censoring leads to biased results:

summary(bhat_censored)
## 
## Call:
## lm(formula = y ~ x, data = censoring)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.94411 -0.51345  0.03089  0.37886  1.25169 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.54026    0.17443  -3.097  0.00326 ** 
## x            0.29534    0.03074   9.608 9.21e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5981 on 48 degrees of freedom
## Multiple R-squared:  0.6579, Adjusted R-squared:  0.6508 
## F-statistic: 92.32 on 1 and 48 DF,  p-value: 9.212e-13

Using the R package censReg) allows for the reduction of the bias:

b_correct = censReg(y~x,data=censoring)
summary(b_correct)
## 
## Call:
## censReg(formula = y ~ x, data = censoring)
## 
## Observations:
##          Total  Left-censored     Uncensored Right-censored 
##             50             17             33              0 
## 
## Coefficients:
##             Estimate Std. error t value Pr(> t)    
## (Intercept) -1.41227    0.29576  -4.775 1.8e-06 ***
## x            0.41466    0.04717   8.791 < 2e-16 ***
## logSigma    -0.33824    0.12631  -2.678 0.00741 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Newton-Raphson maximisation, 6 iterations
## Return code 1: gradient close to zero (gradtol)
## Log-likelihood: -44.496 on 3 Df