UP

ANCOVA

Or how to compare regressions

Firstly, we need to create a Data frame. On this dataframe we execute the ANCOVA. Description of the Data-set can be found in NKNW96 pag 1034.

Steps to follow:

1)
Create the data-frame on the data ( we first create a text file from a textfile  Ch25pr07.txt ; right click on the file and save on C:\Temp )
2) Create factors in the data-frame
3) Execute the ANCOVA 

Check the the graphically the predictions, residuals, QQ-plot, Cook distance plot...for every analysis and note your observations  

4) We execute the General Linear Test in order to select the best model.
5) If appropiate execute multiple comparison or test specific contrast questions.

We create a data-frame by typing (or copy/paste)  the red letters after > followed by  <ENTER> The blue letters are the answer from the R-programme. We check the content by typing in the name of the data-frame. Remark systems like R originate in Unix in which they discriminate uppercase and undercase and replace the usual backslash "\" by two  "\\".

Ch25pr07<-read.table("c:\\temp\\ch25pr07.txt")<ENTER>

Ch25pr07<ENTER>

         V1 V2 V3    V4
1       7.6    1    1   8.2
2       8.2    1    2   7.9
3       6.8    1    3   7.0
(rest of dataset)

The variable names are V1 to V4. We change them to meaningful names by:

names(Ch25pr07)<-c('Improv','RDexp','Replic','PriorImp'); Ch25pr07

            Improv RDexp Replic PriorImp
1                7.6         1         1          8.2
2                8.2         1         2          7.9
3                6.8         1         3          7.0

(rest of dataset)


0ur data are in a data-frame and we can make our variables more accessible by the attach statement:

> attach(Ch25pr07)

Now in the further analysis the variables are known as e.g. Improv and the full name with object (name of data-frame) and property (name of Variable) with $ as separator like Ch25pr07$Improv  are not necessary anymore

Although we can create factors by the factor function in the lm-formula, it is some what neater to do it now and protects us against mistakes:

> RDexp<-factor(RDexp)

We are finally in business ( compare this to e.g. the datastep in SAS and others). 

We check on the number of replicates:

> tapply(Improv,RDexp,length)
      1      2     3
      9    12     6

So, unbalanced


Before the analysis a visualisation is important;

Alternative 1:

> coplot(Improv~PriorImp|RDexp) <ENTER>

This gives a rather complicated plot; but gives some idea. What is your opinion?

Alternative 2 is just to plot after which we add a label by the text function (not elegant but it works; I added 0.15 to offset the label).

> plot(Improv~PriorImp)

 text(PriorImp+0.15, Improv, RDexp)<ENTER>

Alternative 3 uses a small dirty trick to get different symbols depending on the factor: we use symbols "pch= " according to the numerical value of the levels in the factor RDexp. The numerical value of the level is 1, 2 , 3 etc...

> plot(Improv~PriorImp,pch=as.numeric(RDexp))<ENTER>

In your opinion do we have different slopes and/or different intercepts? Last graph is the most useful.


One very important trap is that RDexp is a factor in this dataset presented as the (numeric) values 1, 2 and 3. We have to make sure this variable is taken as a factor!!! Therefore;  RDexp=factor(RDexp) is esssential and was executed earlier.

Now we can go for the Full model and compare it to reduced models:

> ResF<-lm(Improv~ RDexp+ PriorImp + RDexp:PriorImp)<ENTER>
> summary(ResF)
<ENTER>

The assumptions of the error ( normal distributed, homscedascticity etc) are simply checked by plotting the different graphs:

>plot(ResF)<ENTER>


We can also try different reduced models and give each a name:

ResR1<-lm(Improv~ RDexp+ PriorImp)
ResSR<-lm(Improv~  PriorImp)

The  object-oriented approach is for beginners cumbersome and does not follow a userfriendly  "click and the see a lot of output" pattern. In fact output on every click can be both intimidating and potentially misleading. The object-oriented  method can now fully show its value. Every model evaluation is stored in the objects ResFull, ResSR, ResR1. We can then use the objects in further testing. The famous General Linear Test (GLT) to test a Full against a Reduced model is very simply:

> anova(ResF, ResR1)<ENTER>

Or even in one go test three models :
> anova(ResSR,ResR1,ResF)<ENTER>
Analysis of Variance Table

Model 1: Improv ~ PriorImp
Model 2: Improv ~ RDexp + PriorImp
Model 3: Improv ~ RDexp + PriorImp + RDexp:PriorImp
            Res.Df            RSS     Df       Sum of Sq                  F                 Pr(>F) 
         1       25        5.5134 
         2       23        1.3175         2           4.1958          46.027           2.108e-08 ***
         3       21        0.9572         2           0.3604           3.953                0.03491 * 
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` '1 

We can conclude the Full model ( slope and intercept are different).  We have tested three models  "different slopes and intercepts at least one of the factors (ResFull)" ;  "different intercepts at least one of the factors but the same slopes for all factors (ResR1)" and "same slope and intercept for all  factors (ResSR)".

However, as can be seen from the plot of the data; the slopes are different at a 3.5% level, so less than the 5% level but not in a very convincing way.

This is in contrast to the extremely small probability for the same intercept. There is little doubt in any practical terms that the intercepts are different.

 


The indicator variables can  be produced in the Neter et al (1996) (pag 1018)  style:

xij=PriorImp-mean(PriorImp)
ResF<-lm(Improv~ RDexp+ xij + RDexp:xij,contrasts=list(RDexp="contr.sum"))
XMat= model.matrix(ResF); XMat

       (Intercept) RDexp1 RDexp2                     xij           RDexp1:xij           RDexp2:xij
  1                 1            1           0       -1.200000e+00               -1.2      0.000000e+00
  2                 1            1           0       -1.500000e+00               -1.5      0.000000e+00
  3                 1            1           0       -2.400000e+00               -2.4      0.000000e+00
  4                 1            1           0       -3.700000e+00               -3.7      0.000000e+00

and so on...


Multiple comparisons and contrasts is not covered in Neter et al (1996) chapter 25.

Exercise 2 with data on the calibaration of a capacitance-probe. Adjust the example above.


 

UP

31 March, 2003 by Guido Wyseure