| UP |
Firstly, we need to create a Data frame. On this dataframe we execute the ANCOVA. Description of the Data-set can be found in NKNW96 pag 1034.
Steps to follow:
| 1) |
Create the data-frame on the data ( we first create a text file from a textfile Ch25pr07.txt ; right click on the file and save on C:\Temp ) |
| 2) | Create factors in the data-frame |
| 3) | Execute the ANCOVA
Check the the graphically the predictions, residuals, QQ-plot, Cook distance plot...for every analysis and note your observations |
| 4) | We execute the General Linear Test in order to select the best model. |
| 5) | If appropiate execute multiple comparison or test specific contrast questions. |
We create a data-frame by typing (or copy/paste) the red letters after > followed by <ENTER> The blue letters are the answer from the R-programme. We check the content by typing in the name of the data-frame. Remark systems like R originate in Unix in which they discriminate uppercase and undercase and replace the usual backslash "\" by two "\\".
Ch25pr07<-read.table("c:\\temp\\ch25pr07.txt")<ENTER>
Ch25pr07<ENTER>
V1 V2
V3 V4
1
7.6 1 1
8.2
2 8.2
1 2 7.9
3
6.8 1 3 7.0
(rest of dataset)
The variable names are V1 to V4. We change them to meaningful names by:
names(Ch25pr07)<-c('Improv','RDexp','Replic','PriorImp'); Ch25pr07
Improv RDexp
Replic
PriorImp
1
7.6
1
1
8.2
2
8.2
1
2
7.9
3
6.8
1
3 7.0
(rest of dataset)
0ur data are in a data-frame and we can make
our variables more
accessible by the attach statement:
> attach(Ch25pr07)
Now in the further analysis the variables are known as e.g. Improv and the full name with object (name of data-frame) and property (name of Variable) with $ as separator like Ch25pr07$Improv are not necessary anymore
Although we can create factors by the factor function in the lm-formula, it is some what neater to do it now and protects us against mistakes:
> RDexp<-factor(RDexp)
We are finally in business ( compare this to e.g. the datastep in SAS and others).
We check on the number of replicates:
> tapply(Improv,RDexp,length)
1
2 3
9 12 6
So, unbalanced
Before the analysis a visualisation is important;
Alternative 1:
> coplot(Improv~PriorImp|RDexp) <ENTER>
This gives a rather complicated plot; but gives some idea. What is your opinion?
Alternative 2 is just to plot after which we add a label by the text function (not elegant but it works; I added 0.15 to offset the label).
> plot(Improv~PriorImp)
text(PriorImp+0.15, Improv, RDexp)<ENTER>
Alternative 3 uses a small dirty trick to get different symbols depending on the factor: we use symbols "pch= " according to the numerical value of the levels in the factor RDexp. The numerical value of the level is 1, 2 , 3 etc...
> plot(Improv~PriorImp,pch=as.numeric(RDexp))<ENTER>
In your opinion do we have different slopes and/or different
intercepts? Last graph is the most useful.
One very important trap is that RDexp is a factor in this dataset presented as the (numeric) values 1, 2 and 3. We have to make sure this variable is taken as a factor!!! Therefore; RDexp=factor(RDexp) is esssential and was executed earlier.
Now we can go for the Full model and compare it to reduced models:
> ResF<-lm(Improv~ RDexp+ PriorImp + RDexp:PriorImp)<ENTER>
>
summary(ResF)<ENTER>
The assumptions of the error ( normal distributed, homscedascticity etc) are simply checked by plotting the different graphs:
>plot(ResF)<ENTER>
We can also try different reduced models and give each a name:
ResR1<-lm(Improv~ RDexp+ PriorImp)
ResSR<-lm(Improv~ PriorImp)
The object-oriented approach is for beginners cumbersome and does not follow a userfriendly "click and the see a lot of output" pattern. In fact output on every click can be both intimidating and potentially misleading. The object-oriented method can now fully show its value. Every model evaluation is stored in the objects ResFull, ResSR, ResR1. We can then use the objects in further testing. The famous General Linear Test (GLT) to test a Full against a Reduced model is very simply:
> anova(ResF, ResR1)<ENTER>
Or even in one go test three models :
>
anova(ResSR,ResR1,ResF)<ENTER>
Analysis of Variance
Table
Model 1: Improv ~ PriorImp
Model 2: Improv ~ RDexp +
PriorImp
Model 3: Improv ~ RDexp + PriorImp +
RDexp:PriorImp
Res.Df
RSS Df Sum of
Sq
F
Pr(>F)
1
25
5.5134
2
23
1.3175
2
4.1958
46.027 2.108e-08
***
3
21
0.9572
2
0.3604
3.953
0.03491 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.'
0.1 ` '1
We can conclude the Full model ( slope and intercept are different). We have tested three models "different slopes and intercepts at least one of the factors (ResFull)" ; "different intercepts at least one of the factors but the same slopes for all factors (ResR1)" and "same slope and intercept for all factors (ResSR)".
However, as can be seen from the plot of the data; the slopes are different at a 3.5% level, so less than the 5% level but not in a very convincing way.
This is in contrast to the extremely small probability for the same intercept. There is little doubt in any practical terms that the intercepts are different.
The indicator variables can be produced in the Neter et al (1996) (pag 1018) style:
xij=PriorImp-mean(PriorImp)
ResF<-lm(Improv~ RDexp+ xij + RDexp:xij,contrasts=list(RDexp="contr.sum"))
XMat= model.matrix(ResF); XMat
(Intercept) RDexp1
RDexp2
xij RDexp1:xij
RDexp2:xij
1
1 1
0 -1.200000e+00
-1.2 0.000000e+00
2
1 1
0 -1.500000e+00
-1.5 0.000000e+00
3
1 1
0 -2.400000e+00
-2.4 0.000000e+00
4
1 1
0 -3.700000e+00
-3.7 0.000000e+00
and so on...
Multiple comparisons and contrasts is not covered in Neter et al (1996) chapter 25.
Exercise 2 with data on the calibaration of a capacitance-probe. Adjust the example above.
| UP |
31 March, 2003 by Guido Wyseure