Balanced
First, we need to create a "data.frame". On this data frame we carry out a balanced two-way ANOVA. A description of the data set can be found in NKNW96.
Steps to follow:
1) Create the data frame from the data (we can first save the text file Ch19pr14: right-click on the file and save it in C:\Temp).
2) Create factors in the data frame.
3) Execute the two-way ANOVA with interaction. Check graphically the predictions, residuals, QQ-plot, Cook's distance plot, ... for every ANOVA and note your observations.
4) The design is balanced with fixed factors, so the SS decomposition is useful and correct (the General Linear Test or classical F tests can be done, but the ANOVA table can be used directly).
5) If appropriate, execute multiple comparisons or test specific contrast questions.
We create a data frame by typing (or copy/pasting) the red letters after > followed by <ENTER>. The blue letters are the answer from the R programme. We check the content by typing the name of the data frame. Note that systems like R originate in Unix: they distinguish uppercase from lowercase, and in a file path the usual Windows backslash "\" must be written as two backslashes "\\".
>Ch19pr14<-read.table("c:\\temp\\ch19pr14.txt");Ch19pr14 <ENTER>
Or directly from the URL:
>Ch19pr14<-read.table("http://www.biw.kuleuven.be/vakken/statisticsbyR/datasetsTXT/CH19PR14.txt");Ch19pr14
   V1 V2 V3 V4
1 2.4  1  1  1
2 2.7  1  1  2
3 2.3  1  1  3
4 2.5  1  1  4
(rest of dataset)
The variable names are V1 to V4. We change them to meaningful names by:
>names(Ch19pr14)<-c('hours','A','B','replicate'); Ch19pr14 <ENTER>
  hours A B replicate
1   2.4 1 1         1
(rest of dataset)
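As a small aside, read.table can also assign the names at read time through its col.names argument. The sketch below assumes nothing beyond the four rows printed above, pasted into a string so that it runs without the file; the names txt and first_rows are only illustrative.

```r
# The first four rows of the data set as shown above, pasted into a string
# so that the example does not depend on the file or the URL.
txt <- "2.4 1 1 1
2.7 1 1 2
2.3 1 1 3
2.5 1 1 4"

# col.names gives meaningful names immediately instead of V1..V4
first_rows <- read.table(text = txt,
                         col.names = c("hours", "A", "B", "replicate"))
first_rows
str(first_rows)   # hours is numeric; A, B and replicate are read as integers
```

Note that A and B still arrive as integers, so the factor declaration remains necessary.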
We are almost ready with our data preparation and can make our variables more accessible with the attach statement:
>attach(Ch19pr14)
We could create factors by the factor function in the lm-formula, but it is safer to declare them in advance:
> A<-factor(A) ; B<-factor(B)
> is.factor(A)
[1] TRUE
In the further analysis the variables are now known simply as A and B; the full name, with object (name of the data frame) and property (name of the variable) separated by $, as in Ch19pr14$A, is no longer necessary. If we do not indicate to R that A and B are factors, we get rubbish results: R then treats them as numeric covariates and fits a single regression slope for each instead of one effect per level. Other statistical programmes will equally go haywire, as A = "1", "2" and "3" etc. has no numerical meaning.
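To see what goes wrong without the factor declaration, we can compare the degrees of freedom of the two fits. This is only a sketch on simulated data (the data frame d and its coefficients are made up for illustration and are not the NKNW data); it has the same layout of 3 x 3 cells with 4 replicates.

```r
# Simulated stand-in for the data set: 3 levels of A, 3 levels of B, 4 replicates
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + rnorm(nrow(d), sd = 0.25)

fit_num <- lm(hours ~ A, data = d)           # A numeric: one regression slope
fit_fac <- lm(hours ~ factor(A), data = d)   # A as factor: one effect per level

anova(fit_num)$Df   # 1 df for A, 34 residual df
anova(fit_fac)$Df   # 2 df for A, 33 residual df
```

The sketch passes the data frame through the data argument of lm rather than relying on attach; both routes work, but data = keeps it unambiguous which copy of A is used.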
We are finally in business (compare this to e.g. the data step in SAS and others). R is a bit rough here, as the volunteers behind free software are keener on the statistics than on the user interface...
A useful visualisation is given by the command interaction.plot, with (Factor A, Factor B, Response Var) as arguments:
> interaction.plot(A,B,hours) <ENTER>

Parallel lines or not? Look and judge carefully. What is your opinion?
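The plot draws the mean response of every A x B cell, so the same judgement can be supported with numbers. A minimal sketch, again on simulated data (d, cell_means and step_A are illustrative names, and the simulated effects are made up):

```r
# Simulated stand-in for the data set, with an interaction built in
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + 0.3 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A); d$B <- factor(d$B)

# Mean response per cell: rows are the levels of A, columns the levels of B
cell_means <- tapply(d$hours, list(A = d$A, B = d$B), mean)
cell_means

# Change in the mean between successive levels of A, one column per level of B.
# Without interaction the columns would be roughly equal (parallel lines).
step_A <- apply(cell_means, 2, diff)
step_A
```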
As the lines in the plot are not parallel, we go for the full model: two-way ANOVA with interaction (the data are balanced, so the decomposition of SSTR into SSA, SSB and SSAB is meaningful). If necessary we reduce the model.
> ResFullMod<-lm(hours~A+B+A:B);ResFullMod

Remark: >ResFullMod<-lm(hours~A*B);ResFullMod
gives the same result.
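That the two formulas describe the same model can be checked directly on the coefficients. A short sketch on simulated data (d, fit_long and fit_short are illustrative names, not part of the original analysis):

```r
# Simulated stand-in for the data set
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + 0.3 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A); d$B <- factor(d$B)

fit_long  <- lm(hours ~ A + B + A:B, data = d)   # main effects plus interaction
fit_short <- lm(hours ~ A * B, data = d)         # shorthand for the same terms

all.equal(coef(fit_long), coef(fit_short))       # TRUE
```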
This gives us the estimated coefficients.
We use the 'anova' function to get the ANOVA table (type I SS, or sequential extra sums of squares):
> anova(ResFullMod)
Analysis of Variance Table
Response: hours
          Df  Sum Sq Mean Sq F value    Pr(>F)
A          2 220.020 110.010 1827.86 < 2.2e-16 ***
B          2 123.660  61.830 1027.33 < 2.2e-16 ***
A:B        4  29.425   7.356  122.23 < 2.2e-16 ***
Residuals 27   1.625   0.060
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
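Because the sums of squares are sequential, the order of the terms can matter in general. With balanced data it does not, which is why the table can be used directly. The sketch below illustrates this on simulated data (d, tab_AB, tab_BA and d_unbal are illustrative names); dropping one observation makes the design unbalanced, and the two orders then generally give different values for A.

```r
# Simulated stand-in for the data set
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + 0.3 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A); d$B <- factor(d$B)

tab_AB <- anova(lm(hours ~ A * B, data = d))   # A entered first
tab_BA <- anova(lm(hours ~ B * A, data = d))   # B entered first

# Balanced design: the sequential SS do not depend on the order
all.equal(tab_AB["A", "Sum Sq"], tab_BA["A", "Sum Sq"])
all.equal(tab_AB["B", "Sum Sq"], tab_BA["B", "Sum Sq"])

# Unbalanced design (one observation removed): compare the SS for A
d_unbal <- d[-1, ]
anova(lm(hours ~ A * B, data = d_unbal))["A", "Sum Sq"]
anova(lm(hours ~ B * A, data = d_unbal))["A", "Sum Sq"]
```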
We see that A, B and A:B are significant. We accept the two-factor model with interaction. The model should satisfy the assumptions of normality and independence of the error term. Therefore the residuals are checked by plotting. We can do that on one page: with the layout command we divide the page into 4 windows organised 2 by 2:
> layout(matrix(c(1,2,3,4),2,2))
> plot(ResFullMod)

Are we happy with the residuals? In other words, are the assumptions on the errors in the model correct?
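The plots remain the main tool here. As a possible numerical supplement (our suggestion, not part of the original analysis), the residuals and Cook's distances can be extracted and inspected; the sketch uses simulated data, and d, fit, res and cd are illustrative names.

```r
# Simulated stand-in for the data set
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + 0.3 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A); d$B <- factor(d$B)

fit <- lm(hours ~ A * B, data = d)

res <- residuals(fit)
shapiro.test(res)                       # Shapiro-Wilk test of normality

cd <- cooks.distance(fit)               # influence of each observation
head(sort(cd, decreasing = TRUE), 3)    # the three most influential cases
```

A formal test on 36 residuals has limited power, so it should be read together with the QQ-plot and not instead of it.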
One approach when interaction is important is to perform a one-way ANOVA with all combinations combined in one factor.
A simple trick (however A and B need to be used as numerics):
> C=as.numeric(A)*10+as.numeric(B)
> C
[1] 11 11 11 11 12 12 12 12 13 13 13 13 21 21 21 21 22 22 22 22 23 etc..
Be careful to use C as a factor and not as a numerical value!
> C= factor(C)
> ResCMod<-lm(hours~C);ResCMod
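An alternative to the numeric trick is the interaction function, which builds the combined factor directly from two factors. The sketch below (with illustrative names C_trick and C_inter, and a design grid generated on the spot) checks that both routes give the same labels.

```r
# Design grid with the same layout as the data set: 3 x 3 cells, 4 replicates
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
A <- factor(d$A); B <- factor(d$B)

# The numeric trick from the text
C_trick <- factor(as.numeric(A) * 10 + as.numeric(B))

# The same combined factor built by interaction();
# lex.order = TRUE keeps the levels in the order 11, 12, 13, 21, ...
C_inter <- interaction(A, B, sep = "", lex.order = TRUE)

identical(levels(C_trick), levels(C_inter))               # TRUE
identical(as.character(C_trick), as.character(C_inter))   # TRUE
```

The result of interaction is already a factor, so the risk of using C as a numerical value by accident does not arise.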

Compare the ANOVA table with the other table from the two way with interaction:
> anova(ResCMod)
Analysis of Variance Table
Response: hours
          Df Sum Sq Mean Sq F value    Pr(>F)
C          8 373.11   46.64  774.91 < 2.2e-16 ***
Residuals 27   1.63    0.06
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
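The comparison of the two tables can also be done numerically. In the tables above, 220.020 + 123.660 + 29.425 = 373.105, which is the sum of squares for C after rounding, and 2 + 2 + 4 = 8 degrees of freedom. The sketch below repeats this check on simulated data (d, tab_full, tab_C and ss_parts are illustrative names):

```r
# Simulated stand-in for the data set
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + 0.3 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A); d$B <- factor(d$B)
d$C <- interaction(d$A, d$B, sep = "", lex.order = TRUE)

tab_full <- anova(lm(hours ~ A * B, data = d))   # two-way with interaction
tab_C    <- anova(lm(hours ~ C, data = d))       # one-way on the combined factor

# SSA + SSB + SSAB equals the sum of squares of C
ss_parts <- sum(tab_full[c("A", "B", "A:B"), "Sum Sq"])
all.equal(ss_parts, tab_C["C", "Sum Sq"])

# The residual sums of squares are identical as well
all.equal(tab_full["Residuals", "Sum Sq"], tab_C["Residuals", "Sum Sq"])
```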
What do you see??? Look very carefully. Now we look for groups within all the combinations of A and B coded in the new combined factor C.
We can apply Tukey multiple comparisons to find groups (TukeyHSD works on the object returned by aov()):
> TukeyHSD(aov(hours~C), "C", ordered = TRUE)
  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered
Fit: aov(formula = hours ~ C)
$C
       diff       lwr       upr
13-11 2.100 1.5163187 2.6836813
12-11 2.125 1.5413187 2.7086813
21-11 2.975 2.3913187 3.5586813
etc....
All combinations are compared. It is a lot quicker to look at a plot of the differences:
> layout(matrix(c(1),1,1))
> plot(TukeyHSD(aov(hours~C), "C", ordered = TRUE))

We can see which combinations are different according to the Tukey multiple comparisons method.
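Besides the plot, the result of TukeyHSD can be filtered, since it stores the comparisons for each factor as a matrix. A minimal sketch on simulated data (d, tk, cmp, differ and same are illustrative names; the 0.05 threshold matches the 95% family-wise level used above):

```r
# Simulated stand-in for the data set
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + d$A + 0.5 * d$B + 0.3 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$C <- interaction(factor(d$A), factor(d$B), sep = "", lex.order = TRUE)

tk  <- TukeyHSD(aov(hours ~ C, data = d), "C", ordered = TRUE)
cmp <- tk$C                       # matrix with columns diff, lwr, upr, p adj

# Pairs that differ (interval excludes zero) and pairs that do not
differ <- cmp[cmp[, "p adj"] <  0.05, , drop = FALSE]
same   <- cmp[cmp[, "p adj"] >= 0.05, , drop = FALSE]

rownames(same)                    # combinations that cannot be separated
```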
For contrasts in this case, see the special page on contrasts (balanced two-way with interaction).
31 March, 2003 by Guido Wyseure