
Two Way ANOVA

Balanced

First, we need to create a data frame (a "data.frame" in R). On this data frame we carry out a balanced two-way ANOVA. A description of the data set can be found in NKNW96.

Steps to follow:

1) Create the data frame from the data (we can first save the text file Ch19pr14: right-click on the file and save it in C:\Temp)
2) Create factors in the frame
3) Execute the two-way ANOVA with interaction;

Check the predictions, residuals, QQ-plot, Cook's distance plot, etc. graphically for every ANOVA and note your observations.

4) The design is balanced and the factors are fixed, so the SS decomposition is useful and correct

(The General Linear Test or classical F tests can be done, but the ANOVA table can be used directly.)

5) If appropriate, execute multiple comparisons or test specific contrast questions.

We create a data frame by typing (or copy/pasting) the red letters after > followed by <ENTER>. The blue letters are the answer from the R programme. We check the content by typing the name of the data frame. Remark: R is case-sensitive (it distinguishes uppercase and lowercase), as is usual for software with Unix roots, and inside a quoted Windows path each backslash "\" must be written as two, "\\" (a forward slash "/" also works).


>Ch19pr14<-read.table("c:\\temp\\ch19pr14.txt");Ch19pr14
<ENTER>

Or directly from the URL:

>Ch19pr14<-read.table("http://www.biw.kuleuven.be/vakken/statisticsbyR/datasetsTXT/CH19PR14.txt");Ch19pr14

   V1 V2 V3 V4
1 2.4  1  1  1
2 2.7  1  1  2
3 2.3  1  1  3
4 2.5  1  1  4
(rest of dataset)

The variable names are V1 to V4. We change them to meaningful names by:

>names(Ch19pr14)<-c('hours','A','B','replicate'); Ch19pr14  <ENTER>

  hours A B replicate
1   2.4 1 1         1
(rest of dataset)
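The import and the renaming can also be done in one step. The following is a minimal sketch, assuming the file has four whitespace-separated columns and no header row, as the listing above suggests. The temporary file and the object name demo are hypothetical stand-ins so that the example runs without the real file; the four rows are the ones shown above.

```r
# Stand-in for ch19pr14.txt: only the first four rows shown above.
txt <- c("2.4 1 1 1",
         "2.7 1 1 2",
         "2.3 1 1 3",
         "2.5 1 1 4")
tmp <- tempfile(fileext = ".txt")   # hypothetical temporary file
writeLines(txt, tmp)

# col.names assigns the variable names while reading,
# so no separate names(...) <- ... step is needed.
demo <- read.table(tmp, header = FALSE,
                   col.names = c("hours", "A", "B", "replicate"))
str(demo)      # shows the column names and types
unlink(tmp)    # remove the temporary file again
```

With the real file, tmp would be replaced by the path, e.g. "c:/temp/ch19pr14.txt", or by the URL.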

We are almost ready with our data preparation and can make our variables more accessible with the attach statement:

>attach(Ch19pr14)

We could create factors by the factor function in the lm-formula, but it is safer to declare them in advance:

> A<-factor(A) ; B<-factor(B)
> is.factor(A)
[1] TRUE
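Now, in the further analysis, the variables are known simply as e.g. A and B, and the full name made of object (name of the data frame) and property (name of the variable) with $ as separator, like Ch19pr14$A, is no longer necessary. Note that the factors A and B created above are new objects in the workspace; the columns inside Ch19pr14 are left unchanged. If we do not tell R that A and B are factors we get rubbish results: R treats the codes 1, 2 and 3 as measurements and fits a regression slope instead of a separate mean per level. Other statistical programmes will equally go haywire, as A = "1", "2", "3" etc. has no numerical meaning.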
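The effect of forgetting the factor declaration shows up directly in the degrees of freedom. The sketch below uses synthetic data (made-up numbers in the same 3 x 3 layout with 4 replicates, not the Ch19pr14 data) and the data= argument of lm instead of attach; the names d, fit_num and fit_fac are only for illustration.

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)

# A and B still numeric: every term is a single slope (1 df each)
fit_num <- lm(hours ~ A * B, data = d)

# Declare the factors inside the data frame and refit
d$A <- factor(d$A)
d$B <- factor(d$B)
fit_fac <- lm(hours ~ A * B, data = d)

anova(fit_num)$Df   # 1 1 1 32
anova(fit_fac)$Df   # 2 2 4 27
```

Keeping the factors inside the data frame and passing data= to lm avoids having two versions of A and B (one attached, one in the workspace).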

We are finally in business (compare this to e.g. the data step in SAS and others). R is a bit rough here, as the volunteers behind free software are more keen on the statistics than on the user interface...


A useful visualisation is given by the command interaction.plot, with Factor A (on the x-axis), Factor B (one line per level) and the response variable as arguments:

>  interaction.plot(A,B,hours) <ENTER>

Parallel lines or not? Look and judge carefully. What is your opinion?
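The plot shows the cell means, so the same judgement can be supported with numbers. A minimal sketch on synthetic data (made-up numbers, not the Ch19pr14 data; d, cell_means and steps are illustrative names):

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A)
d$B <- factor(d$B)

# Mean response in every A x B cell: these are the points joined by interaction.plot
cell_means <- with(d, tapply(hours, list(A = A, B = B), mean))
round(cell_means, 2)

# Step from one level of B to the next, separately for each level of A.
# Rows that are roughly equal correspond to parallel lines (no interaction).
steps <- t(apply(cell_means, 1, diff))
round(steps, 2)
```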


The lines in the plot are not parallel, so we go for the full model: a two-way ANOVA with interaction (the data are balanced, so the decomposition of SSTR into SSA, SSB and SSAB is meaningful). If necessary we reduce the model.

> ResFullMod<-lm(hours~A+B+A:B);ResFullMod


Typing the name ResFullMod after the semicolon prints the estimated coefficients.
Remark:   >ResFullMod<-lm(hours~A*B);ResFullMod   gives the same result.
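That the two formula notations describe the same model can be checked on the coefficients. A small sketch on synthetic data (made-up numbers, not the Ch19pr14 data; fit1 and fit2 are illustrative names):

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A)
d$B <- factor(d$B)

fit1 <- lm(hours ~ A + B + A:B, data = d)   # main effects plus interaction, written out
fit2 <- lm(hours ~ A * B, data = d)         # shorthand for the same model

all.equal(coef(fit1), coef(fit2))   # TRUE
length(coef(fit1))                  # 9 coefficients: 1 + 2 + 2 + 4
```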

We use the 'anova' function to get the ANOVA table (type I SS, i.e. sequential extra sums of squares):

> anova(ResFullMod)
Analysis of Variance Table

Response: hours
          Df  Sum Sq Mean Sq F value    Pr(>F)
A          2 220.020 110.010 1827.86 < 2.2e-16 ***
B          2 123.660  61.830 1027.33 < 2.2e-16 ***
A:B        4  29.425   7.356  122.23 < 2.2e-16 ***
Residuals 27   1.625   0.060
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
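Because the sums of squares are sequential, they depend in general on the order of the terms; with balanced data they do not, which is why the table can be used directly here. The sketch below illustrates this on synthetic data (made-up numbers, not the Ch19pr14 data). The unbalanced variant d_unb, made by dropping three rows, is only added for contrast.

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A)
d$B <- factor(d$B)

# Balanced data: the SS for A is the same whether A enters first or after B
tab_AB <- anova(lm(hours ~ A * B, data = d))
tab_BA <- anova(lm(hours ~ B * A, data = d))
all.equal(tab_AB["A", "Sum Sq"], tab_BA["A", "Sum Sq"])   # TRUE

# Unbalanced variant for contrast: drop three rows from one cell.
# Here the two values will typically differ.
d_unb <- d[-(1:3), ]
anova(lm(hours ~ A * B, data = d_unb))["A", "Sum Sq"]
anova(lm(hours ~ B * A, data = d_unb))["A", "Sum Sq"]
```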

 

We see that A, B and A:B are significant. We accept the two-factor model with interaction. The model should satisfy the assumptions of normality, constant variance and independence of the error term. Therefore the residuals are checked by plotting. We can do that on one page: the layout command divides the page into 4 plotting regions, organized 2 by 2:

> layout(matrix(c(1,2,3,4),2,2))
> plot(ResFullMod)
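An alternative to layout is par(mfrow = ...), which fills the page row by row, whereas the layout matrix above is filled column by column. The sketch below uses synthetic data (made-up numbers, not the Ch19pr14 data) and draws to a null device only so that it runs without a screen; in an interactive session the pdf(NULL) and dev.off() lines would be left out.

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A)
d$B <- factor(d$B)
fit <- lm(hours ~ A * B, data = d)

pdf(NULL)                     # null device: nothing is written to disk
old <- par(mfrow = c(2, 2))   # 2 by 2 grid, filled row by row
plot(fit)                     # the default diagnostic plots of an lm fit
par(old)                      # restore the previous settings
dev.off()

# The residuals are also available as numbers. With a separate mean
# for every A x B cell they sum to zero within each cell.
res <- residuals(fit)
round(tapply(res, list(A = d$A, B = d$B), sum), 10)
```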


Are we happy with the residuals? In other words, are the assumptions on the errors in the model satisfied?

One approach when interaction is important is to perform a one-way ANOVA with all combinations combined in one factor.

A simple trick (A and B must be converted to numerics for the arithmetic):

> C=as.numeric(A)*10+as.numeric(B)
> C

[1] 11 11 11 11 12 12 12 12 13 13 13 13 21 21 21 21 22 22 22 22 23 etc..

Be careful to use C as a factor and not as a numerical value!

> C= factor(C)
> ResCMod<-lm(hours~C);ResCMod
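Base R also offers the function interaction, which builds the combined factor without the detour through numerics. This is an alternative to the trick above, not the method used on this page. Note that as.numeric applied to a factor returns the level codes, which coincide with the labels here only because the levels are 1, 2 and 3. A sketch on synthetic data (made-up numbers, not the Ch19pr14 data; C_trick and C_int are illustrative names):

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A)
d$B <- factor(d$B)

# The numeric trick from the text
C_trick <- factor(as.numeric(d$A) * 10 + as.numeric(d$B))

# The same combined factor built directly from the two factors
C_int <- interaction(d$A, d$B, sep = "")

nlevels(C_int)                                      # 9
all(as.character(C_int) == as.character(C_trick))   # TRUE: same label for every row

# Both versions lead to the same fitted one-way model
all.equal(fitted(lm(d$hours ~ C_int)), fitted(lm(d$hours ~ C_trick)))   # TRUE
```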


Compare the ANOVA table with the other table from the two way with interaction:

> anova(ResCMod)
Analysis of Variance Table

Response: hours
                  Df    Sum Sq       Mean Sq        F value          Pr(>F)
C                 8      373.11           46.64         774.91    < 2.2e-16 ***
Residuals   27          1.63             0.06
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

What do you see? Look very carefully. Now we look for groups among all the combinations of A and B coded in the new combined factor C.
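For the comparison of the two ANOVA tables: in the output above, 220.020 + 123.660 + 29.425 = 373.105, which is the sum of squares of C up to rounding, and the residual lines are the same. The sketch below checks the same relation on synthetic data (made-up numbers, not the Ch19pr14 data; full and one are illustrative names).

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$A <- factor(d$A)
d$B <- factor(d$B)
d$C <- interaction(d$A, d$B, sep = "")   # combined factor with 9 levels

full <- anova(lm(hours ~ A * B, data = d))   # two-way with interaction
one  <- anova(lm(hours ~ C, data = d))       # one-way on the combined factor

# SSA + SSB + SSAB equals the treatment sum of squares of C
sum(full[c("A", "B", "A:B"), "Sum Sq"])
one["C", "Sum Sq"]

# The error sum of squares is identical in both tables
all.equal(full["Residuals", "Sum Sq"], one["Residuals", "Sum Sq"])   # TRUE
```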


We can apply Tukey multiple comparisons to find groups. TukeyHSD works on a fitted model object made by aov():

> TukeyHSD(aov(hours~C), "C", ordered = TRUE)
Tukey multiple comparisons of means
95% family-wise confidence level
factor levels have been ordered

Fit: aov(formula = hours ~ C)

$C
       diff       lwr       upr
13-11 2.100 1.5163187 2.6836813
12-11 2.125 1.5413187 2.7086813
21-11 2.975 2.3913187 3.5586813

etc....

All combinations are compared. It is a lot quicker to look at a plot of the differences:

> layout(matrix(c(1),1,1))
> plot(TukeyHSD(aov(hours~C), "C", ordered = TRUE))

We can see which combinations are different according to the Tukey multiple comparisons method.
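The same information can be pulled out of the TukeyHSD result as numbers: two combinations differ when their confidence interval does not contain zero. A minimal sketch on synthetic data (made-up numbers, not the Ch19pr14 data; tk, ci and differ are illustrative names). Current versions of R also return a "p adj" column next to diff, lwr and upr.

```r
# Synthetic balanced 3 x 3 layout with 4 replicates (NOT the Ch19pr14 data)
set.seed(1)
d <- expand.grid(replicate = 1:4, B = 1:3, A = 1:3)
d$hours <- 2 + 1.5 * d$A + d$B + 0.5 * d$A * d$B + rnorm(nrow(d), sd = 0.25)
d$C <- interaction(factor(d$A), factor(d$B), sep = "")   # combined factor

tk <- TukeyHSD(aov(hours ~ C, data = d), "C", ordered = TRUE)
ci <- tk$C        # one row per pair of combinations
nrow(ci)          # 36 pairs from 9 combinations

# A pair differs when the interval lies entirely on one side of zero
differ <- ci[, "lwr"] > 0 | ci[, "upr"] < 0
rownames(ci)[differ]    # pairs declared different
rownames(ci)[!differ]   # pairs that cannot be separated
```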


For contrasts in this case, see the special page on contrasts (section "balanced two way with interaction").



31 March, 2003 by Guido Wyseure