UP

Two Way ANOVA

B Nested in A

Fixed effects approach

First we need to create a Data frame. On this dataframe we execute the ANOVA with B nested in A and not crossed. Description of the Data-set can be found in NKNW96 pag 1155.

Steps to follow:

1)
Create the data-frame on the data ( we first create a text file from a textfile  Ch28pr04.txt ; right click on the file and save on C:\Temp )
2) Create factors in the frame
3) Execute the ANOVA two-way with nesting;

Check the the graphically the predictions, residuals, QQ-plot, Cook distance plot...for every ANOVA and note your observations  

4) Balanced and fixed factors; so SS-decomposition is useful and correct; however use the book for the F-tests.

The General Linear Test or classical F tests can be done.

5) If appropiate execute multiple comparison or test specific contrast questions.

We create a data-frame by typing (or copy/paste)  the red letters after > followed by  <ENTER> The blue letters are the answer from the R-programme. We check the content by typing in the name of the data-frame. Remark systems like R originate in Unix in which they discriminate uppercase and undercase and replace the usual backslash "\" by two  "\\".

>Ch28pr04<-read.table("c:\\temp\\ch28pr04.txt")<ENTER>

The variable names are V1 to V4. We change them to meaningful names by:

>names(Ch28pr04)<-c('Rendem','Machine','Operator','Replic')   <ENTER>

>Ch28pr04<ENTER>

            Rendem Machine Operator Replic
1                   65           1              1        1
2                   58           1              1        2
3                   63           1              1        3
(rest of dataset)

>Bottles<-Ch28pr04

We can use another name e.g."Bottles" for the data-frame.. 

Although we can create factors by the factor function in the lm-formula, it is some what neater to do it now:

 Bottles$Machine<-factor(Bottles$Machine)
 
Bottles$Operator<-factor(Bottles$Operator)

We are ready with our data-preparation and can as a last step make our variables more accessible by the attach statement:

attach(Bottles)

Now in the further analysis the variables are known as e.g. Operator and the full name with object (name of data-frame) and property (name of Variable) with $ as separator like Bottles$Operator  are not necessary anymore

We are finally in business ( compare this to e.g. the datastep in SAS and others). R is a bit rough here as the volunteers of  free-ware software are more keen on the statistics. 


An useful visualisation is by calling the interaction plot function with (Factor A, Factor B , Response Var) as arguments

> interaction.plot(Machine,Operator,Rendem)<ENTER>

Parallel lines or not? Look and judge carefully. What is your opinion? Is this plot meaningful?

Another useful plot could be:

boxplot(Rendem~Machine)

What are your conclusions?


Now we can go for the Full model: two-way factor with interaction ANOVA (the data are balanced and factors are fixed so the decompostion of SSTR in SSA SSB(A) is useful and independent of type of SS):

> ResABinA<-aov(Rendem~ Machine + Machine%in%Operator); anova(ResABinA)

Analysis of Variance Table

Response: Rendem
                             Df      Sum Sq   Mean Sq    F value          Pr(>F)
Machine                  2    1695.63       847.82    35.924    2.901e-10 ***
Machine:Operator   9     2272.30      252.48    10.698    6.994e-09 ***
Residuals              48      1132.80       23.60
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

The formula for nested fixed factors has two alternative expressions: 

Rendem~ Machine + Machine%in%Operator or Rendem~ Machine + Machine/Operator

The assumptions of the error ( normal distributed, homscedascticity etc) are simply checked by plotting the different graphs:

>plot(ResABinA)<ENTER>

Use the plots to answer the the questions NKNW, 96,  problem 28.4 a and b (pag 1155).

To answer 28pr5 b one can use a modern style of plot ( better then Figure 28.3) by the lattice library.

library(lattice)
dotplot(Rendem~Machine|Operator)

Use the ANOVA table to answer problem 28.5 c, d and e. Write the H0 and Ha decision rule, and check the appropiate formulas in the book. Do not (never) trust the ANOVA table blindly.


To answer problem 28.5 f for each machine separately  we need to execute a seperate ANOVA-analysis by machine:

ResA1<-aov(Rendem~ Operator,data=subset(Bottles,Machine=="1"))
anova(ResA1)

Analysis of Variance Table

Response: Rendem
                     Df      Sum Sq      Mean Sq     F value        Pr(>F)
Operator         3       599.20        199.73     7.1653    0.002890 **
Residuals       16       446.00          27.87
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Now we take MSB(A1) and divide it by MSE from the full nested analysis:

F= 199.73 /  23.60 = 8.46 and compare to the critical F value at alfa 1%:

qf(0.99,3,48)
[1] 4.217958

We conclude Ha; differences between operators are important for machine 1.

Change 1 by 2 and 3 in above analysis and create ResA2 and ResA3.

 ResA2<-aov(Rendem~ Operator,data=subset(Bottles,Machine=="2"));anova(ResA2)
 ResA3<-aov(Rendem~ Operator,data=subset(Bottles,Machine=="2"));anova(ResA3)

Apply F*= MSB(A2)/MSE and  F*= MSB(A3)/MSE with MSE originating from the

anova(ResABinA)


Pairwise comparison of all three machines ( Tukeys multiple comparison)

TukeyHSD(ResABinA,"Machine")
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = Rendem ~ Machine + Machine %in% Operator)

$Machine
              diff                 lwr                  upr
2-1       9.75      6.034649      13.465351
3-1     12.35      8.634649      16.065351
3-2      2.60      -1.115351        6.315351

Compare to a manual approach:

a) to find the number of replicates and whether it is balanced:

tapply(Rendem,Machine:Operator,length)
1:1 1:2 1:3 1:4 2:1 2:2 2:3 2:4 3:1 3:2 3:3 3:4
   5    5    5    5   5   5    5    5    5    5   5    5

So balanced; any element out of the vector will do:

nreplic=tapply(Rendem,Machine:Operator,length)[1]

b) the number of operators and machines

nlevels(Machine)
[1] 3

nlevels(Operator)
[1] 4

c) we can extract  the MSE as an element from a matrix:

anova(ResABinA)[3,3]
23.6

d) the averages for every machine:

tapply(Rendem,Machine,mean)
       1        2        3
61.20 70.95 73.55

Calculation of standard error on estimate: (formula pag 1135 in Neter et al, 1996)

 sL=sqrt(anova(ResABinA)[3,3]/(nreplic*nlevels(Operator))*2);sL
1.536229

Tukey T value:

qtukey(0.95,nlevels(Machine),nlevels(Machine)*nlevels(Operator)*(nreplic-1))
[1] 3.420258
 T=qtukey(0.95,nlevels(Machine),nlevels(Machine)*nlevels(Operator)*(nreplic-1))/sqrt(2);T
[1] 2.418488

Critical distance is sL*T

 sL*T
3.715351


Remark:  We can also try the reduced model without operator in order to compare in the GLT approach to the full and nested model. In other words w test if we have (some) differences between operators:

> ResA<-aov(Rendem~ Machine)ENTER>

> anova(ResABinA, ResA)<ENTER>

Analysis of Variance Table

Model 1: Rendem ~ Machine + Machine:Operator
Model 2: Rendem ~ Machine
      Res.Df           RSS    Df      Sum of Sq            F              Pr(>F) 
1           48      1132.8 
2           57       3405.1     -9        -2272.3     10.698       6.994e-09 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Compare the obtained F*=10.698 with the F limit value for alfa 1%

> qf(0.99,9,48)<ENTER>
[1] 2.801816

So we can accept the Full model.


UP

02 May, 2003 by Guido Wyseure