| UP |
First we need to create a Data frame. On this dataframe we execute the ANOVA with B nested in A and not crossed. Description of the Data-set can be found in NKNW96 pag 1155.
Steps to follow:
| 1) |
Create the data-frame on the data ( we first create a text file from a textfile Ch28pr04.txt ; right click on the file and save on C:\Temp ) |
| 2) | Create factors in the frame |
| 3) | Execute the ANOVA two-way with nesting;
Check the the graphically the predictions, residuals, QQ-plot, Cook distance plot...for every ANOVA and note your observations |
| 4) | Balanced and fixed factors; so SS-decomposition is useful and correct;
however use the book for the F-tests. The General Linear Test or classical F tests can be done. |
| 5) | If appropiate execute multiple comparison or test specific contrast questions. |
We create a data-frame by typing (or copy/paste) the red letters after > followed by <ENTER> The blue letters are the answer from the R-programme. We check the content by typing in the name of the data-frame. Remark systems like R originate in Unix in which they discriminate uppercase and undercase and replace the usual backslash "\" by two "\\".
>Ch28pr04<-read.table("c:\\temp\\ch28pr04.txt")<ENTER>
The variable names are V1 to V4. We change them to meaningful names by:
>names(Ch28pr04)<-c('Rendem','Machine','Operator','Replic') <ENTER>
>Ch28pr04<ENTER>
Rendem Machine
Operator
Replic
1
65
1
1
1
2
58
1
1
2
3
63
1
1 3
(rest of dataset)
>Bottles<-Ch28pr04
We can use another name e.g."Bottles" for the data-frame..
Although we can create factors by the factor function in the lm-formula, it is some what neater to do it now:
Bottles$Machine<-factor(Bottles$Machine)
Bottles$Operator<-factor(Bottles$Operator)
We are ready with our data-preparation and can as a last step make our variables more accessible by the attach statement:
attach(Bottles)
Now in the further analysis the variables are known as e.g. Operator and the full name with object (name of data-frame) and property (name of Variable) with $ as separator like Bottles$Operator are not necessary anymore
We are finally in business ( compare this to e.g. the datastep in SAS and others). R is a bit rough here as the volunteers of free-ware software are more keen on the statistics.
An useful visualisation is by calling the interaction plot function with (Factor A, Factor B , Response Var) as arguments
> interaction.plot(Machine,Operator,Rendem)<ENTER>
Parallel lines or not? Look and judge carefully. What is your opinion? Is this plot meaningful?
Another useful plot could be:
boxplot(Rendem~Machine)
What are your conclusions?
Now we can go for the Full model: two-way factor with interaction ANOVA (the data are balanced and factors are fixed so the decompostion of SSTR in SSA SSB(A) is useful and independent of type of SS):
> ResABinA<-aov(Rendem~ Machine + Machine%in%Operator); anova(ResABinA)
Analysis of Variance Table
Response: Rendem
Df Sum Sq Mean Sq F
value Pr(>F)
Machine
2 1695.63 847.82
35.924 2.901e-10 ***
Machine:Operator 9 2272.30
252.48 10.698 6.994e-09 ***
Residuals
48 1132.80
23.60
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
The formula for nested fixed factors has two alternative expressions:
Rendem~ Machine + Machine%in%Operator or Rendem~ Machine + Machine/Operator
The assumptions of the error ( normal distributed, homscedascticity etc) are simply checked by plotting the different graphs:
>plot(ResABinA)<ENTER>
Use the plots to answer the the questions NKNW, 96, problem 28.4 a and b (pag 1155).
To answer 28pr5 b one can use a modern style of plot ( better then Figure 28.3) by the lattice library.
library(lattice)
dotplot(Rendem~Machine|Operator)

Use the ANOVA table to answer problem 28.5 c, d and e. Write the H0 and Ha decision rule, and check the appropiate formulas in the book. Do not (never) trust the ANOVA table blindly.
To answer problem 28.5 f for each machine separately we need to execute a seperate ANOVA-analysis by machine:
ResA1<-aov(Rendem~
Operator,data=subset(Bottles,Machine=="1"))
anova(ResA1)
Analysis of Variance Table
Response: Rendem
Df Sum Sq Mean Sq
F value Pr(>F)
Operator 3
599.20 199.73
7.1653 0.002890 **
Residuals 16
446.00 27.87
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Now we take MSB(A1) and divide it by MSE from the full nested analysis:
F= 199.73 / 23.60 = 8.46 and compare to the critical F value at alfa 1%:
qf(0.99,3,48)
[1] 4.217958
We conclude Ha; differences between operators are important for machine 1.
Change 1 by 2 and 3 in above analysis and create ResA2 and ResA3.
ResA2<-aov(Rendem~
Operator,data=subset(Bottles,Machine=="2"));anova(ResA2)
ResA3<-aov(Rendem~
Operator,data=subset(Bottles,Machine=="2"));anova(ResA3)
Apply F*= MSB(A2)/MSE and F*= MSB(A3)/MSE with MSE originating from
the
anova(ResABinA)
Pairwise comparison of all three machines ( Tukeys multiple comparison)
TukeyHSD(ResABinA,"Machine")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Rendem ~ Machine + Machine %in% Operator)
$Machine
diff
lwr
upr
2-1 9.75
6.034649 13.465351
3-1 12.35 8.634649
16.065351
3-2 2.60 -1.115351
6.315351
Compare to a manual approach:
a) to find the number of replicates and whether it is balanced:
tapply(Rendem,Machine:Operator,length)
1:1 1:2 1:3 1:4 2:1 2:2 2:3 2:4 3:1 3:2 3:3 3:4
5 5 5 5
5 5 5 5 5
5 5 5
So balanced; any element out of the vector will do:
nreplic=tapply(Rendem,Machine:Operator,length)[1]
b) the number of operators and machines
nlevels(Machine)
[1] 3
nlevels(Operator)
[1] 4
c) we can extract the MSE as an element from a matrix:
anova(ResABinA)[3,3]
23.6
d) the averages for every machine:
tapply(Rendem,Machine,mean)
1
2 3
61.20 70.95 73.55
Calculation of standard error on estimate: (formula pag 1135 in Neter et al, 1996)
sL=sqrt(anova(ResABinA)[3,3]/(nreplic*nlevels(Operator))*2);sL
1.536229
Tukey T value:
qtukey(0.95,nlevels(Machine),nlevels(Machine)*nlevels(Operator)*(nreplic-1))
[1] 3.420258
T=qtukey(0.95,nlevels(Machine),nlevels(Machine)*nlevels(Operator)*(nreplic-1))/sqrt(2);T
[1] 2.418488
Critical distance is sL*T
sL*T
3.715351
Remark: We can also try the reduced model without operator in order to compare in the GLT approach to the full and nested model. In other words w test if we have (some) differences between operators:
> ResA<-aov(Rendem~ Machine)ENTER>
> anova(ResABinA, ResA)<ENTER>
Analysis of
Variance Table
Model 1: Rendem ~ Machine + Machine:Operator
Model 2:
Rendem ~ Machine
Res.Df
RSS Df Sum of
Sq
F
Pr(>F)
1
48
1132.8
2
57 3405.1
-9 -2272.3
10.698 6.994e-09 ***
---
Signif.
codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Compare the obtained F*=10.698 with the F limit value for alfa 1%
> qf(0.99,9,48)<ENTER>
[1]
2.801816
So we can accept the Full model.
| UP |
02 May, 2003 by Guido Wyseure