First, we need to create a data frame. The ANOVA is then executed on this data frame. The dataset is the example from NKNW, 1996, Table 17.4, p. 729.
Steps to follow:
| 1) | Create the data frame from the data (text file Table17.2; right-click on the file and save it in C:\Temp) |
| 2) | Create factors in the data frame |
| 3) | Execute the ANOVA. Check the predictions, residuals, QQ-plot, Cook's distance plot, etc. graphically for every analysis and note your observations |
| 4) | If appropriate, execute multiple comparisons or test specific contrasts. |
We create a data frame by typing (or copying and pasting) the red letters after > followed by <ENTER>. The blue letters are the answer from R. We check the content by typing the name of the data frame. Note that systems like R originate in Unix: they distinguish uppercase from lowercase, and in a Windows path the usual backslash "\" must be written as two backslashes "\\" (or as a forward slash "/").
> RustDF<-read.table("c:/temp/ch17ta02.txt")
An alternative approach is to read the file directly from the web server:
> RustDF<-read.table("http://www.biw.kuleuven.be/vakken/statisticsbyR/datasetsTXT/CH17TA02.txt")
> RustDF
    V1 V2 V3
1 43.9  1  1
2 39.0  1  2
3 46.7  1  3
4 43.8  1  4
5 44.2  1  5
(rest of dataset)
We can give names to the variables:
> names(RustDF)=c("Rust","Brand","Replication")
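The naming step can also be folded into the read itself. The following is a minimal sketch, assuming the file layout shown above (three whitespace-separated columns). The `text =` argument is used only so that the example runs without the file; the string `rows` is a stand-in that holds just the five rows printed above, not the full dataset.

```r
# First five rows of the data set as printed above (NOT the full file).
rows <- "43.9 1 1
39.0 1 2
46.7 1 3
43.8 1 4
44.2 1 5"

# col.names assigns the variable names while reading.
# For the real file, replace 'text = rows' by the path or URL used above.
RustDF <- read.table(text = rows,
                     col.names = c("Rust", "Brand", "Replication"))

str(RustDF)   # shows the column names and their types
```

Naming the columns at read time avoids a moment in which the data frame carries the default names V1, V2, V3.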
We can make our variables more accessible by the attach statement:
> attach(RustDF)<ENTER>
Although we can create factors with the factor function inside the lm formula, it is somewhat neater to do it now, and it protects us against mistakes:
> Brand=factor(Brand)
> is.factor(Brand)
[1] TRUE
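Note that after attach(), the assignment above creates a separate Brand object in the workspace and leaves the column inside RustDF unchanged. A possible alternative, sketched here under the assumption that one prefers to work without attach(), is to convert the column inside the data frame itself. The small data frame below only repeats the five rows printed earlier, so it contains brand 1 only.

```r
# Stand-in data: the five rows printed earlier (brand 1 only).
RustDF <- data.frame(Rust        = c(43.9, 39.0, 46.7, 43.8, 44.2),
                     Brand       = c(1, 1, 1, 1, 1),
                     Replication = 1:5)

# Convert the column in place, so the factor travels with the data frame.
RustDF$Brand <- factor(RustDF$Brand)

is.factor(RustDF$Brand)   # TRUE
levels(RustDF$Brand)      # the brand codes as factor levels
```

With this approach, later calls would take the form aov(Rust ~ Brand, data = RustDF) instead of relying on attached variables.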
Before the analysis, a visualisation is useful and important.
A boxplot is a popular way of comparing groups:
> boxplot(Rust ~ Brand)
The result is one box per brand.
The boxplot suggests that the differences between brands are important. However,
we wish to know whether they are significant and which groups differ
significantly.
Now we can execute the single-factor ANOVA. Note that we use aov() and not lm(). aov() calls lm() internally and returns a fit suited to ANOVA-specific analyses, including the Tukey multiple comparison.
Is there (somewhere) a difference between brands?
> summary(fm1 <- aov(Rust ~ Brand))
            Df  Sum Sq Mean Sq F value    Pr(>F)
Brand        3 15953.5  5317.8  866.12 < 2.2e-16 ***
Residuals   36   221.0     6.1
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Conclusion of the F-test: at least one brand differs from the others. The next
question is: where are the differences?
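The entries of the ANOVA table are linked by simple arithmetic, which can be checked by hand. The sketch below uses only the rounded numbers printed in the table above, so the results agree with the table up to rounding. The variable names are chosen for this illustration only.

```r
# Values copied from the printed ANOVA table (rounded as printed).
ss_brand <- 15953.5; df_brand <- 3
ss_resid <- 221.0;   df_resid <- 36

# Mean square = sum of squares / degrees of freedom.
ms_brand <- ss_brand / df_brand
ms_resid <- ss_resid / df_resid

# F statistic = ratio of the two mean squares.
f_value <- ms_brand / ms_resid

# Upper-tail probability of the F distribution with (3, 36) df.
p_value <- pf(f_value, df_brand, df_resid, lower.tail = FALSE)

c(ms_brand = ms_brand, ms_resid = ms_resid, F = f_value, p = p_value)
```

The residual mean square (about 6.1) is the estimate of the error variance and is reused in the multiple comparisons.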
> plot(fm1)
These plots check the basic assumptions of the classical ANOVA model:
normally distributed, independent errors with constant variance.
Homoscedasticity is very important: the variance of the error must be constant and
independent of the factor levels. The QQ-plot checks the normality of the residuals.
Influential observations (possible outliers) can be detected in the Cook's distance plot.
If we are not satisfied, we should consider transformations, removal of suspect outliers, etc.
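The graphical checks can be complemented by formal tests. The sketch below is an illustration on simulated data, not on the rust data: the group means, the standard deviation, and the names `sim` and `fit` are arbitrary assumptions made for this example. In current versions of R, plot() on a fitted model does not show the Cook's distance plot by default, so `which = 1:4` is given explicitly.

```r
set.seed(1)  # simulated data for illustration, NOT the rust data

# Four groups of ten observations with arbitrary means.
sim <- data.frame(
  y = rnorm(40, mean = rep(c(40, 90, 65, 40), each = 10), sd = 2.5),
  g = factor(rep(1:4, each = 10))
)
fit <- aov(y ~ g, data = sim)

# Four diagnostic plots on one page:
# residuals vs fitted, normal QQ, scale-location, Cook's distance.
op <- par(mfrow = c(2, 2))
plot(fit, which = 1:4)
par(op)

# Formal counterparts of the visual checks.
bartlett.test(y ~ g, data = sim)   # equal variances across groups
shapiro.test(residuals(fit))       # normality of the residuals
```

The tests do not replace the plots: with small groups they have little power, so a non-significant result is no proof that the assumptions hold.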
Multiple comparisons by TukeyHSD (must be applied to an object created by aov()),
i.e. the classical Tukey method: the family of all pairwise comparisons.
Avoid looking for differences if the ANOVA null hypothesis is not rejected, i.e. when the ANOVA test gives no evidence that the averages differ. It is nevertheless possible that, among many groups, just one group differs from all the others while the null hypothesis is still not rejected. A boxplot will often help you to recognise such dangerous cases. Note that with many groups (e.g. 20), unadjusted pairwise t-tests at alpha = 5% will very likely show at least one difference, typically between the lowest and the highest average.
The Tukey multiple comparison procedure can be carried out with the function TukeyHSD, which is available in the base distribution of R.
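The risk of unadjusted pairwise tests with many groups can be made concrete with a rough calculation. The sketch below treats the pairwise tests as independent, which they are not (they share group means and the error estimate), so the result is only an indication of the order of magnitude.

```r
k     <- 20     # number of groups, as in the example in the text
alpha <- 0.05   # significance level of each single test

# Number of distinct pairs among k groups.
m <- choose(k, 2)

# Chance of at least one false positive if all means are truly equal,
# under the simplifying (and only approximate) independence assumption.
fwer <- 1 - (1 - alpha)^m

c(pairs = m, familywise_error = fwer)
```

With 190 pairs the chance of at least one spurious difference is close to 1, which is why a family-wise procedure such as Tukey's is used.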
> fm1Tukey=TukeyHSD(fm1,"Brand") ; fm1Tukey
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = Rust ~ Brand)
$Brand
      diff        lwr         upr
2-1  46.30  43.315536  49.2844635
3-1  24.81  21.825536  27.7944635
4-1  -2.67  -5.654464   0.3144635
3-2 -21.49 -24.474464 -18.5055365
4-2 -48.97 -51.954464 -45.9855365
4-3 -27.48 -30.464464 -24.4955365
Differences between brands are significant at the 5% level if the confidence
interval around the estimate of the difference does not contain zero.
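This rule can be applied mechanically. The sketch below rebuilds the printed table as a matrix (the name `tuk` is chosen for this illustration) and flags the intervals that exclude zero. With a real TukeyHSD result, the same matrix is available as fm1Tukey$Brand.

```r
# Tukey table copied from the output above.
tuk <- matrix(
  c( 46.30,  43.315536,  49.2844635,
     24.81,  21.825536,  27.7944635,
     -2.67,  -5.654464,   0.3144635,
    -21.49, -24.474464, -18.5055365,
    -48.97, -51.954464, -45.9855365,
    -27.48, -30.464464, -24.4955365),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("2-1", "3-1", "4-1", "3-2", "4-2", "4-3"),
                  c("diff", "lwr", "upr"))
)

# An interval excludes zero if it lies entirely above or entirely below zero.
excludes_zero <- tuk[, "lwr"] > 0 | tuk[, "upr"] < 0
excludes_zero
```

Only the comparison 4-1 has an interval that contains zero.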
This can be visualised by a plot of the list.
> plot(fm1Tukey)

We see that all brands differ from each other, with the exception of brand 1 compared to brand 4.
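All six intervals in the Tukey output have the same half-width (upr minus diff, about 2.98). This value can be reproduced from the ANOVA table. The sketch below assumes a balanced design with 10 observations per brand; this is inferred from the 36 residual degrees of freedom with 4 brands and is not stated explicitly in the text. The residual sum of squares is taken as printed, so the result matches only up to rounding.

```r
mse <- 221.0 / 36   # residual mean square from the ANOVA table
n   <- 10           # ASSUMPTION: 10 observations per brand (balanced design)

# Upper 5% point of the studentized range for 4 means and 36 error df.
q <- qtukey(0.95, nmeans = 4, df = 36)

# Tukey half-width for a difference of two means in a balanced design.
half_width <- q * sqrt(mse / n)
half_width
```

Because the half-width is the same for every pair in a balanced design, any two brands whose means differ by more than this value are declared different.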
Multiple comparisons by three major methods proposed by NKNW 96
For contrasts in one-way ANOVA, see the page on contrasts, section "one way ANOVA".
31 March, 2003 by Guido Wyseure