This document provides a few examples of Analyses of Variance for typical experimental designs. We recommend to use of the ez
package to perform ANOVAs with R.
require(ez)
## Loading required package: ez
require(lattice)
## Loading required package: lattice
require(ggplot2)
## Loading required package: ggplot2
We start by analyzing a design where the factor Subject is nested within the factor Group, that is, we want to compare independent groups.
ns <- c(20, 30, 20, 15)
means <- c(100, 110, 100, 130)
sd <- c(20, 20, 30, 25)
dat1 <- data.frame(
subj=factor(1:sum(ns)),
group=factor(rep(1:4, ns)),
y=c(rnorm(ns[1], means[1], sd[1]),
rnorm(ns[2], means[2], sd[2]),
rnorm(ns[3], means[3], sd[3]),
rnorm(ns[4], means[4], sd[4])))
Let us inspect the data:
str(dat1)
## 'data.frame': 85 obs. of 3 variables:
## $ subj : Factor w/ 85 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ group: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
## $ y : num 75.9 108.9 148.1 80.3 63.4 ...
head(dat1)
## subj group y
## 1 1 1 75.85807
## 2 2 1 108.91287
## 3 3 1 148.09366
## 4 4 1 80.28648
## 5 5 1 63.41147
## 6 6 1 111.60601
Different graphics can be used, the best choiice depends on the number of data points. For example, dotplots are good when there are not too many points:
require(lattice)
with(dat1, dotplot(y ~ group))
with(dat1, boxplot(y ~ group))
require(ggplot2)
ggplot(dat1, aes(group, y)) +
geom_violin() + geom_dotplot(binaxis='y', stackdir='center', dotsize=.5)
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
require(ez)
ezPlot(dat1, y, wid=subj, between=group, x=.(group))
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## Warning in ezStats(data = data, dv = dv, wid = wid, within = within,
## within_full = within_full, : Unbalanced groups. Mean N will be used in
## computation of FLSD
ezANOVA(dat1, dv=y, wid=subj, between=group)
## Warning: Data is unbalanced (unequal N per group). Make sure you specified
## a well-considered value for the type argument to ezANOVA().
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 1 group 3 81 3.409065 0.02139588 * 0.1121069
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 3 81 499.0771 20586.27 0.6545666 0.5823939
To conduct post-hoc tests comparing all pairs of means:
hsd <- TukeyHSD(aov(y ~ group, data=dat1))
print(hsd)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = y ~ group, data = dat1)
##
## $group
## diff lwr upr p adj
## 2-1 0.6734241 -16.952321 18.29917 0.9996365
## 3-1 7.3623076 -11.945729 26.67034 0.7496736
## 4-1 22.1939720 1.338909 43.04903 0.0324969
## 3-2 6.6888835 -10.936861 24.31463 0.7523986
## 4-2 21.5205479 2.212512 40.82858 0.0227903
## 4-3 14.8316644 -6.023398 35.68673 0.2510137
plot(hsd)
If you worry about heteroscedasticity, you can check the diagnostic plots of lm
:
par(mfcol=c(2, 2))
plot(lm(y ~ group, data=dat1))
If heteroscedasticity is severe, it will be visible on the plots of the “residuals vs. fitted” plot.
In this design, the factor ‘Subject’ is crossed with a factor ‘Treatment’, that is, there is one data point per subject and treatment.
ns <- 20
means <- c(100, 110, 100, 130)
nt <- length(means)
sd <- 20
dat2 <- data.frame(
subj=gl(ns, nt),
treat=gl(nt, 1, nt*ns))
dat2$y <- rep(means, ns) + rep(rnorm(ns, sd=20), each=nt) + rnorm(nt*ns, sd=20)
str(dat2)
## 'data.frame': 80 obs. of 3 variables:
## $ subj : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
## $ treat: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
## $ y : num 105.6 103.3 69.1 123 69.6 ...
head(dat2)
## subj treat y
## 1 1 1 105.59941
## 2 1 2 103.30880
## 3 1 3 69.06811
## 4 1 4 123.01079
## 5 2 1 69.58211
## 6 2 2 97.29196
We first plot the raw data, per subject:
require(lattice)
xyplot(y ~ treat | subj, data=dat2, type='b')