UP

Sample size choice

Illustration

Firstly one can illustrate the principle of the power of a test.

Important remark: In R (as in S-plus and SAS) the noncentrality parameter (ncpR)  corresponds to: 

           ncpR= phiN*phiN*rlevel

 where rlevel is the number of levels,  phiN the Φ as defined in Neter et al (1996), see page 1054, equation 26.1a for the single factor case and on following pages for more factors.

 Two types of errors: type I and type II:

 

 

 

Population

Inference

 

H0

Ha

H0

correct

(1-a)

 

wrong         type II

(b)

Ha

wrong         type I

(a)

Significance

correct

(1-b

POWER

 Example to illustrate:  We can assume a situation with:

phiN=1.5
rlevel=4
nreplic=3

Therefore, following parameters can be used:

ncpR= phiN*phiN*rlevel
nT=rlevel*nreplic
v1=rlevel-1
v2=nT-rlevel

For type I error of 5%

alfa=0.05
our critical F value under the H0 is:

Fcrit=qf(1-alfa,v1,v2);Fcrit
[1] 4.066181

One can check the probability under Ha for Φ= 1.5 to get the right decision ( and to reject H0)

powerN=1-pf(Fcrit,v1,v2,ncp=ncpR);powerN
[1] 0.4870965
So not even 50% to get it right!!! Even if the H0 is not correct our chances to reject it are (too) low.

 

Full picture illustrated by the density functions (commands see later and not for Proeftechniek):

 (bold line= H0, full line Ha for Φ= 1.5 and red line the critical rejection value for alfa=5%) :

Remark that left of Fcrit under the Ha with Φ= 1.5 curve we have 51%. This is the chance to a wrong conclusion of H0 is true while Ha with Φ= 1.5 is the real population.

Exercise NKNW 26.11 page 1065

Given

a) What is the power for main effects: μ1.=6.6 ; μ2.=7.0; μ1.=7.4 ?

sigmaE=0.28
aLevels=3
bLevels=3
nReplic=4
muidot=c(6.6,7.0,7.4)

We calculate the phiN (Neter et al style according to formula 26.1a on page 1054)

phiN=1/sigmaE*sqrt(nReplic*bLevels/aLevels*sum((muidot-mean(muidot))^2))
phiN

[1] 4.04061

In the R programme we need an ncpR:

ncpR= phiN*phiN*aLevels

For alfa = 5%

alfa=0.05
v1=aLevels-1
v2=aLevels*bLevels*(nReplic-1)
Fcrit=qf(1-alfa,v1,v2);Fcrit

[1] 3.354131
Now calculate the power by R (or use table B11 on page 1356)

powerN=1-pf(Fcrit,v1,v2,ncp=ncpR);powerN
[1] 0.9999901
Answer: Power is almost 100%.

b) same but μ2.=7.3

so:

muidot=c(6.6,7.3,7.4)

and repeat relevant changed commands in R:
phiN=1/sigmaE*sqrt(nReplic*bLevels/aLevels*sum((muidot-mean(muidot))^2))
ncpR= phiN*phiN*aLevels
powerN=1-pf(Fcrit,v1,v2,ncp=ncpR);powerN

[1] 0.9999994
Degrees of freedom, critical F value all remain unchanged.

Answer: Power is almost 100%.

c) Power for main effect B for β1=-0.25,  β2=-0.25 and  β3=0.50

Again change the relevant commands:

betai=c(-0.25,-0.25,0.50)
phiN=1/sigmaE*sqrt(nReplic*aLevels/bLevels*sum((betai-mean(betai))^2))
phiN

[1] 4.374089

ncpR= phiN*phiN*bLevels
powerN=1-pf(Fcrit,v1,v2,ncp=ncpR);powerN

[1] 0.9999992

Answer: Power is almost 100%.

Excercise ch26pr11 was solved by the Power of  F test approach ( see page 1063-1057)

 

Exercise NKNW 26.22 page 1067

Rehabilitation therapy ch16pr12. (see pag 704)

Based on prevoius experience a σ = 4.5 (sigmaE ) can be assumed.

a) what would be the correct sample size for a =0.01 and a power of 80% to find differences if the population has a real  RANGE of 5.63 days?

rLevels=3
sigmaE=4.5
DeltaPop=5.63
DeltaOverSigma=DeltaPop/sigmaE;DeltaOverSigma

[1] 1.251111

See table B.12 page 1361 we find that number of replicates should be 20.

b) If we use sample size from a) for a scenario with  μ1=37 ; μ2=32; μ1=28  what will be the power?

Formula 26.1a page 1054 is applicable:

nReplic=20
mui=c(37,32,28)
phiN=1/sigmaE*sqrt(nReplic/rLevels*sum((mui-mean(mui))^2)); phiN

[1] 3.658989
In the R programme we need an ncpR:

ncpR= phiN*phiN*rLevels

v1=rLevels-1;v1
#    [1] 2
v2=nReplic*rLevels-rLevels;v2
#    [1] 57
 

Look in table B11 (page 1356) power is 1.0.

Compare to a calculation by R:

Fcrit=qf(1-alfa,v1,v2);Fcrit
[1] 3.158843
powerN=1-pf(Fcrit,v1,v2,ncp=ncpR);powerN
[1] 0.99994
Almost 100% power.

c) estimation of two pairwise comparisons L1= μ1-μ2 and L2= μ3-μ2 whereby each difference should be at least 3 days on a 95% family confidence coefficient. How many should be the sample size?

See, Neter et al (1996), page 1060 to 1063.

coefi=c(1,-1,0) # the c coefficients in formula 17.22 on page 1061
nReplic=10 # just one number of replications
sigmaLest=sqrt(sigmaE^2/nReplic*sum(coefi^2))
gBon=2
Lcrit=sigmaLest*qt(1-alfa/2/gBon,nReplic*rLevels);Lcrit

[1] 4.748528

Now we look to different sample sizes:

nReplic=c(5:40) # we will try several number of replications
# the critical L is calculated in function of the different number of replications:
Lcrit=sqrt(sigmaE^2/nReplic*sum(coefi^2))*qt(1-alfa/2/gBon,nReplic*rLevels);Lcrit

# [1] 7.086322 6.352311 5.806145 5.379636 5.034778 4.748528 4.506007 4.297129
# etc ...

plot(Lcrit~nReplic)
lines(c(5,50),c(3,3))

 

As result we get following graph:

We can see that 24 replications are appropiate.

Lcn=cbind(Lcrit,nReplic);Lcn
             
  Lcrit      nReplic
[1,]   7.086322              5
[2,]   6.352311              6
...etc...
[18,]   3.112080          22
[19,]   3.040593          23
[20,]   2.973814          24

...etc...
[36,]    2.284017          40
 

We can also see in the table that nReplic = 24 is giving the Lcrit <= 3.


Under construction ( niet voor proeftechniek)

For question a:  Alternatively we define the range as deviations from μ in order to apply formula 26.1a

nReplic=c(5:30) #we wil try a number of replications
rLevels=3
sigmaE=4.5
DeltaPop=5.63
mudiffi=c(-DeltaPop/2,+DeltaPop/2,0) # the Range in deviations from mu
alfa= 0.01
Fcrit=qf(1-alfa,rLevels-1,nReplic*rLevels-rLevels);Fcrit # the critical F value is a function of alfa =0.01 and the v1 and v2

[1] 6.926608 6.358873 6.012905 5.780416 5.613591 5.488118 5.390346 5.312029
[9] 5.247894 etc....

ncpR=(1/sigmaE*sqrt(1/rLevels*nReplic*sum((mudiffi)^2)))^2*rLevels
powerN=1-pf(qf(1-alfa,rLevels-1,nReplic*rLevels-rLevels),v1,v2,ncp=ncpR);powerN

[1] 0.0733277 0.1280291 0.1881011 0.2506188 0.3136153 0.3756674 0.4357220
[8] 0.4930094 0.5469897 0.5973125 0.6437834 0.6863345 0.7249984 0.7598849
[15] 0.7911614 0.8190357 0.8437415 0.8655273 0.8846465 0.9013504 0.9158828
[22] 0.9284759 0.9393475 0.9486998 0.9567179 0.9635702


 tabnRpN=cbind(nReplic,powerN);tabnRpN
 
         nReplic          powerN
[1,]             5      0.0733277
[2,]             6      0.1280291
etc...
[15,]         19      0.7911614
[16,]         20      0.8190357
etc...
[26,]         30      0.9635702


With 20 we get the 80% power. ( same result as by table B.12).

plot(nReplic,powerN,pch="x")
lines(c(5,50),c(0.8,0.8))

 


Niet voor proeftechniek

How to prepare the plot show in the illustration (with parameters v1, v2 and ncpR see the illustration):

we use a self-defined function with a simple numerical differentiation to calculate a non-central density function:

densityF= function(fx,v1,v2,ncp=ncpR){
+ CF1<-pf(fx,v1,v2,ncp=ncpR)
+ dx<-0.005
+ CF2<-pf(fx+dx,v1,v2,ncp=ncpR)
+ (CF2-CF1)/dx}

we define a vector of F values from 0 to 10 divide in 250 intervals:

fx=(0:250)/25
plot(fx,df(fx,v1,v2),type="l",lwd=2)

We put the Ha on the same plot by:

lines(fx,densityF(fx,v1,v2,ncp=ncpR) )

The critical F value is put on the graph as a red bold vertical line by:

 lines(c(Fcrit,Fcrit),c(0,1),lwd=3,col="red")

End result is the graph.

Cumulative graph is more useful and easy to make:

fx=(0:250)/25
powerN=1-pf(fx,v1,v2,ncp=ncpR)
plot(fx,pf(fx,v1,v2))
lines(fx,1-powerN)
lines(c(Fcrit,Fcrit),c(0,1),lwd=3,col="red")

Red line is the critical F value under H0, full line is the power for different F*, symbol line is the significance under H0.


UP

09 May, 2003 by Guido Wyseure