|
Firstly we need an object with the Data. On this Data Frame we execute the regression.
The steps are:
| 1) |
Production of the data frame containing the data |
| 2) | Perform some operations, so that the data frame contains a the information |
| 3) | Execute the frequency analysis on the data and; |
| 4) | Evaluate visually the plot. |
The dataframe can be produced in several ways. Here we read a textfile "ShahinEx1.txt" from Shahin et al (1993) into a dataframe. Right click on "ShahinEx1.txt" and save on a directory "C:\temp\". Alternative locations for saving can be used. We call it R-dataset "Flood1" and check the content before any further analysis. Double "\\" are used in the specification of the location.
>
Flood1<-read.table("c:\\temp\\ShahinEx1.txt",header=T)<ENTER>
> Flood1<ENTER>
flood
1 230
2
282
3 309
4 249
(rest van dataset)
We need to extract and produce some additional information.
As a starter in our statistical menu a boxplot never harms:
> boxplot(Flood1)<ENTER>
For calculations we better use a "pure" vector and not a dataframe:
> attach(Flood1)<ENTER>
>
flood<ENTER>
[1] 230 282 309 249 348 360 220 295 255 195 288 275 294 245 305
375 287 210 286
[20] 500 295 462 400 299 285 330 237 278 307
300
The average and standard deviation:
> Fmean<-mean(flood)<ENTER>
>
Fstdev<-sd(flood)
Remark: if we toke the average and standard deviation from a dataframe R also remembers (= stores) the name of the vector "flood" within the data.frame "Flood1". We can work with that but it becomes slightly more complicated.
From here on we do not mention <ENTER> anymore. Red is commandline input followed by <enter> and blue is the response.
> Fstdev
[1]
67.75386
> Fmean
[1] 300.0333
For our analysis we need to sort the data:
> flood2<-rev(sort(flood));flood2
[1] 500 462 400 375 360 348 330 309 307 305
300 299 295 295 294 288 287 286
[19] 285 282 278 275 255 249 245 237 230 220
210 195
We used a new name "flood2" but could have used the old one "flood". R sorts
in ascending order. Descending order is
required. We produce this by reversing the vector (function "rev").
Now we produce the ranks and store it in vector "rflood". The total number of years is stored in "N.series".
> rflood<-rev(rank(flood2)); rflood
>
N.series<-length(flood2);N.series
We no can produce the plotting position. This is a vector "T.estim" calculated from another vector "rflood" and with a scalar "N.series".
> T.estim<-(N.series+0.12)/(rflood-0.44); T.estim
[1] 53.785714 19.307692 11.765625 8.460674
6.605263 5.417266 4.591463
[8] 3.984127 and more...
We are ready for the frequency factors:
> KT<-(-sqrt(6)/pi*(0.5772
+log(log(T.estim/(T.estim-1) ) ) ) ); KT
[1] 2.64975196 1.83761304
1.43768206 1.16642365 0.95880018 0.78912391 0.64456920
[8] 0.51778933
0.40416984 and more...
The theoretical values based on their average and standard deviation are:
> flood.estim<-Fmean+KT*Fstdev;flood.estim
The
estimated "flood.estim" should compare well with the observed and sorted ones
"flood". We can judge them best in a plot which is linear for the given
distribution.
> plot(flood2~KT)
> lines(flood.estim~KT)
>
points(flood.estim~KT,pch="*")
The observed flows are shown with "O" ; the estimated ones by a line. Although not recommended the estimated are also plotted with "*". Shows a line with "*".

In this simple example the strength of R is not yet obvious. When comparing models and with more advanced analysis the object-oriented approach of R and S-plus quickly shows its value.
Further analysis is easy. A standard procedure is the Kolmogorov-Smirnov test. Two vectors of data-values are compared to see if they come from the same distribution.
> ks.test(flood2,flood.estim)
Similar the chi-square test to compare vectors.
> chisq.test(flood2,flood.estim)
Standard errors of estimates according to Shahin et al, 1993:
> deltaT<-sqrt(1+1.14*KT+1.1*KT^2)
>
sdT<-deltaT*Fstdev/sqrt(N.series);sdT
>
tvalue<-qt(0.975,df=N.series);tvalue
Upper and under confidence limits:
> flood.up<-flood.estim+tvalue*sdT ;
flood.low<-flood.estim-tvalue*sdT;
lines(flood.low~KT,col="blue",lwd=2);
lines(flood.up~KT,col="red",lwd=3)
The lines command adds the lines on the existing plot:

More advanced R (not part of PROMAS course):
More efficient would be to define a function ( a library for frequency analysis in hydrology could be made.....)
> KTfun<-function(T.extrapol){
(-sqrt(6)/pi*(0.5772 +log(log(T.extrapol/(T.extrapol-1) ) ) ) )
}
The function can be used to calculate the frequency factor:
> KT<-KTfun(T.estim)
and used again to extrapolate to an average return period of 50, 75 and 100 years:
> KTextra<-KTfun(c(50,75,100)); KTextra
[1] 2.592288 2.911064
3.136681
> flood.extra<-Fmean+KTextra*Fstdev; flood.extra
[1] 475.6708 497.2691
512.5555
Now we wish to plot everything in one graph:
> sdtfun<-function(KT.extra){
deltaT<-sqrt(1+1.14*KT.extra+1.1*KT.extra^2)
sdT<-deltaT*Fstdev/sqrt(N.series)
}
>
sdt2<-sdtfun(KText)
Again we make a function; slightly more work but rewarding for any later analysis.
All the calculated values and and upper and lower estimate:
> flood.calc<-Fmean+KText*Fstdev;
flood.low<-flood.calc-tvalue*sdt2;
flood.up<-flood.calc+tvalue*sdt2
We first plot the largest values ( flood.up) and then add the rest
> plot(flood.up~KText,lwd=3,col="red");
lines(flood.calc~KText);
points(flood.low~KText,lwd=2,col="blue");
points(floodall~KText,pch="*");
points(flood2~KT,pch="O",col="green")
One of the strenghts of R is that such an analysis can be automated by
programming a few functions and procedures.
| Up |
19 April 2006 by Guido Wyseure