Up

Gumbel analysis

Firstly we need an object with the Data. On this Data Frame we execute the regression. 

The steps are:

1)
Production of the data frame containing the data
2) Perform some operations, so that the data frame contains a the information
3) Execute the frequency analysis on the data and;
4) Evaluate visually the plot.

The dataframe can be produced in several ways. Here we read a textfile "ShahinEx1.txt" from Shahin et al (1993) into a dataframe. Right click on "ShahinEx1.txt" and save on a directory  "C:\temp\".  Alternative locations for saving can be used. We call it  R-dataset "Flood1" and check the content before any further analysis. Double "\\" are used in the specification of the location.

> Flood1<-read.table("c:\\temp\\ShahinEx1.txt",header=T)<ENTER>
>  Flood1<ENTER>
   
flood
1   230
2   282
3   309
4   249
(rest van dataset)

We need to extract and produce some additional information.

As a starter in our statistical menu a boxplot never harms:

> boxplot(Flood1)<ENTER>

For calculations we better use a "pure" vector and not a dataframe:

> attach(Flood1)<ENTER>
> flood
<ENTER>
[1] 230 282 309 249 348 360 220 295 255 195 288 275 294 245 305 375 287 210 286
[20] 500 295 462 400 299 285 330 237 278 307 300

 

The average and standard deviation:

> Fmean<-mean(flood)<ENTER>
> Fstdev<-sd(flood)

Remark: if we toke the average and standard deviation from a dataframe R also remembers (= stores) the name of the vector "flood" within the data.frame "Flood1". We can work with that but it becomes slightly more complicated.


From here on we do not mention  <ENTER>  anymore. Red is commandline input followed by <enter> and blue is the response.


> Fstdev
[1] 67.75386
> Fmean
[1] 300.0333

For our analysis we need to sort the data:

> flood2<-rev(sort(flood));flood2
[1] 500 462 400 375 360 348 330 309 307 305 300 299 295 295 294 288 287 286
[19] 285 282 278 275 255 249 245 237 230 220 210 195

We used a new name "flood2" but could have used the old one "flood". R sorts in ascending order.  Descending order is required. We produce this by reversing the vector (function "rev").

Now we produce the ranks and store it in  vector "rflood". The total number of years is stored in "N.series".

> rflood<-rev(rank(flood2)); rflood
> N.series<-length(flood2);N.series

We no can produce the plotting position. This is a vector "T.estim" calculated from another vector "rflood" and with a scalar "N.series".

> T.estim<-(N.series+0.12)/(rflood-0.44); T.estim
[1] 53.785714 19.307692 11.765625 8.460674 6.605263 5.417266 4.591463
[8] 3.984127 and more.
..

We are ready for the frequency factors:

> KT<-(-sqrt(6)/pi*(0.5772 +log(log(T.estim/(T.estim-1) ) ) ) ); KT
[1] 2.64975196 1.83761304 1.43768206 1.16642365 0.95880018 0.78912391 0.64456920
[8] 0.51778933 0.40416984  and more.
..

The theoretical values based on their average and standard deviation are:

> flood.estim<-Fmean+KT*Fstdev;flood.estim
The estimated "flood.estim" should compare well with the observed and sorted ones "flood". We can judge them best in a plot which is linear for the given distribution.

> plot(flood2~KT)
> lines(flood.estim~KT)
> points(flood.estim~KT,pch="*")
 

The observed flows are shown with "O" ; the estimated ones by a line. Although not recommended the estimated are also plotted with "*". Shows a line with "*".

In this simple example the strength of R is not yet obvious.  When comparing models and with more advanced analysis the object-oriented approach  of R and S-plus quickly shows its value.

Further analysis is easy. A standard procedure is the Kolmogorov-Smirnov test. Two vectors of data-values are compared to see if they come from the same distribution.

> ks.test(flood2,flood.estim)

Similar the chi-square test to compare vectors.

> chisq.test(flood2,flood.estim)

Standard errors of estimates according to Shahin et al, 1993:

> deltaT<-sqrt(1+1.14*KT+1.1*KT^2)
> sdT<-deltaT*Fstdev/sqrt(N.series);sdT

> tvalue<-qt(0.975,df=N.series);tvalue
 

Upper and under confidence limits:

> flood.up<-flood.estim+tvalue*sdT ;
 flood.low<-flood.estim-tvalue*sdT;
 lines(flood.low~KT,col="blue",lwd=2);
 lines(flood.up~KT,col="red",lwd=3)

The lines command adds the lines on the existing plot:


More advanced R (not part of PROMAS course):

More efficient would be to define a function ( a library for frequency analysis in hydrology could be made.....)

> KTfun<-function(T.extrapol){
 (-sqrt(6)/pi*(0.5772 +log(log(T.extrapol/(T.extrapol-1) ) ) ) )
 }

The function can be used to calculate the frequency factor:

> KT<-KTfun(T.estim)

and used again to extrapolate to an average return period of 50, 75 and 100 years:


> KTextra<-KTfun(c(50,75,100)); KTextra
[1] 2.592288 2.911064 3.136681
 

> flood.extra<-Fmean+KTextra*Fstdev; flood.extra
[1] 475.6708 497.2691 512.5555


Now we wish to plot everything in one graph:

> sdtfun<-function(KT.extra){
deltaT<-sqrt(1+1.14*KT.extra+1.1*KT.extra^2)
 sdT<-deltaT*Fstdev/sqrt(N.series)
 }
> sdt2<-sdtfun(KText)

Again we make a function; slightly more work but rewarding for any later analysis.

All the calculated values and and upper and lower estimate:

> flood.calc<-Fmean+KText*Fstdev;
   flood.low<-flood.calc-tvalue*sdt2;
   flood.up<-flood.calc+tvalue*sdt2

 

We first plot the largest values ( flood.up) and then add the rest

> plot(flood.up~KText,lwd=3,col="red");
lines(flood.calc~KText);
points(flood.low~KText,lwd=2,col="blue");
points(floodall~KText,pch="*");
points(flood2~KT,pch="O",col="green")



One of the strenghts of R is that such an analysis can be automated by programming a few functions and procedures.

Up

19 April 2006 by Guido Wyseure