Data-structures in R

Homogeneous structures: vector, factor, matrix and array. Only one mode in the structure.

Object   Max dimension   One mode of
vector   1               numeric, character, complex or logical
factor   1               numeric or character
matrix   2               numeric, character, complex or logical
array    n               numeric, character, complex or logical
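
What "only one mode" means in practice can be sketched as follows. The object names and values are invented for this illustration and are not part of the original text. Mixing modes inside c() does not fail; R coerces everything to a common mode.

```r
# Hypothetical example values, chosen only for illustration.
v.num   <- c(1.5, 2, 3)                   # numeric vector
v.mixed <- c(1.5, "two", TRUE)            # mixed input is coerced to character
f       <- factor(c("low", "high", "low"))
m       <- matrix(1:6, nrow = 2)          # 2 dimensions
a       <- array(1:24, dim = c(2, 3, 4))  # 3 dimensions

mode(v.num)      # "numeric"
mode(v.mixed)    # "character"
levels(f)        # "high" "low"
dim(m)           # 2 3
dim(a)           # 2 3 4
```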

Other structures can have a mixture of modes:

Object       Main use                             Several modes allowed in one object
data.frame   dataset for analysis                 numeric, character, complex or logical
ts           time series data                     numeric, character, complex or logical
list         results; information from analysis   numeric, character, complex, logical, function, expression and formula
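
A minimal sketch of mixed modes in one object, using invented names and values (df, lst and their contents are assumptions for this illustration):

```r
# Illustrative data only; none of these names come from the original text.
df <- data.frame(id    = 1:3,
                 label = c("a", "b", "c"),
                 ok    = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
sapply(df, mode)     # one mode per column: numeric, character, logical

lst <- list(values = c(1, 2, 3),
            name   = "example",
            fun    = mean,
            model  = y ~ x)
sapply(lst, class)   # numeric, character, function, formula
```

Each column of a data.frame is still homogeneous; the mixture is between columns. A list has no such restriction on its elements.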

Homogeneous structures: vector, factor, matrix and array

The concatenate function c() is useful for entering a few values into a structure. Here we create a vector "Stretch".

> Stretch<-c(46,54,48,50,44,42,52)
> Stretch
[1] 46 54 48 50 44 42 52

The individual elements in the vector can be extracted by indices between square brackets "[ ]". Extracting the third element:

> Stretch[3]
[1] 48

Or even by "deselecting": a negative index drops elements. Here we select all elements except the first up to the third. The ":" creates the range from -1 to -3.

> Stretch[-1:-3]
[1] 50 44 42 52

If we deselect elements 1 and 3 but not 2, we need to give a vector of indices built with c(). A bare "," inside the brackets would mean another dimension.

> Stretch[-c(1,3)]
[1] 54 50 44 42 52
Since a vector has only one dimension, the -3 after the comma tries to exclude from a nonexistent second dimension:

> Stretch[-1,-3]
Error in Stretch[-1, -3] : incorrect number of dimensions
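
The selections above can be collected in one sketch. The logical index and the use of tryCatch() are additions for illustration and do not appear in the original text.

```r
Stretch <- c(46, 54, 48, 50, 44, 42, 52)

Stretch[c(1, 3)]        # elements 1 and 3: 46 48
Stretch[-(1:3)]         # same as Stretch[-1:-3]: 50 44 42 52
Stretch[Stretch > 48]   # logical index: 54 50 52

# Two indices on a one-dimensional vector raise an error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(Stretch[-1, -3], error = function(e) conditionMessage(e))
msg
```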
 

We enter another vector:

> Distance<-c(148,182,173,166,109,141,166)
> Distance

[1] 148 182 173 166 109 141 166

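Note that a vector created with c() carries no names. Names belong to the individual elements, so one name per element is expected; a single name such as "cm" cannot label the vector as a whole. In the R version used here the attempt is rejected (current versions instead pad the missing names with NA):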

> names(Distance)
NULL
> names(Distance)<-c("cm")
Error in "names<-.default"(*tmp*, value = c("cm")) :
names attribute must be the same length as the vector
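
A vector does accept names when there is one per element. The sketch below shows this; the labels "band1" to "band7" and the attribute called "unit" are invented for the illustration.

```r
Distance <- c(148, 182, 173, 166, 109, 141, 166)

# One name per element; the labels are hypothetical.
names(Distance) <- paste0("band", seq_along(Distance))
Distance["band3"]        # select by name: 173

# A unit such as "cm" describes the whole vector, so an attribute
# (here a hypothetical one called "unit") is one way to keep it.
attr(Distance, "unit") <- "cm"
attr(Distance, "unit")   # "cm"
```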

Now we construct a matrix, M.Elastic.Band, by placing the vectors side by side as columns with the function cbind().

> M.Elastic.Band<-cbind(Stretch,Distance)
> M.Elastic.Band

     Stretch Distance
[1,]      46      148
[2,]      54      182
[3,]      48      173
[4,]      50      166
[5,]      44      109
[6,]      42      141
[7,]      52      166

We check the type of object:
> is.matrix(M.Elastic.Band)
[1] TRUE

> is.data.frame(M.Elastic.Band)
[1] FALSE

An alternative is to pass the elements to the function matrix() and indicate the dimensions. This is not recommended here, because the column names are lost.

> M.El.Ba<-matrix(c(Stretch,Distance),nrow=7,ncol=2)
> M.El.Ba

     [,1] [,2]
[1,]   46  148
[2,]   54  182
[3,]   48  173
[4,]   50  166
[5,]   44  109
[6,]   42  141
[7,]   52  166
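
If matrix() is used anyway, the column names can be supplied through its dimnames argument. A minimal sketch, where the name M.named is invented for the illustration:

```r
Stretch  <- c(46, 54, 48, 50, 44, 42, 52)
Distance <- c(148, 182, 173, 166, 109, 141, 166)

# matrix() fills column by column; dimnames restores the column labels.
M.named <- matrix(c(Stretch, Distance), nrow = 7, ncol = 2,
                  dimnames = list(NULL, c("Stretch", "Distance")))
colnames(M.named)                          # "Stretch" "Distance"
all(M.named == cbind(Stretch, Distance))   # TRUE: same values as cbind()
```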

However, we need data.frames for statistical analysis; a matrix is rejected:

> lm(M.Elastic.Band$Stretch~M.Elastic.Band$Distance, data=M.Elastic.Band)
Error in model.frame.default(formula = M.Elastic.Band$Stretch ~ M.Elastic.Band$Distance, :
`data' must be a data.frame, not a matrix or array

We can do operations on vectors, matrices and arrays.

We can operate on the individual elements:

> Dist2<-Distance*2
> Dist2

[1] 296 364 346 332 218 282 332

Or we can perform matrix operations, e.g. matrix multiplication of the transposed (function t()) vector Stretch with the vector Distance. Matrix multiplication is written with the operator sign between two percent signs, "%*%"; matrix addition is simply the element-wise "+".

> SDmat<-t(Stretch)%*%Distance
> SDmat

      [,1]
[1,] 52590
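
The difference between element-wise and matrix operations can be sketched as below. The names inner, outer.prod and M are chosen for this illustration only.

```r
Stretch  <- c(46, 54, 48, 50, 44, 42, 52)
Distance <- c(148, 182, 173, 166, 109, 141, 166)

inner <- t(Stretch) %*% Distance        # 1 x 1 matrix holding 52590
sum(Stretch * Distance)                 # the same number as a plain scalar

outer.prod <- Stretch %*% t(Distance)   # 7 x 7 matrix
dim(outer.prod)                         # 7 7

M <- cbind(Stretch, Distance)
M + M                                   # matrix addition is element-wise "+"
t(M) %*% M                              # 2 x 2 cross-product matrix
```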


Application:

We would like to determine the sample size for a t-test. We are interested in a difference between the two means that is equal to one standard deviation.

> Nsample<-c(5:30)
> Nsample

[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[26] 30

Use help to find the default values of the function power.t.test:

> help(power.t.test)

We give the argument delta and leave the other arguments at their defaults (sd = 1, sig.level = 0.05):

> power.t.test(n=Nsample,delta=1)

Two-sample t test power calculation

n = 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
delta = 1
sd = 1
sig.level = 0.05
power = 0.2859276, 0.3471565, 0.4056971, 0.4611759, 0.5133256, 0.5619846, 0.6070844, 0.6486344, 0.6867055, 0.7214166, 0.7529210, 0.7813965, 0.8070359, 0.8300400, 0.8506120, 0.8689528, 0.8852576, 0.8997136, 0.9124983, 0.9237780, 0.9337076, 0.9424303, 0.9500772, 0.9567684, 0.9626126, 0.9677083
alternative = two.sided

NOTE: n is number in *each* group


We see that the function worked on the individual elements of the vector Nsample. With n=17 in each group the power first exceeds 80%, which is often accepted as sufficient.
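
Reading the answer off the long printout can be avoided. The sketch below picks the smallest n from the scanned range and, as an alternative, lets power.t.test() solve for n when the target power is given instead. The names pw and need are invented here.

```r
Nsample <- 5:30
pw <- power.t.test(n = Nsample, delta = 1)$power

# Smallest n in the scanned range whose power reaches 80 %.
min(Nsample[pw >= 0.80])    # 17

# Alternatively, leave n out and give the target power;
# power.t.test() then solves for n (a non-integer, to be rounded up).
need <- power.t.test(delta = 1, power = 0.80)
ceiling(need$n)             # 17
```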


Data.frames and lists

Most analyses have to be executed on data.frames. The result of an analysis is a list.

We can create a data.frame from the matrix:

> Elastic.Band<-data.frame(M.Elastic.Band)
> Elastic.Band

  Stretch Distance
1      46      148
2      54      182
3      48      173
4      50      166
5      44      109
6      42      141
7      52      166

R has remembered the names of the vectors and uses them as column names in the data.frame.

> names(Elastic.Band)
[1] "Stretch" "Distance"
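
A data.frame can also be built from the vectors directly, without going through a matrix. A minimal sketch, where the name EB is invented for the illustration:

```r
Stretch  <- c(46, 54, 48, 50, 44, 42, 52)
Distance <- c(148, 182, 173, 166, 109, 141, 166)

EB <- data.frame(Stretch, Distance)   # column names are taken from the vectors
str(EB)                               # compact overview of the columns

EB$Distance                           # one column, by name
EB[EB$Stretch > 48, ]                 # rows selected by a condition
```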

For some functions, like plot, either a matrix or a data.frame can be used:

> plot(Distance~Stretch,data=M.Elastic.Band)
> plot(Distance~Stretch,data=Elastic.Band)

The same plot will appear.

However, to perform a statistical analysis like "lm" (linear model) for regression, the data have to be in a data.frame. The commands below are equivalent and give the same result.

> lm(Elastic.Band$Distance~Elastic.Band$Stretch)
> lm(Distance~Stretch,data=Elastic.Band)

> attach(Elastic.Band)
> lm(Distance~Stretch)
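
attach() leaves the columns on the search path for the rest of the session. As a sketch of two forms that avoid this, the data argument and with() both look up the variables inside the data.frame only for the one call. The names fit.data and fit.with are invented here.

```r
Elastic.Band <- data.frame(
  Stretch  = c(46, 54, 48, 50, 44, 42, 52),
  Distance = c(148, 182, 173, 166, 109, 141, 166))

fit.data <- lm(Distance ~ Stretch, data = Elastic.Band)
fit.with <- with(Elastic.Band, lm(Distance ~ Stretch))

coef(fit.data)                              # intercept and slope
all.equal(coef(fit.data), coef(fit.with))   # TRUE
```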

Normally we prefer to store the result in a list:

> RegEB<-lm(Distance~Stretch)
> RegEB


Call:
lm(formula = Distance ~ Stretch)

Coefficients:
(Intercept)      Stretch
    -63.571        4.554
 

We see the same output each time. However, there is more in the list than what is displayed.

With the following commands we can see the full list.

> Nelem<-length(RegEB)
> Nelem

[1] 12
> RegEB[1:Nelem]
 

This gives a very long display of the 12 different elements in the list. The content will be studied later; what is important here is to understand the concept of a list. One example out of the long list:

$fitted.values
       1        2        3        4        5        6        7
145.8929 182.3214 155.0000 164.1071 136.7857 127.6786 173.2143

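We can get an element by its name or by its number (here 5). Single brackets return a list of one element, hence the "$fitted.values" header; the "$" form returns the vector itself.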

> RegEB[5]
$fitted.values
       1        2        3        4        5        6        7
145.8929 182.3214 155.0000 164.1071 136.7857 127.6786 173.2143

> RegEB$fitted.values
       1        2        3        4        5        6        7
145.8929 182.3214 155.0000 164.1071 136.7857 127.6786 173.2143
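
The three ways of reaching into a list can be compared in a short sketch. The double-bracket form is an addition for illustration and is not used in the original text.

```r
Elastic.Band <- data.frame(
  Stretch  = c(46, 54, 48, 50, 44, 42, 52),
  Distance = c(148, 182, 173, 166, 109, 141, 166))
RegEB <- lm(Distance ~ Stretch, data = Elastic.Band)

names(RegEB)          # the names of the 12 elements
class(RegEB[5])       # "list": a sub-list of length one
class(RegEB[[5]])     # "numeric": the element itself
identical(RegEB[[5]], RegEB$fitted.values)   # TRUE
```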

It is useful to save the workspace under a name at the end of a session (and during it). At the beginning of the next session the workspace can be loaded, and the objects already created are available again.

Similarly the history can be saved.
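
Saving and loading can also be done from the command line. The sketch below uses a temporary file so that it can be run anywhere; the file names mentioned in the comments are assumptions, not names from the original text.

```r
Stretch <- c(46, 54, 48, 50, 44, 42, 52)

# A temporary file is used here; in practice a name such as
# "elastic.RData" in the working directory would be typical.
ws <- tempfile(fileext = ".RData")
save(Stretch, file = ws)     # save selected objects
# save.image(file = ws)      # or save the whole workspace

rm(Stretch)
load(ws)                     # restores the saved objects
Stretch

# In an interactive console the command history can be written with
# savehistory(file = "elastic.Rhistory") and read back with loadhistory().
```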

 



10 May 2005 by Guido Wyseure