
Simple data entry in R

  1. Do not forget that R is case-sensitive: "mean", "Mean" and "mEan" are three different names!

  2. R treats the "." in a name as an ordinary character, e.g. "Mean.of.X".

  3. Although a lot is possible, including interaction with other software such as ArcView, databases etc., in this introduction we limit ourselves to the bare minimum.

  4. Spreadsheets are often used for a first screening of the data; therefore data entry from spreadsheets is covered here.
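
Points 1 and 2 above can be tried out directly at the prompt. The following is only a small sketch; the object names are made up for illustration and do not come from any dataset used in this course.

```r
# Three distinct objects: R treats upper and lower case as different.
mean.value <- 1
Mean.value <- 2
mEan.value <- 3

# The dot is just another character in the name, not an operator.
Mean.of.X <- mean(c(46, 54, 48))

# All four (hypothetical) names exist side by side in the workspace.
all.names <- c("mean.value", "Mean.value", "mEan.value", "Mean.of.X")
found <- sapply(all.names, exists)
print(found)
```

Because the names differ only in case, mixing them up leads to an "object not found" error or, worse, to silently using the wrong object.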

For very small datasets

Several methods are available. The concatenate function c( ) works as follows.

> Stretch<-c(46,54,48,50,44,42,52)
> Stretch

[1] 46 54 48 50 44 42 52

The result of the function c( ) does not have to be stored in a vector, although an unstored result is only printed and cannot be reused:

> c(46,54,48,50,44,42,52)
[1] 46 54 48 50 44 42 52
 

See Data-structures in R for the types of data and their manipulation.

Data-entry from txt-file (including spreadsheet created txt-files)

In what follows a data-frame (the most important data-structure for data analysis) is made starting from a txt file.

The function "read.table( )" can be used.

> help(read.table)

A help window appears that describes the function and all of its arguments.
 

The function "read.table( )" has a long list of arguments between the parentheses. Most arguments have default values; where the default is suitable we simply omit the argument from the list. Do not be intimidated: simple use is also possible.

Examples:

  1. if the txt file contains no headers at the top we need not specify ",header=FALSE,"; if headers are present we have to specify ",header=TRUE,"

  2. if we have to skip 3 lines at the top we specify  ",skip=3,"

  3. if the missing values are indicated by the word  "missing" we specify ",na.strings="missing", "

  4. by default a line starting with "#" is a comment and is not read
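
The four examples above can be combined in one call. The sketch below is only an illustration: the file content, the column names "Stretch" and "Distance" and the object names are invented, and a temporary file stands in for a real data file.

```r
# Hypothetical file content, written to a temporary file for illustration.
file.lines <- c(
  "exported from a spreadsheet",            # line 1, to be skipped
  "plot A, summer",                         # line 2, to be skipped
  "units: mm",                              # line 3, to be skipped
  "Stretch Distance",                       # header line
  "46 148",
  "# the second distance was not recorded", # comment line, ignored
  "54 missing",
  "48 173"
)
data.file <- tempfile(fileext = ".txt")
writeLines(file.lines, data.file)

# skip the 3 lines at the top, read the header, treat "missing" as NA;
# lines starting with "#" are dropped by default.
example.data <- read.table(data.file, header = TRUE, skip = 3,
                           na.strings = "missing")
print(example.data)
```

The resulting data.frame has three rows, and the second value of "Distance" is NA.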

The only compulsory argument in the list is the file. Save the file ch05pr05.txt to your working directory: right-click on the linked file and choose save. On Ludit PC-classes this could be "D:\user\temp\...". The path to the file can be relative or absolute.

> Gegevens<-read.table("D:\\user\\temp\\ch05pr05.txt")

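Note that the backslashes are doubled: a single backslash is the escape character in R strings, so "\\" stands for one backslash. For cross-platform compatibility a forward slash "/" can be used instead, also on Windows. If the function call is successful, an empty prompt follows on the next line. Otherwise an error message appears.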
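
The different ways of writing the same path can be compared in a short sketch. It only builds and compares character strings, so the file itself does not have to exist; the object names are chosen for illustration.

```r
# In an R string "\\" stands for ONE backslash character.
windows.path <- "D:\\user\\temp\\ch05pr05.txt"
print(nchar("\\"))

# Forward slashes need no doubling and are accepted on Windows as well.
portable.path <- "D:/user/temp/ch05pr05.txt"

# file.path() joins the parts of a path with "/".
built.path <- file.path("D:", "user", "temp", "ch05pr05.txt")

# Replacing every backslash by a forward slash gives the same string.
converted <- gsub("\\", "/", windows.path, fixed = TRUE)
print(identical(converted, portable.path))
print(identical(built.path, portable.path))
```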

The data.frame can be shown immediately by adding its name after a semicolon:

> Gegevens<-read.table("D:\\user\\temp\\ch05pr05.txt"); Gegevens

See data-structures to manipulate the data.frame "Gegevens".

NOTE 1: newer versions of R allow direct access to data on web-servers:

> Gegevens<-read.table("http://www.biw.kuleuven.be/vakken/statisticsbyR/datasetsTXT/CH05PR05.txt"); Gegevens

This directly creates the data.frame Gegevens and displays it.

NOTE 2: From the menu one can use "File/Display file" to have a look at the data and check the path.
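
A similar check can be made from the prompt. The sketch below assumes a small made-up file, written to a temporary location, in place of a real data file; file.exists( ) confirms the path and readLines( ) returns the first lines as raw text.

```r
# Hypothetical stand-in for a downloaded data file.
data.file <- tempfile(fileext = ".txt")
writeLines(c("Stretch Distance", "46 148", "54 182", "48 173"), data.file)

# Does the path point to an existing file?
print(file.exists(data.file))

# Show the first lines as plain text to decide on header, sep, skip etc.
first.lines <- readLines(data.file, n = 3)
print(first.lines)
```

Here the first line clearly holds column titles, so "header=TRUE" would be needed in read.table( ).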


NOTE 3: with "file.choose" you browse to a file without specifying the exact path, e.g.:

>Gegevens<-read.table(file.choose())

However, you cannot tell which arguments (such as "header=TRUE") you need without first inspecting the data (see NOTE 2), so you need to know the file format and content in advance.

Preparing textfiles from spreadsheets

Remark: a newer method uses an add-in for Excel: see the PopTools approach for data transfer.

Put the data in a simple format on an empty worksheet of the spreadsheet. The data should be laid out the way we wish them to appear in the txt-file.

Some comments can be put on top provided the cells start with "#". These lines with explanation will not be read into the data.frame but can be useful at a later stage.

A first row can contain titles for the variables. If so, put "header = TRUE" in the argument list. Make sure no spaces or blanks are used in the titles. You could use "." in the names of variables instead of blanks. Do not use "_" either.
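
Whether a title is acceptable as a variable name can be checked in R itself with make.names( ), which turns any text into a syntactically valid name. The titles below are invented examples.

```r
# Hypothetical column titles as they might be typed in a spreadsheet.
proposed <- c("Mean of X", "Mean.of.X", "2nd.run", "Stretch")

# make.names() replaces blanks by "." and prefixes an "X" to names
# that start with a digit.
valid <- make.names(proposed)
print(valid)

# TRUE where the proposed title was already a valid name.
print(proposed == valid)
```

Titles that come back unchanged are safe to use in the first row of the worksheet.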

Make sure the cells are wide enough and format the numbers to a sensible number of significant figures.

Centering the cell content helps to prevent the contents of neighbouring cells from colliding.

Save this worksheet as a txt-file. Space delimited is normally fine and gives by default a "*.prn" file. Save the spreadsheet as an *.xls file before saving the worksheet as a txt-file. While saving as a txt-file Excel shows a few warnings.

Alternatively a CSV (comma separated values) file works well; then you specify " ,sep=",",  " (the separator is a comma) or use a related function:

 > read.csv(file)

One small complication is the decimal character. You can specify " ,dec=",",  " for a decimal ",", for example. By default "." is the decimal character.
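
The effect of the separator and the decimal character can be seen in a small sketch. The data are invented and are passed through the "text" argument instead of a file. Note that a file with decimal commas cannot also use the comma as separator; such files commonly use ";" instead, which is what read.csv2( ) expects.

```r
# The same hypothetical data in two conventions.
comma.text     <- c("Stretch,Distance", "46,148.5", "54,182.25")
semicolon.text <- c("Stretch;Distance", "46;148,5", "54;182,25")

# Comma as separator, "." as decimal character (defaults of read.csv).
csv.data <- read.csv(text = comma.text)

# Semicolon as separator, "," as decimal character, spelled out ...
decimal.data <- read.table(text = semicolon.text, header = TRUE,
                           sep = ";", dec = ",")

# ... or with the related function that has these values as defaults.
csv2.data <- read.csv2(text = semicolon.text)

print(csv.data)
print(all.equal(csv.data, decimal.data))
```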

After saving, it is good practice to use "File/Display file" to have a look at the file (name and extension) and to check the path. If the result is not satisfactory, improve the layout/format of the worksheet and save again. Do not close the spreadsheet (after saving it as a spreadsheet!) before the result is good.

 Advice: a bit of perseverance and care for detail is required.

Export of data frames

The procedure, as you might expect, is explained by:

> help(write.table)

Minimum argument list is:

>write.table(x,file)

with x as the data.frame and file the path/filename.

Example given in the help for a CSV file:

## To write a CSV file for input to Excel one might use
write.table(x, file = "foo.csv", sep = ",", col.names = NA)
## and to read this file back into R one needs
read.table("foo.csv", header = TRUE, sep = ",", row.names = 1)
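
The export and re-import can be tried as a round trip. The sketch below uses an invented data.frame and a temporary file instead of "foo.csv", so nothing is written to the working directory.

```r
# Hypothetical data.frame to export.
x <- data.frame(Stretch = c(46, 54, 48), Distance = c(148, 182, 173))

# Write to a temporary CSV file; col.names = NA adds an empty title
# above the column of row names, as a spreadsheet expects.
out.file <- tempfile(fileext = ".csv")
write.table(x, file = out.file, sep = ",", col.names = NA)

# Look at the raw text that was written.
print(readLines(out.file))

# Read it back; the first column holds the row names.
x.back <- read.table(out.file, header = TRUE, sep = ",", row.names = 1)
print(x.back)
```

The values and the variable names survive the round trip unchanged.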

 

Keep it simple and neat: manipulating the data in the spreadsheet is often the easiest approach.

 



10 May 2005 by Guido Wyseure