Remember that R is case-sensitive: "mean", "Mean" and "mEan" are three different names.
R treats the "." in a name as an ordinary character, e.g. "Mean.of.X".
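Both naming rules can be checked at the console. The following is only a small sketch; the object names value, Value, VALUE and the numbers are made up for illustration.

```r
# Three distinct objects whose names differ only in case (hypothetical names)
value <- 1
Value <- 2
VALUE <- 3
c(value, Value, VALUE)

# The dot is an ordinary character in a name, not an operator
Mean.of.X <- mean(c(2, 4, 6))
Mean.of.X
```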
Formulas can be passed to functions as arguments or used inside a function definition.
A formula specifies a model for functions such as "lm". In a formula a response variable, such as "y", is modelled. The response stands to the left of the ~ operator and the model is specified to the right. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model.
A model consists of a series of terms separated by + operators.
To avoid confusion, the function I() is used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is interpreted as one single variable formed by the sum of b and c. This is a model with only two variables, a and I(b+c), not a model with three variables a, b and c. The model with three variables would be y ~ a + b + c.
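The difference between the two formulas can be inspected without fitting anything, by asking R which terms it sees. This is a sketch using terms() from the stats package; the object names f_two and f_three are chosen here only for illustration.

```r
# Two formulas that look similar but define different models
f_two   <- y ~ a + I(b + c)   # a and the arithmetic sum of b and c
f_three <- y ~ a + b + c      # a, b and c as separate terms

# terms() parses a formula; "term.labels" lists the model terms
labels_two   <- attr(terms(f_two), "term.labels")
labels_three <- attr(terms(f_three), "term.labels")

length(labels_two)    # 2 terms
length(labels_three)  # 3 terms
```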
* If you are reading this text for ANOVA, see the footnote on ANOVA; it is less relevant for regression.
Additional features important for regression:
The - operator removes terms and is commonly used to remove the intercept term: y ~ x - 1 is a line through the origin.
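As a sketch of the effect of - 1, the same small data set can be fitted with and without an intercept. The data frame d and its values are invented for this illustration.

```r
# Hypothetical data, roughly on a line through the origin
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.9, 6.2, 7.8, 10.1))

with_intercept <- lm(y ~ x, data = d)      # intercept and slope
through_origin <- lm(y ~ x - 1, data = d)  # slope only

names(coef(with_intercept))  # "(Intercept)" "x"
names(coef(through_origin))  # "x"
```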
While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal. When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use. This confusion is avoided by the I() function, as in the example of a fourth-degree polynomial. This model can be specified as y ~ x + I(x^2) + I(x^3) + I(x^4).
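The following sketch shows why I() is needed here: without it, x^2 is read symbolically and collapses to x. The simulated data and the name poly4 are assumptions made only for this example.

```r
# Without I(), ^ is the symbolic crossing operator, so x^2 reduces to x
attr(terms(y ~ x + x^2), "term.labels")   # "x"

# Simulated data (hypothetical) for a fourth-degree polynomial fit
set.seed(1)
d <- data.frame(x = seq(-2, 2, length.out = 50))
d$y <- 1 + d$x - 0.5 * d$x^4 + rnorm(50, sd = 0.1)

# With I(), each power is computed arithmetically and enters as its own term
poly4 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data = d)
length(coef(poly4))   # intercept plus four powers of x
```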
The "Statistical Linear Model" in R is fitted with a specific function, "lm". It handles both regression and ANOVA models:
Regression models use quantitative variables in the model.
ANOVA models use qualitative factors.
ANCOVA models use both quantitative variables and qualitative factors, so it is important to define the factors explicitly.
The function factor is used to encode a vector as a factor (the names category and enumerated type are also used for factors). The function is.factor can be used to test whether an object is a factor.
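A minimal sketch of factor and is.factor follows; the vector soil and its values are hypothetical.

```r
# A character vector of categories (hypothetical values)
soil <- c("clay", "sand", "loam", "sand", "clay")

# Encode it as a factor so that lm treats it as qualitative
soil_f <- factor(soil)

is.factor(soil)    # FALSE: still a plain character vector
is.factor(soil_f)  # TRUE
levels(soil_f)     # "clay" "loam" "sand"
```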
> help(lm)
A window appears with help; part is shown below:
| some text before.....
Usage: and more..... |
For most applications the argument list contains the formula and the data.frame.
The formula itself can be treated as an object. We can use the same data as in the first regression problem, CH01TA01.txt. Save it somewhere such as "C:\temp\" and read it into a data.frame:
> Toluca<-read.table("c:/temp/CH01TA01.txt")
or read it directly from the web page:
> Toluca<-read.table("http://www.biw.kuleuven.be//vakken//statisticsbyR//datasetsTXT//CH01TA01.txt")
This data.frame can be used in R:
> names(Toluca)<-c("x","y")
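If neither the file nor the web page is at hand, the same two steps (read, then name the columns) can be tried on inline text. This is a sketch only: the three rows below are made-up numbers, not the Toluca data, and the name Demo is hypothetical.

```r
# Made-up two-column text standing in for the contents of a data file
raw_text <- "10 25
20 47
30 66"

# read.table() without a header line assigns default column names
Demo <- read.table(text = raw_text)
names(Demo)                 # "V1" "V2"

# Replace the defaults with the names used in the formula
names(Demo) <- c("x", "y")
str(Demo)
```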
Now we define an object that holds the simple linear regression formula and give it the name "SLR":
> SLR<-y~x
The object SLR can now be used wherever this formula is needed:
> plot(SLR,data=Toluca)
> SLRresult<-lm(SLR,data=Toluca)
> SLRresult
Call:
lm(formula = SLR, data = Toluca)
Coefficients:
(Intercept)            x
      62.37         3.57
The formula is stored among the other elements of the list SLRresult:
> SLRresult$call$formula
SLR
> SLR
y ~ x
A very strong point of R is that everything is treated as an object and can be stored, recalled and used as such. Initially this does not make life easier for a novice user, but when it comes to comparing models (each stored as an object) it quickly offers an advantage.
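As a sketch of that advantage, two formulas can be stored, fitted and compared. To keep the example self-contained it uses the cars data set that ships with R (columns speed and dist) instead of the Toluca file; the object names are chosen here for illustration.

```r
# Store a formula, then derive a second one from it with update()
linear    <- dist ~ speed
quadratic <- update(linear, . ~ . + I(speed^2))
class(linear)   # "formula"

# Fit both stored formulas on the built-in cars data
fit_linear    <- lm(linear, data = cars)
fit_quadratic <- lm(quadratic, data = cars)

# Compare the two nested models stored as objects
anova(fit_linear, fit_quadratic)
```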
==FOOTNOTE ANOVA: not for regression! More complex operators, useful for advanced ANOVA, ANCOVA etc.====
In addition to +, a number of other operators are useful in model formulae:
The * operator denotes factor crossing: a*b is interpreted as a+b+a:b.
The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions.
The %in% operator indicates that the terms on its left are nested within those on the right. For example, a+b%in%a expands to the formula a+a:b.
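The expansions of * and ^ can be checked by listing the terms R derives from a formula. This is a sketch; the helper expand() is defined here for illustration and is not part of R.

```r
# Hypothetical helper: list the terms R derives from a formula
expand <- function(f) attr(terms(f), "term.labels")

expand(y ~ a * b)           # "a" "b" "a:b"
expand(y ~ (a + b + c)^2)   # "a" "b" "c" "a:b" "a:c" "b:c"
```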
===========not for regression !!!==================================================================
10 May 2005 by Guido Wyseure