I come from an object-oriented programming background and find it difficult to wrap my head around R's approach to programming. Here is the excerpt I stumbled on:
> kids = factor(c(1,0,1,0,0,0), levels = c(0, 1),labels = c("boy","girl"))
> as.numeric(kids)
[1] 2 1 2 1 1 1
I was thinking it should print
[1] 1 0 1 0 0 0
since the values {0, 1} are the levels specified in factor(). But that's not the case. So what are the values 2 1 2 1 1 1? Are they some numeric representation of the factor's elements that R maintains internally? Or, better to ask:
What does as.numeric() on a factor (i.e. as.numeric(factorXyz)) return?
If they are not the levels but some internal numeric values, then what's the point of having levels associated with factor elements?
Answer 0 (score: 1)
Consider the case of
kids = factor(c("g", "b", "g", "b", "b", "b"),
levels = c("b", "g"),
labels = c("boy", "girl"))
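In this case, it makes more sense to refer to the factor's levels by natural numbers. factor() is largely indifferent to the kind of input you give it: it stores each element as an integer code, starting at 1, that gives the position of that element's level in the levels vector. as.numeric() on a factor returns those codes, which is why your example prints 2 1 2 1 1 1 (1 is the second level, 0 is the first). The levels are the lookup table that turns each code back into a label.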
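To make the code/level relationship concrete, here is a small sketch using the factor from the question. The variable names codes, labelled, and original are only illustrative, and the last step (subtracting 1) is an assumption that holds for this particular 0/1 coding, not a general way to recover the input.

```r
# Same factor as in the question: 0 is labelled "boy", 1 is labelled "girl"
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1), labels = c("boy", "girl"))

# The levels act as a lookup table
levels(kids)                 # "boy" "girl"

# Integer codes: the position of each element's level in levels(kids)
codes <- as.integer(kids)
codes                        # 2 1 2 1 1 1

# Indexing the levels by the codes gives back the labels
labelled <- levels(kids)[codes]
labelled                     # "girl" "boy" "girl" "boy" "boy" "boy"

# Only because the original levels were 0 and 1: codes minus 1 are the inputs
original <- codes - 1
original                     # 1 0 1 0 0 0
```

So the levels are what give the codes their meaning: without them, 2 1 2 1 1 1 would just be integers.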
If my understanding is correct, this design was originally motivated by memory concerns about storing many repeated character strings in data. See "stringsAsFactors: An unauthorized biography" for the details behind the original design decisions.