I have a factor variable in a data frame, as shown below:
str(x)
'data.frame': 1000 obs. of 10 variables:
$ PK : chr "1-108" "1-10M" "1-10F" "1-10Q" ...
$ var1 : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ var2: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
When I subset x on var1, the str output shows the values of the factor variable as 2:
y <- x[x$var1== 1,]
str(y)
'data.frame': 300 obs. of 10 variables:
$ PK : chr "1-12U" "1-13895" "1-13R" "1-149" ...
$ var1: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ var2: Factor w/ 2 levels "0","1": 2 1 1 2 2 1 2 2 2 1 ...
But the actual value of var1 is 1:
y$var1
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Answer 0 (score: 1)
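A factor is stored internally as integer codes starting at 1, and each code points, in order, to one of the factor's levels. Here the levels are "0" and "1", so level "0" has code 1 and level "1" has code 2. str(y) shows the internal codes, while print(y), or simply typing y at the interactive prompt, shows the level labels those codes map to.
Let me show you: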
> t = c("0", "0", "1", "1")
> t
[1] "0" "0" "1" "1"
> t2 = as.factor(t)
> t2
[1] 0 0 1 1
Levels: 0 1
> str(t2)
Factor w/ 2 levels "0","1": 1 1 2 2
> t2[t2 == 0]
[1] 0 0
Levels: 0 1
> str(t2[t2 == 0])
Factor w/ 2 levels "0","1": 1 1
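To make the code-to-level mapping concrete, here is a minimal sketch that pulls the two representations apart. The vector `f` is hypothetical example data that mirrors the 0/1 factor from the question; `as.integer()`, `as.character()` and `levels()` are base R functions.

```r
# Hypothetical example data, mirroring the 0/1 factor from the question
f <- factor(c("0", "0", "1", "1"))

codes  <- as.integer(f)                # internal integer codes (what str() shows)
labels <- as.character(f)              # level labels (what print() shows)
values <- as.numeric(as.character(f))  # labels converted back to numbers

# Indexing levels(f) by the codes recovers the labels
recovered <- levels(f)[codes]
print(codes)
print(labels)
print(values)
```

If you need the numeric values 0 and 1 rather than the codes 1 and 2, convert through `as.character()` first; calling `as.numeric()` directly on a factor returns the internal codes.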
Answer 1 (score: 0)
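Filtering the data does not change the levels of the factor var1. A value of "1" is still the second level, so its internal code remains 2: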
x <- factor(1, levels = c(0, 1))
str(x)
# Factor w/ 2 levels "0","1": 2
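If the unused level is unwanted after subsetting, one option is base R's `droplevels()`, which discards levels that no longer occur and renumbers the codes. The sketch below uses hypothetical example data (`f`, `kept`, `dropped` are illustrative names, not from the question).

```r
# Hypothetical example data: a 0/1 factor like var1 in the question
f <- factor(c("0", "1", "1"), levels = c("0", "1"))

kept <- f[f == "1"]         # subsetting keeps both levels
str(kept)                   # codes are still 2, because "1" is the second level

dropped <- droplevels(kept) # remove the now-unused level "0"
str(dropped)                # "1" is the only level, so its code is now 1
```

Whether to drop levels depends on what comes next: keeping them preserves the original coding across subsets, while dropping them avoids empty groups in tables and plots.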