I am using the following commands to stratify a data.frame (80 columns and 10,000 rows) by the factor variable school, which has two levels (0: high, 1: low):
high.school=data.frame[which(data.frame$school==0) , ]
low.school=data.frame[which(data.frame$school==1) , ]
But this gives me two data.frames that contain all 80 columns but 0 rows. Why are the rows not being picked up?
Thanks!
Answer 0 (score: 1)
Short answer: why use which() at all? Let's try it out.
> DF=NULL
> DF$school=as.factor(sample(c(0,1),10000,T))
> DF=as.data.frame(DF)
> head(DF)
school
1 0
2 0
3 1
4 1
5 0
6 1
> str(DF)
'data.frame': 10000 obs. of 1 variable:
$ school: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
So you can see that the internal integer codes of school are 1, 2, while the factor levels are "0", "1". Now try the following:
> df2=DF[DF$school==1,]
> df3=DF[DF$school==0,]
> str(df2)
Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
> str(df3)
Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
> head(df2)
[1] 1 1 1 1 1 1
Levels: 0 1
> head(df3)
[1] 0 0 0 0 0 0
Levels: 0 1
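The session above shows that comparing a factor against 0 or 1 works when the level labels really are "0" and "1". One possible reason for getting 0 rows is that the labels in the original data are something else, for example "high" and "low". The sketch below assumes exactly that. The data frame `df` and its `score` column are made up for illustration and are not taken from the question.

```r
# Hypothetical data: 'school' is a factor whose labels are "high"/"low",
# not "0"/"1" (an assumption about the asker's data).
df <- data.frame(school = factor(c("high", "low", "high", "low")),
                 score  = c(10, 20, 30, 40))

# Check the actual labels before filtering.
levels(df$school)

# == on a factor compares against the labels, so 0 matches no row here.
empty <- df[which(df$school == 0), ]
nrow(empty)

# Comparing against the real labels selects the rows.
# drop = FALSE keeps the result a data frame even with a single column.
high.school <- df[df$school == "high", , drop = FALSE]
low.school  <- df[df$school == "low", , drop = FALSE]

# split() builds one data frame per level in a single call.
by.school <- split(df, df$school)
```

If `levels()` on the real column does show "0" and "1", the cause lies elsewhere, and inspecting `str()` of that column is a reasonable next step.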