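I'm working with an educational dataset: the answers of 426 students to 8 multiple-choice questions (1 = correct, 0 = incorrect), plus an indicator of which instructor (1, 2, or 3) taught their course.
As it stands, my data sits in data.df, like this:
str(data.df)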
'data.frame': 426 obs. of 9 variables:
$ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
$ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
$ ques03: int 0 0 1 1 0 0 1 1 0 1 ...
$ ques04: int 1 0 1 1 1 1 1 1 1 1 ...
$ ques05: int 0 0 0 0 1 0 0 0 0 0 ...
$ ques06: int 1 0 1 1 0 1 1 1 1 1 ...
$ ques07: int 0 0 1 1 0 1 1 0 0 1 ...
$ ques08: int 0 0 1 1 1 0 1 1 0 1 ...
$ inst : num 1 1 1 1 1 1 1 1 1 1 ...
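But those ques0x values aren't really integers. Rather, I think it's better to have R treat them as experimental factors. The same goes for the "inst" values.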
Ideally, an elegant solution would produce a data frame, which I'd call factorData.df, that looks like this:
str(factorData.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
$ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
$ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
$ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
$ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
$ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
$ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
$ inst : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
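I'm pretty sure whatever solution you come up with should generalize easily to any variables that need reclassifying, and could be used for most common conversions (e.g. int -> factor and num -> int).
My current clunky code consists of 9 separate statements, one per variable, like this:
factorData.df$ques01
I'm brand new to R, programming, and Stack Overflow. Please be gentle, and thanks in advance for your help!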
Answer 0 (score: 11)
This was also answered in R-Help.
There may be better ways, but here are two options:
# use a sample data set
> str(cars)
'data.frame': 50 obs. of 2 variables:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
$ dist : num 2 10 4 22 16 10 18 26 34 17 ...
> data.df <- cars
You can use lapply:
> data.df <- data.frame(lapply(data.df, factor))
Or a for loop:
> for(i in 1:ncol(data.df)) data.df[,i] <- as.factor(data.df[,i])
Either way, you end up with what you want:
> str(data.df)
'data.frame': 50 obs. of 2 variables:
$ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
$ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
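If only some columns should become factors (for example the question's ques01 to ques08 plus inst, leaving any other columns alone), a common base R variant is to assign the lapply result back into a column subset with `[<-`, which keeps the data frame class, row names and column names intact. A minimal sketch, using a small made-up data frame that only mimics the question's column layout; the `score` column is hypothetical and exists only to show that unselected columns are left untouched:

```r
# Toy data frame that only mimics the question's layout (values are made up)
data.df <- data.frame(
  ques01 = c(1L, 0L, 1L),
  ques02 = c(0L, 1L, 1L),
  inst   = c(1, 2, 3),
  score  = c(10.5, 7.2, 8.9)   # hypothetical column that should stay numeric
)

# Select the columns to convert: every "ques..." column plus "inst"
cols <- c(grep("^ques", names(data.df), value = TRUE), "inst")

# Copy, then overwrite only the selected columns; [<- keeps the data.frame
factorData.df <- data.df
factorData.df[cols] <- lapply(factorData.df[cols], factor)

str(factorData.df)
```

Because the result of lapply is assigned into an existing data frame rather than passed to data.frame(), there is no need to rebuild the object or worry about column names being altered.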
Answer 1 (score: 5)
I found another solution in the plyr package:
# load the package and data
> library(plyr)
> data.df <- cars
Using the colwise function:
> data.df <- colwise(factor)(data.df)
> str(data.df)
'data.frame': 50 obs. of 2 variables:
$ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
$ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
By the way, if you look inside the colwise function, it just uses lapply:
df <- as.data.frame(lapply(filtered, .fun, ...))
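Since both colwise and the base R answer boil down to lapply over columns, the conversion function itself can be made a parameter, which also covers the question's other example of num -> int. A minimal sketch in base R; `convert_cols` is a hypothetical helper name, not part of base R or plyr:

```r
# Hypothetical helper: apply the conversion function `fun` to the named columns
convert_cols <- function(df, cols = names(df), fun = factor) {
  df[cols] <- lapply(df[cols], fun)
  df
}

data.df <- cars   # built-in dataset used above: speed and dist, both num

# num -> int for every column
intData.df <- convert_cols(data.df, fun = as.integer)

# int -> factor for just one column
factorData.df <- convert_cols(intData.df, cols = "speed", fun = factor)

str(intData.df)
str(factorData.df)
```

Passing the function as an argument keeps one code path for every kind of reclassification, so switching from factor to as.integer (or as.character, as.numeric, ...) needs no new loop.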