I have the following data:
> View(mydata)
Gender Race Agegroup Date ..... #m columns
#1 Male Asian 1 2015/04/20 .....
#2 Female White 2 2015/04/15 .....
.
.
#n rows
I want to convert mydata into this format:
Gender=Male Gender=Female Race=Asian Race=White Agegroup = 1 Agegroup = 2 ......
1 0 1 0 1 0
0 1 0 1 0 1
. . . . . .
. . . . . .
I'm new to R. I know a loop would work, but is there a cleaner way?
Answer 0: (score: 3)
You can use model.matrix to expand multiple variables in a single call:
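# stringsAsFactors = TRUE is needed on R >= 4.0, where strings are no longer
# converted to factors by default and contrasts() only accepts factors.
(d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                 Race = c("White", "Asian", "White", "Black"),
                 AgeGroup = factor(c(1, 2, 2, 1)),
                 stringsAsFactors = TRUE))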
# Gender Race AgeGroup
# 1 Male White 1
# 2 Male Asian 2
# 3 Female White 2
# 4 Male Black 1
# lapply always returns a list, even when every factor has the same number of levels
model.matrix(~ . + 0, data = d, contrasts.arg = lapply(d, contrasts, contrasts = FALSE))
# GenderFemale GenderMale RaceAsian RaceBlack RaceWhite AgeGroup1 AgeGroup2
# 1 0 1 0 0 1 1 0
# 2 0 1 1 0 0 0 1
# 3 1 0 0 0 1 0 1
# 4 0 1 0 1 0 1 0
# ...
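The contrasts.arg part of the model.matrix call is borrowed code that ensures every level of every factor shows up in your output. Without it, each factor after the first loses its first level as the reference level.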
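To see what contrasts.arg changes, here is a small sketch comparing the default call with the full-indicator call. The data frame and the names default_mm, full_contrasts and full_mm are made up for illustration; the behavior shown is the standard treatment-contrast default in base R.

# Toy data; stringsAsFactors = TRUE turns the character columns into factors.
d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                Race   = c("White", "Asian", "White", "Black"),
                stringsAsFactors = TRUE)

# Default treatment contrasts: only the first factor keeps all of its levels,
# so Race loses its reference level "Asian".
default_mm <- model.matrix(~ . + 0, data = d)
colnames(default_mm)
# "GenderFemale" "GenderMale" "RaceBlack" "RaceWhite"

# Identity (indicator) contrasts for every factor: no level is dropped.
full_contrasts <- lapply(d, contrasts, contrasts = FALSE)
full_mm <- model.matrix(~ . + 0, data = d, contrasts.arg = full_contrasts)
colnames(full_mm)
# "GenderFemale" "GenderMale" "RaceAsian" "RaceBlack" "RaceWhite"

Every row of full_mm sums to the number of factors (2 here), which is a quick sanity check that each variable contributed exactly one indicator per row.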
Answer 1: (score: 2)
You can use the reshape2 package:
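DF <- data.frame(gender = c("m", "f", "m"),
                 agegroup = factor(c(1, 2, 2)))
library(reshape2)
# Build one indicator table per column, then merge them on the row id
dum <- lapply(names(DF), function(x, df) {
  d <- df[, x, drop = FALSE]
  d$id <- seq_along(d[, 1])
  # Count occurrences of each level per row id (always 0 or 1)
  res <- dcast(d, id ~ ..., fun.aggregate = length)
  # Prefix each level with the variable name, e.g. "gender=f"
  names(res)[-1] <- paste(names(d)[1], names(res)[-1], sep = "=")
  res
}, df = DF)
Reduce(merge, dum)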
# id gender=f gender=m agegroup=1 agegroup=2
#1 1 0 1 1 0
#2 2 1 0 0 1
#3 3 0 1 0 1
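If installing a package is not an option, the same "variable=level" layout can be sketched in base R. This is only an illustration, not part of either answer above; the helper name one_hot is hypothetical.

DF <- data.frame(gender = c("m", "f", "m"),
                 agegroup = factor(c(1, 2, 2)))

# Hypothetical helper: one 0/1 column per level of a single variable,
# with columns named "<variable>=<level>".
one_hot <- function(name, df) {
  f <- as.factor(df[[name]])
  # outer() compares every value with every level and always returns a matrix
  m <- outer(as.character(f), levels(f), "==") + 0L
  colnames(m) <- paste(name, levels(f), sep = "=")
  m
}

# Expand every column and bind the pieces side by side.
res <- do.call(cbind, lapply(names(DF), one_hot, df = DF))
res
#      gender=f gender=m agegroup=1 agegroup=2
# [1,]        0        1          1          0
# [2,]        1        0          0          1
# [3,]        0        1          0          1

The result is an integer matrix rather than a data frame; wrap it in as.data.frame(res, check.names = FALSE) if the "=" in the column names should survive the conversion.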