Convert factor columns into binary indicator columns

Time: 2015-04-20 14:51:27

Tags: r

I have the following data:

> View(mydata)   

   Gender   Race  Agegroup  Date       ..... #m columns
#1 Male   Asian     1      2015/04/20 .....
#2 Female  White    2      2015/04/15 .....
.
.
#n rows

I want to convert mydata into this format:

Gender=Male  Gender=Female  Race=Asian  Race=White   Agegroup = 1   Agegroup = 2 ......
    1             0              1             0              1               0
    0             1              0             1              0               1
    .             .              .             .              .               .
    .             .              .             .              .               .

I'm new to R. I know a loop would work, but is there a cleaner way?

2 answers:

Answer 0: (score: 3)

You can use model.matrix to expand multiple variables in a single call:

(d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                 Race = c("White", "Asian", "White", "Black"),
                 AgeGroup = factor(c(1, 2, 2, 1)),
                 stringsAsFactors = TRUE))  # keep strings as factors (needed on R >= 4.0)
#   Gender  Race AgeGroup
# 1   Male White        1
# 2   Male Asian        2
# 3 Female White        2
# 4   Male Black        1

# contrasts = FALSE requests a full set of indicator columns for every factor
model.matrix(~ . + 0, data = d, contrasts.arg = lapply(d, contrasts, contrasts = FALSE))
#   GenderFemale GenderMale RaceAsian RaceBlack RaceWhite AgeGroup1 AgeGroup2
# 1            0          1         0         0         1         1         0
# 2            0          1         1         0         0         0         1
# 3            1          0         0         0         1         0         1
# 4            0          1         0         1         0         1         0
# ...

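The contrasts.arg part of the model.matrix call is code from here; it ensures that all levels of every factor show up in the output. Without it, model.matrix keeps every level only for the first factor and drops the first level of each later factor.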
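
To see what contrasts.arg changes, the sketch below compares the default call with the full-indicator call on the same toy data. It rebuilds d with stringsAsFactors = TRUE so that it also runs on R 4.0 or later; the names default_mm and full_mm are illustrative only and not part of the original answer.

```r
# Toy data from the answer; stringsAsFactors = TRUE keeps Gender and Race as factors
d <- data.frame(Gender = c("Male", "Male", "Female", "Male"),
                Race = c("White", "Asian", "White", "Black"),
                AgeGroup = factor(c(1, 2, 2, 1)),
                stringsAsFactors = TRUE)

# Default contrasts: only the first factor keeps all of its levels
default_mm <- model.matrix(~ . + 0, data = d)
colnames(default_mm)
# "GenderFemale" "GenderMale" "RaceBlack" "RaceWhite" "AgeGroup2"

# Full indicator coding: every level of every factor gets its own column
full_mm <- model.matrix(~ . + 0, data = d,
                        contrasts.arg = lapply(d, contrasts, contrasts = FALSE))
colnames(full_mm)
# "GenderFemale" "GenderMale" "RaceAsian" "RaceBlack" "RaceWhite" "AgeGroup1" "AgeGroup2"
```

With full coding, each row contains exactly one 1 per original factor, so rowSums(full_mm) equals the number of factor columns in d.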

Answer 1: (score: 2)

You can use the reshape2 package:

DF <- data.frame(gender = c("m", "f", "m"),
                 agegroup = factor(c(1, 2, 2)))


library(reshape2)
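dum <- lapply(names(DF), function(x, df) {
  # Keep one column at a time and add a row id to cast against
  d <- df[, x, drop = FALSE]
  d$id <- seq_along(d[, 1])
  # One column per level; length() counts occurrences, giving 0/1 per row
  res <- dcast(d, id ~ ..., fun.aggregate = length)
  # Rename the level columns to "column=level"
  names(res)[-1] <- paste(names(d)[1], names(res)[-1], sep = "=")
  res
}, df = DF)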


Reduce(merge, dum)
#  id gender=f gender=m agegroup=1 agegroup=2
#1  1        0        1          1          0
#2  2        1        0          0          1
#3  3        0        1          0          1
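
If a dependency on reshape2 is not wanted, the same "column=level" naming can be approximated in base R. The following is only a sketch and not part of the original answer: to_indicators is a hypothetical helper name, and it assumes every column should be treated as categorical.

```r
DF <- data.frame(gender = c("m", "f", "m"),
                 agegroup = factor(c(1, 2, 2)))

# Hypothetical helper: one 0/1 column per level, named "column=level"
to_indicators <- function(df) {
  cols <- lapply(names(df), function(nm) {
    f <- factor(df[[nm]])                       # coerce character columns to factors
    # Compare every value against every level; + 0L turns TRUE/FALSE into 1/0
    m <- outer(as.character(f), levels(f), "==") + 0L
    colnames(m) <- paste(nm, levels(f), sep = "=")
    m
  })
  do.call(cbind, cols)                          # bind the per-column blocks side by side
}

ind <- to_indicators(DF)
ind
```

The result is an integer matrix with the same indicator columns as the reshape2 output above, minus the id column; wrap it in as.data.frame(ind, check.names = FALSE) if a data frame with the "=" names is preferred.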