Question

Suppose I have a single column in a matrix or data.frame, like this:

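# stringsAsFactors = TRUE is needed in R >= 4.0 so that levels(df$col1) below is not NULL
df <- data.frame(col1=sample(letters[1:3], 10, TRUE), stringsAsFactors=TRUE)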

I want to expand it into multiple columns, one per level, with 0/1 entries indicating whether that level is present in each row:

newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
for (i in 1:length(levels(df$col1))) {
  curLetter <- levels(df$col1)[i]
  newdf[which(df$col1 == curLetter), curLetter] <- 1
}
newdf

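I know there is a simple, sensible solution, but I can't figure out what it is. I tried expand.grid on df, which returns it unchanged. Likewise, melt from the reshape2 package returns df unchanged. I also tried reshape, but it complains about incorrect dimensions or undefined columns.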

Answer 1

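Obviously, model.matrix is the most straightforward candidate here, but I will offer three alternatives: table, lapply, and dcast (the last one because this question is tagged reshape2).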

`table`

table(sequence(nrow(df)), df$col1)
#     
#      a b c
#   1  1 0 0
#   2  0 1 0
#   3  0 1 0
#   4  0 0 1
#   5  1 0 0
#   6  0 0 1
#   7  0 0 1
#   8  0 1 0
#   9  0 1 0
#   10 1 0 0
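The result above is a table object rather than a data.frame. If a data.frame is needed, one possible sketch is to pass it through as.data.frame.matrix. The seed and the explicit levels argument are my own assumptions, added only so the example is reproducible and always has the three columns:

```r
set.seed(1)  # assumed seed, only for reproducibility
df <- data.frame(col1 = factor(sample(letters[1:3], 10, TRUE),
                               levels = letters[1:3]))

# Cross-tabulate row index against level: each row has exactly one 1
tab <- table(seq_len(nrow(df)), df$col1)

# as.data.frame() on a table would give long format (Var1, Var2, Freq);
# as.data.frame.matrix() keeps the wide layout, one column per level
wide <- as.data.frame.matrix(tab)
wide
```

Calling plain as.data.frame(tab) instead would return the long format, which is not what the question asks for.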

`lapply`

newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
newdf[] <- lapply(names(newdf), function(x) 
    { newdf[[x]][df[,1] == x] <- 1; newdf[[x]] })
newdf
#    a b c
# 1  1 0 0
# 2  0 1 0
# 3  0 1 0
# 4  0 0 1
# 5  1 0 0
# 6  0 0 1
# 7  0 0 1
# 8  0 1 0
# 9  0 1 0
# 10 1 0 0
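A variation on the same idea, as a rough sketch: instead of pre-allocating newdf with hard-coded column names, the indicator columns can be built directly from the factor levels with sapply. The seed and the variable names indicators and lvl are my own choices for illustration:

```r
set.seed(1)  # assumed seed, only for reproducibility
df <- data.frame(col1 = factor(sample(letters[1:3], 10, TRUE),
                               levels = letters[1:3]))

# One 0/1 integer vector per level; sapply names the columns after the levels
indicators <- sapply(levels(df$col1), function(lvl) as.integer(df$col1 == lvl))

newdf <- as.data.frame(indicators)
newdf
```

This avoids having to know the level names in advance, at the cost of requiring col1 to be a factor.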

`dcast`

library(reshape2)
dcast(df, sequence(nrow(df)) ~ df$col1, fun.aggregate=length, value.var = "col1")
#    sequence(nrow(df)) a b c
# 1                   1 1 0 0
# 2                   2 0 1 0
# 3                   3 0 1 0
# 4                   4 0 0 1
# 5                   5 1 0 0
# 6                   6 0 0 1
# 7                   7 0 0 1
# 8                   8 0 1 0
# 9                   9 0 1 0
# 10                 10 1 0 0

Answer 2

Use model.matrix.

It is very easy:

model.matrix(~ df$col1 + 0)

The + 0 term means the intercept is not included. As a result, you get a dummy variable for each factor level.

Result:

   df$col1a df$col1b df$col1c
1         0        0        1
2         0        1        0
3         0        0        1
4         1        0        0
5         0        1        0
6         1        0        0
7         1        0        0
8         0        1        0
9         1        0        0
10        0        1        0
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`df$col1`
[1] "contr.treatment"
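If the df$col1 prefix in the column names and the extra attributes are unwanted, a possible follow-up (a sketch, not part of the answer above) is to pass the data frame through the data argument, rename the columns to the level names, and convert to a data.frame. The seed is assumed, only to make the example reproducible:

```r
set.seed(1)  # assumed seed, only for reproducibility
df <- data.frame(col1 = factor(sample(letters[1:3], 10, TRUE),
                               levels = letters[1:3]))

# With data = df the columns are named col1a, col1b, col1c
mm <- model.matrix(~ col1 + 0, data = df)

# Rename the columns to the bare level names
colnames(mm) <- levels(df$col1)

# Converting to a data.frame drops the "assign" and "contrasts" attributes
newdf <- as.data.frame(mm)
newdf
```

Note that model.matrix returns numeric (double) columns, so the entries are 0 and 1 as doubles rather than integers.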
