Question

I have a data frame like the following:

id <- c(1,1,1,2,2,2,1,3,4,4)
product <- c("a","b","c","a","d","f","e","f","e","f") 
df <- data.frame(id,product)

   id product
1   1       a
2   1       b
3   1       c
4   2       a
5   2       d
6   2       f
7   1       e
8   3       f
9   4       e
10  4       f

I want to convert it into a data frame like this:

id a b c d e f
1  1 1 1 0 1 0
2  1 0 0 1 0 1
3  0 0 0 0 0 1
4  0 0 0 0 1 1

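Basically, I need only one record per id, and each record should contain 0 or 1 depending on whether that product was bought. I was using model.matrix, but it doesn't group by id, so I end up with 10 rows, the same as in the original dataset.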
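For reference, the model.matrix approach can be made to work by collapsing its one-row-per-purchase output to one row per id. This is a minimal sketch, not one of the answers below: it assumes summing indicator rows with rowsum() and then capping the sums at 1, so an id that bought the same product twice still gets a 1.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 1, 3, 4, 4),
                 product = c("a", "b", "c", "a", "d", "f", "e", "f", "e", "f"))

# One indicator column per product level, one row per purchase (10 rows)
mm <- model.matrix(~ product - 1, data = df)
colnames(mm) <- sub("^product", "", colnames(mm))  # "producta" -> "a"

# Collapse purchase rows to one row per id by summing within each id
counts <- rowsum(mm, group = df$id)

# Cap at 1 so repeat purchases still yield a 0/1 indicator
wide <- data.frame(id = as.numeric(rownames(counts)), (counts > 0) * 1)
wide
```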

Answer 1

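as.data.frame.table (which is what gets called when you call as.data.frame on a table) quite sensibly converts the table to long form. To prevent that, you need to treat it as a matrix: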

 as.data.frame.matrix(table(df))
  a b c d e f
1 1 1 1 0 1 0
2 1 0 0 1 0 1
3 0 0 0 0 0 1
4 0 0 0 0 1 1
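A small sketch contrasting the two conversions on the example data. The last step, which turns the row names (the ids) into a regular id column, is an addition and not part of the answer above.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 1, 3, 4, 4),
                 product = c("a", "b", "c", "a", "d", "f", "e", "f", "e", "f"))

tab <- table(df)                   # 4 x 6 table of counts: ids by products

long <- as.data.frame(tab)         # dispatches to as.data.frame.table: 24 rows (id, product, Freq)
wide <- as.data.frame.matrix(tab)  # keeps the 4 x 6 layout; ids end up as row names

# Move the ids out of the row names into their own column
wide <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```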

Answer 2

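The reshape command is very flexible, comparable to PROC TRANSPOSE with all its features. It keeps id as a variable in the output, and product levels that never occur for a given id show up as missing values (NA) in the output dataset. This is easy to deal with and reflects the actual data (for example, an NA here means the product was not bought, a negative (0) condition, rather than genuinely missing data).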

df$ind <- 1

reshape(df, direction='wide', timevar='product', idvar='id')

which gives

> reshape(df, direction='wide', timevar='product', idvar='id')
  id ind.a ind.b ind.c ind.d ind.f ind.e
1  1     1     1     1    NA    NA     1
4  2     1    NA    NA     1     1    NA
8  3    NA    NA    NA    NA     1    NA
9  4    NA    NA    NA    NA     1     1

The result is easy to work with further.
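As one example of that further work, here is a sketch that turns the reshape output into the exact layout asked for: NAs replaced by 0, the "ind." prefix dropped, and product columns sorted alphabetically (reshape orders them by first appearance, so "f" comes before "e").

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 1, 3, 4, 4),
                 product = c("a", "b", "c", "a", "d", "f", "e", "f", "e", "f"))
df$ind <- 1

wide <- reshape(df, direction = "wide", timevar = "product", idvar = "id")

wide[is.na(wide)] <- 0                           # products never bought become 0
names(wide) <- sub("^ind\\.", "", names(wide))   # "ind.a" -> "a"
wide <- wide[, c("id", sort(setdiff(names(wide), "id")))]  # id, then a..f
rownames(wide) <- NULL
wide
```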

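aggregate provides similar functionality (this relies on product being a factor, which was the data.frame() default before R 4.0.0; with a character column, wrap it in factor() so every id reports all product levels):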

aggregate(df$product, df[, 'id', drop=F], table)

which gives

> aggregate(df$product, df[, 'id', drop=F], table)
  id x.a x.b x.c x.d x.e x.f
1  1   1   1   1   0   1   0
2  2   1   0   0   1   0   1
3  3   0   0   0   0   0   1
4  4   0   0   0   0   1   1

The result is easy to work with further.
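In this result the x column is itself a matrix, which is why the headers print as x.a … x.f. A sketch of unpacking it into ordinary columns named a … f; wrapping product in factor() is an assumption made so the call behaves the same under R 4.0.0 and later, where data.frame() no longer creates factors by default.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 1, 3, 4, 4),
                 product = c("a", "b", "c", "a", "d", "f", "e", "f", "e", "f"))

# factor() makes table() count every product level, including ones an id never bought
res <- aggregate(factor(df$product), df[, "id", drop = FALSE], table)

# res$x is a 4 x 6 integer matrix; data.frame() splits it into columns a..f
out <- data.frame(id = res$id, res$x)
out
```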

Answer 3

Take a look at the help for the table function.

table(id,product)

To convert it to a data frame, use

as.data.frame.matrix(table(id,product))

I found this tip in a blog post by Rronan.
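The call above uses the standalone id and product vectors from the question. When those columns only exist inside the data frame, a sketch of the same approach evaluates the call with with(), which also keeps "id" and "product" as the dimension names:

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 1, 3, 4, 4),
                 product = c("a", "b", "c", "a", "d", "f", "e", "f", "e", "f"))

# Evaluate table(id, product) using df's columns rather than global vectors
tab <- with(df, table(id, product))
wide <- as.data.frame.matrix(tab)  # rows are ids "1".."4", columns are products a..f
wide
```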

Answer 4

One option that depends on reshape2, among many others that may or may not need some massaging:

> reshape2::dcast(data = df,formula = id~product,fun.aggregate = length,fill = 0L)
Using product as value column: use value.var to override.
  id a b c d e f
1  1 1 1 1 0 1 0
2  2 1 0 0 1 0 1
3  3 0 0 0 0 0 1
4  4 0 0 0 0 1 1
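Two tweaks to the dcast call, sketched under the assumption that reshape2 is installed: naming value.var explicitly avoids the "Using product as value column" message, and because length counts rows, a repeat purchase would produce a 2. The extra purchase row below is hypothetical and added only to show that difference; a function returning 1 for any non-empty cell keeps the output strictly 0/1.

```r
library(reshape2)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 1, 3, 4, 4),
                 product = c("a", "b", "c", "a", "d", "f", "e", "f", "e", "f"))

# Hypothetical repeat purchase: id 1 buys "a" a second time
df2 <- rbind(df, data.frame(id = 1, product = "a"))

# length counts rows per cell, so id 1 / "a" becomes 2
counts <- dcast(df2, id ~ product, value.var = "product",
                fun.aggregate = length, fill = 0L)

# 1 for any non-empty cell, 0 otherwise
binary <- dcast(df2, id ~ product, value.var = "product",
                fun.aggregate = function(x) as.integer(length(x) > 0), fill = 0L)
binary
```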

