Question

Thanks for your help.

My question is closely related to this thread.

Consider this df:

df <- data.frame(id = c(1,1,2,3,4), fruit =  c("apple","pear","apple","orange","apple"))

We can spread 'dummy variables' like this:

df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

Now look at what happens when a duplicate fruit is added.

df2 <- data.frame(id = c(1,1,2,3,4,4), fruit =  c("apple","pear","apple","orange","apple","apple"))

Spreading again:

df2 %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

gives Error: Duplicate identifiers for rows (5, 6)

Ideally, the correct result would return two fields named apple_1 and apple_2, both set to 1 for id=4.
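For reference, rows 5 and 6 collide because they share the same id and fruit, so spread has two values competing for one cell. The snippet below is a minimal sketch (assuming dplyr is loaded; the name dups is only illustrative) that lists the id/fruit pairs occurring more than once:

```r
library(dplyr)

df2 <- data.frame(id = c(1,1,2,3,4,4),
                  fruit = c("apple","pear","apple","orange","apple","apple"))

# Count each id/fruit pair and keep the pairs that occur more than once;
# these are the combinations spread() cannot place in a single cell.
dups <- df2 %>%
  count(id, fruit) %>%
  filter(n > 1)

dups
```

Here only the pair id = 4, fruit = apple is repeated.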

Answer 1

Are you looking for something like this:

> library(reshape2)
> df2 <- data.frame(id = c(1,1,2,3,4,4), fruit =  c("apple","pear","apple","orange","apple","apple"), stringsAsFactors = FALSE)
> dcast(df2, id ~ fruit, value.var = "fruit", fun.aggregate = list)
  id        apple orange pear
1  1        apple        pear
2  2        apple            
3  3              orange     
4  4 apple, apple            

Another option could be:

> df2 %>%
  group_by(id) %>%
  mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
  dcast( id ~ fruit, value.var = "fruit", fun.aggregate = list )

  id apple_1 apple_2 orange_1 pear_2
1  1 apple_1                  pear_2
2  2 apple_1                        
3  3                 orange_1       
4  4 apple_1 apple_2

If you prefer 0/1 in each column, then:

> df2 %>%
  group_by(id) %>%
  mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
  dcast( id ~ fruit, value.var = "fruit", fill = 0 , fun.aggregate = function(x) 1 )
  id apple_1 apple_2 orange_1 pear_2
1  1       1       0        0      1
2  2       1       0        0      0
3  3       0       0        1      0
4  4       1       1        0      0
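The same numbering trick can also stay within the tidyverse. This is only a sketch, not part of the answer above: it assumes tidyr 1.1.0 or later, where pivot_wider() is available and values_fill accepts a scalar, and the name wide is illustrative.

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(id = c(1,1,2,3,4,4),
                  fruit = c("apple","pear","apple","orange","apple","apple"),
                  stringsAsFactors = FALSE)

wide <- df2 %>%
  group_by(id) %>%
  # Number the rows within each id so repeated fruits get distinct names,
  # and add the dummy indicator column i
  mutate(fruit = paste(fruit, row_number(), sep = "_"), i = 1) %>%
  ungroup() %>%
  # One column per numbered fruit; missing combinations become 0
  pivot_wider(names_from = fruit, values_from = i, values_fill = 0)

wide
```

Unlike dcast, pivot_wider keeps the columns in order of first appearance rather than sorting them alphabetically, so the column order differs from the output above while the 0/1 values are the same.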
