Thanks in advance for your help.
My question is closely related to this thread.
Consider this df:
library(dplyr)
library(tidyr)
df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))
We can spread the 'dummy variables' like this:
df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)
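For reference, tidyr 1.0.0 superseded spread() with pivot_wider(). Below is a minimal sketch of the same dummy-variable spread written with pivot_wider(); it assumes tidyr >= 1.1.0, where values_fill accepts a plain scalar, and the name dummies is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

# One row per id, one 0/1 column per fruit;
# values_fill = 0 plays the role of spread()'s fill = 0
dummies <- df %>%
  mutate(i = 1) %>%
  pivot_wider(names_from = fruit, values_from = i, values_fill = 0)

dummies
```

Like spread(), this still requires each id/fruit pair to be unique, so it does not by itself solve the duplicate case.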
Now notice what happens when a duplicate fruit is added:
df2 <- data.frame(id = c(1,1,2,3,4,4), fruit = c("apple","pear","apple","orange","apple","apple"))
Spreading again:
df2 %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)
fails with: Error: Duplicate identifiers for rows (5, 6)
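The error means spread() found more than one row for the same id/fruit combination (rows 5 and 6 are both id = 4, apple), so it cannot decide which value belongs in a single cell. A quick way to locate such keys, sketched with dplyr::count() (dupes is an illustrative name):

```r
library(dplyr)

df2 <- data.frame(id = c(1, 1, 2, 3, 4, 4),
                  fruit = c("apple", "pear", "apple", "orange", "apple", "apple"))

# spread() needs every (id, fruit) pair to be unique; count() tallies each
# pair, and filter() keeps only the pairs that occur more than once
dupes <- df2 %>%
  count(id, fruit) %>%
  filter(n > 1)

dupes
```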
Ideally, the correct result would contain two fields, apple_1 and apple_2, both set to 1 for id = 4.
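One way to get that shape is to build a unique key first and then widen. The sketch below assumes duplicates should be numbered within each id/fruit pair, so the single pear of id = 1 becomes pear_1. It uses tidyr::pivot_wider() (tidyr >= 1.1.0 assumed) rather than spread(); key and wide are hypothetical names.

```r
library(dplyr)
library(tidyr)

df2 <- data.frame(id = c(1, 1, 2, 3, 4, 4),
                  fruit = c("apple", "pear", "apple", "orange", "apple", "apple"))

# Number repeated fruits within each id so that every (id, key) pair is
# unique, then widen, filling absent combinations with 0
wide <- df2 %>%
  group_by(id, fruit) %>%
  mutate(key = paste(fruit, row_number(), sep = "_"), i = 1) %>%
  ungroup() %>%
  select(id, key, i) %>%
  pivot_wider(names_from = key, values_from = i, values_fill = 0)

wide
```

For id = 4 this gives apple_1 = 1 and apple_2 = 1; every other id has apple_2 = 0.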
Answer (score: 0)
You are looking for something like this:
library(reshape2)
df2 <- data.frame(id = c(1,1,2,3,4,4), fruit = c("apple","pear","apple","orange","apple","apple"), stringsAsFactors = FALSE)
> dcast(df2, id ~ fruit, value.var = 'fruit', fun.aggregate = list )
  id        apple orange pear
1  1        apple        pear
2  2        apple
3  3              orange
4  4 apple, apple
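If only the number of times each fruit occurs per id matters, rather than the fruit values themselves, the same reshape2::dcast() call can aggregate with length instead of list. A sketch (counts is an illustrative name); because dcast()'s fill defaults to fun.aggregate applied to a zero-length vector, empty cells come out as 0:

```r
library(reshape2)

df2 <- data.frame(id = c(1, 1, 2, 3, 4, 4),
                  fruit = c("apple", "pear", "apple", "orange", "apple", "apple"),
                  stringsAsFactors = FALSE)

# length() counts the rows that fall into each id/fruit cell;
# cells with no rows get length(character(0)), i.e. 0
counts <- dcast(df2, id ~ fruit, value.var = "fruit", fun.aggregate = length)

counts
```

Here id = 4 gets apple = 2 instead of a list holding two apple entries.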
Another option:
> library(dplyr)
> df2 %>%
    group_by(id) %>%
    mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
    dcast(id ~ fruit, value.var = "fruit", fun.aggregate = list)
  id apple_1 apple_2 orange_1 pear_2
1  1 apple_1                  pear_2
2  2 apple_1
3  3                 orange_1
4  4 apple_1 apple_2
If you prefer 0/1 in each column:
> df2 %>%
    group_by(id) %>%
    mutate(fruit = paste(fruit, row_number(), sep = "_")) %>%
    dcast(id ~ fruit, value.var = "fruit", fill = 0, fun.aggregate = function(x) 1)
  id apple_1 apple_2 orange_1 pear_2
1  1       1       0        0      1
2  2       1       0        0      0
3  3       0       0        1      0
4  4       1       1        0      0