I have the following df and want to spread/cast it.
df <- data.frame(experiment=c("ex3", "ex1", "ex1", "ex2","ex7", "ex7"),
mod=c("mod1", "mod1","mod7", "mod8","mod3", "mod9"))
df
experiment mod
1 ex3 mod1
2 ex1 mod1
3 ex1 mod7
4 ex2 mod8
5 ex7 mod3
6 ex7 mod9
Desired output:
experiment mod_A mod_B
1 ex1 mod1 mod7
2 ex2 mod8 <NA>
3 ex3 mod1 <NA>
4 ex7 mod3 mod9
I tried tidyr::spread, but it throws an error:
df %>% spread(experiment, mod)
Error: Duplicate identifiers for rows (2, 3), (5, 6)
Any help would be appreciated.
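Answer 0 (score: 2)
We can create an ID column within each experiment group to solve this problem. The ID numbers the rows inside each group, so every (experiment, ID) pair is unique and spread no longer sees duplicate identifiers.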
library(dplyr)
library(tidyr)
df2 <- df %>%
  arrange(experiment, mod) %>%
  group_by(experiment) %>%
  mutate(ID = 1:n()) %>%
  spread(ID, mod) %>%
  ungroup()
df2
# # A tibble: 4 x 3
# experiment `1` `2`
# <fct> <fct> <fct>
# 1 ex1 mod1 mod7
# 2 ex2 mod8 NA
# 3 ex3 mod1 NA
# 4 ex7 mod3 mod9
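The output above has columns named `1` and `2` rather than the mod_A and mod_B asked for in the question. Here is a minimal sketch of one way to get those names, assuming tidyr 1.0.0 or later, where pivot_wider supersedes spread. The letter-based ID and the object name `wide` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE keeps plain character columns
df <- data.frame(
  experiment = c("ex3", "ex1", "ex1", "ex2", "ex7", "ex7"),
  mod = c("mod1", "mod1", "mod7", "mod8", "mod3", "mod9"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  arrange(experiment, mod) %>%
  group_by(experiment) %>%
  # Label rows within each group A, B, ... (assumes at most 26 rows per group)
  mutate(ID = LETTERS[row_number()]) %>%
  ungroup() %>%
  # Each ID becomes a column; names_prefix turns "A" into "mod_A"
  pivot_wider(names_from = ID, values_from = mod, names_prefix = "mod_")

wide
```

Experiments with a single mod get NA in mod_B, the same as with the spread version.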
Answer 1 (score: 0)
With dplyr (version 0.7.5) and tidyr (version 0.8.1), you only need summarise and separate.
df %>%
  group_by(experiment) %>%
  summarise(mod = paste(mod, collapse = ",")) %>%
  separate(mod, into = c("mod_A", "mod_B"))
# A tibble: 4 x 3
# experiment mod_A mod_B
# <chr> <chr> <chr>
# 1 ex1 mod1 mod7
# 2 ex2 mod8 NA
# 3 ex3 mod1 NA
# 4 ex7 mod3 mod9
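Two caveats about the summarise/separate approach. By default, separate splits on any run of non-alphanumeric characters, so a mod value such as "mod.1" would be split as well. And experiments with only one mod trigger a warning about missing pieces. The sketch below makes both choices explicit. It assumes no experiment has more than two mods, since by default any extra pieces are dropped with a warning. The name `wide2` is illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  experiment = c("ex3", "ex1", "ex1", "ex2", "ex7", "ex7"),
  mod = c("mod1", "mod1", "mod7", "mod8", "mod3", "mod9"),
  stringsAsFactors = FALSE
)

wide2 <- df %>%
  group_by(experiment) %>%
  # Collapse each group's mods into one comma-separated string
  summarise(mod = paste(mod, collapse = ",")) %>%
  # Split only on the comma; fill = "right" pads short rows with NA without a warning
  separate(mod, into = c("mod_A", "mod_B"), sep = ",", fill = "right")

wide2
```

If the number of mods per experiment is not known in advance, the ID-column approach from the other answer is likely the safer choice, because it does not require naming the output columns up front.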