How to spread or cast a single column into multiple columns - Error: Duplicate identifiers for rows

Time: 2018-05-28 01:21:22

Tags: r tidyr

I have the following df and want to spread/cast it.

df <- data.frame(experiment=c("ex3", "ex1", "ex1", "ex2","ex7", "ex7"),
                 mod=c("mod1", "mod1","mod7", "mod8","mod3", "mod9"))

df

  experiment  mod
1        ex3 mod1
2        ex1 mod1
3        ex1 mod7
4        ex2 mod8
5        ex7 mod3
6        ex7 mod9

Desired output

  experiment mod_A mod_B
1        ex1  mod1  mod7
2        ex2  mod8  <NA>
3        ex3  mod1  <NA>
4        ex7  mod3  mod9

I tried tidyr::spread but got an error

df %>%  spread(experiment, mod)

Error: Duplicate identifiers for rows (2, 3), (5, 6)

Any help would be greatly appreciated.

2 answers:

Answer 0: (score: 2)

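spread fails because ex1 and ex7 each appear in two rows and nothing else in the data tells those rows apart. We can create an ID column within each experiment group to resolve this.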

library(dplyr)
library(tidyr)

df2 <- df %>%
  arrange(experiment, mod) %>%
  group_by(experiment) %>%
  mutate(ID = row_number()) %>%  # number the rows within each experiment
  spread(ID, mod) %>%            # one column per ID value
  ungroup()
df2
# # A tibble: 4 x 3
#   experiment `1`   `2`  
#   <fct>      <fct> <fct>
# 1 ex1        mod1  mod7 
# 2 ex2        mod8  NA   
# 3 ex3        mod1  NA   
# 4 ex7        mod3  mod9
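
The columns above are named `1` and `2` rather than mod_A and mod_B as in the desired output. A minimal sketch of one way to get those names, assuming no experiment has more than 26 rows (so LETTERS is long enough); df3 is just an illustrative name, and stringsAsFactors = FALSE is added here so the columns are plain character vectors:

```r
library(dplyr)
library(tidyr)

df <- data.frame(experiment = c("ex3", "ex1", "ex1", "ex2", "ex7", "ex7"),
                 mod = c("mod1", "mod1", "mod7", "mod8", "mod3", "mod9"),
                 stringsAsFactors = FALSE)

df3 <- df %>%
  arrange(experiment, mod) %>%
  group_by(experiment) %>%
  # Label the 1st, 2nd, ... row of each group as mod_A, mod_B, ...
  mutate(ID = paste0("mod_", LETTERS[row_number()])) %>%
  ungroup() %>%
  # The labels become the new column names
  spread(ID, mod)
df3
```

Building the label before spreading avoids a separate renaming step afterwards.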

Answer 1: (score: 0)

With dplyr (version 0.7.5) and tidyr (version 0.8.1), you can simply summarise and separate:

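library(dplyr)
library(tidyr)

df %>%
  group_by(experiment) %>%
  summarise(mod = paste(mod, collapse = ",")) %>%
  # fill = "right" pads short rows with NA without a warning
  separate(mod, into = c("mod_A", "mod_B"), sep = ",", fill = "right")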

# A tibble: 4 x 3
#   experiment mod_A mod_B
#   <chr>      <chr> <chr>
# 1 ex1        mod1  mod7 
# 2 ex2        mod8  NA   
# 3 ex3        mod1  NA   
# 4 ex7        mod3  mod9
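
Note that a fixed into = c("mod_A", "mod_B") only covers experiments with at most two mods; with separate's default settings, any extra pieces are dropped with a warning. A rough sketch of deriving the column names from the data instead, assuming the mod values contain no commas; n_cols and wide are illustrative names:

```r
library(dplyr)
library(tidyr)

df <- data.frame(experiment = c("ex3", "ex1", "ex1", "ex2", "ex7", "ex7"),
                 mod = c("mod1", "mod1", "mod7", "mod8", "mod3", "mod9"),
                 stringsAsFactors = FALSE)

# The largest group decides how many mod_* columns are needed
n_cols <- max(table(df$experiment))

wide <- df %>%
  group_by(experiment) %>%
  summarise(mod = paste(mod, collapse = ",")) %>%
  separate(mod, into = paste0("mod_", LETTERS[seq_len(n_cols)]),
           sep = ",", fill = "right")
wide
```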