Combine rows into a single row based on a common ID

Time: 2019-08-14 10:59:11

Tags: r dplyr tidyr

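I have a data frame with three columns: ID, F, and M. I want to combine the F and M values into a single row per ID; at the moment most of them sit in two separate rows, padded with NA.

Unfortunately, I also have some duplicated rows, and the data is still somewhat messy (see the example below).

I tried the following, but got the error: Expecting a single value: [extent=2].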

 test2 <- test %>%  mutate(grouped_id = row_number()) %>%
                    group_by(BroodID) %>% 
                    summarise_each(funs(na.omit))   

Here is a reproducible example of my data:

 test <- structure(list(ID = c(2010.3, 2010.3, 2010.3, 2010.3, 2010.33, 
 2010.34, 2010.38, 2010.38, 2010.39, 2010.39, 2010.4, 2010.4, 
 2010.4, 2010.4, 2010.4, 2010.41, 2010.41, 2010.42, 2010.42, 2010.44, 
 2010.44, 2010.46, 2010.46), F = structure(c(5L, 5L, 12L, 12L, 
 11L, 8L, NA, 3L, NA, 1L, NA, 2L, 2L, 6L, 6L, NA, 7L, NA, 9L, 
 NA, 4L, NA, 10L), .Label = c("T206434", "T206553", "T931169", 
 "T931286", "T961275", "V470937", "X250041", "X250109", "X250195", 
 "X250568", "X251067", "X251069"), class = "factor"), M = structure(c(2L, 
 2L, 11L, 11L, 6L, NA, 9L, NA, 10L, NA, 1L, 1L, 4L, 4L, NA, 8L, 
 NA, 3L, NA, 7L, NA, 5L, NA), .Label = c("T206824", "T206994", 
"T960191", "T961486", "X250567", "X250779", "X250851", "X251046", 
 "X251066", "X251074", "X251116"), class = "factor")), row.names = c(NA, 
 23L), class = "data.frame")        

I would like the F and M values to sit in their own two columns, combined into a single row per ID.

1 Answer:

Answer 0: (score: 0)

We can use unite with na.rm = TRUE to drop the NA values, and distinct to keep only the unique rows.

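library(dplyr)

test %>%
  # convert the F and M factor columns (columns 2 and 3) to character
  mutate_at(2:3, as.character) %>%
  # paste F and M into one column, leaving out NA values
  tidyr::unite(combined, F, M, na.rm = TRUE, sep = ",") %>%
  # drop the duplicated rows
  distinct()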

#        ID        combined
#1  2010.30 T961275,T206994
#2  2010.30 X251069,X251116
#3  2010.33 X251067,X250779
#4  2010.34         X250109
#5  2010.38         X251066
#6  2010.38         T931169
#7  2010.39         X251074
#8  2010.39         T206434
#9  2010.40         T206824
#10 2010.40 T206553,T206824
#11 2010.40 T206553,T961486
#12 2010.40 V470937,T961486
#13 2010.40         V470937
#14 2010.41         X251046
#15 2010.41         X250041
#16 2010.42         T960191
#17 2010.42         X250195
#18 2010.44         X250851
#19 2010.44         T931286
#20 2010.46         X250567
#21 2010.46         X250568
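
To see what na.rm = TRUE changes, here is a minimal sketch on a made-up two-row data frame (toy, with_na and without_na are hypothetical names, not part of the question's data). It assumes a tidyr version whose unite accepts na.rm, which as far as I know means 1.0.0 or later.

```r
library(tidyr)

# Hypothetical toy data: the second row has a missing F value
toy <- data.frame(
  ID = c(1, 2),
  F  = c("a", NA),
  M  = c("b", "c"),
  stringsAsFactors = FALSE
)

# Default behaviour: the missing value is pasted in as the text "NA"
with_na <- unite(toy, combined, F, M, sep = ",")

# With na.rm = TRUE the missing value is left out before pasting
without_na <- unite(toy, combined, F, M, sep = ",", na.rm = TRUE)

print(with_na$combined)
print(without_na$combined)
```

Without na.rm the second row would read "NA,c"; with it, only "c" remains, which is why rows with a single parent in the output above show just one value.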

If we want to summarise further by ID, we can do:

test %>%
  mutate_at(2:3, as.character) %>%
  tidyr::unite(combined, F, M, na.rm = TRUE, sep = ",") %>%
  distinct() %>%
  group_by(ID) %>%
  # one row per ID: collapse its combined values into a single string
  summarise(combined = toString(combined))
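
The attempt in the question most likely fails because na.omit can return more than one value for a group (several IDs carry more than one non-NA F or M value), while that summarise step expected exactly one value per group. If the goal is to keep F and M as two separate columns with one row per ID, one possible sketch is to collapse each column per group. The toy data frame below reuses a few values from the question; collapse_unique is a hypothetical helper introduced only for this illustration.

```r
library(dplyr)

# Toy data in the same shape as the question's `test`
toy <- data.frame(
  ID = c(2010.38, 2010.38, 2010.30, 2010.30),
  F  = c(NA, "T931169", "T961275", "T961275"),
  M  = c("X251066", NA, "T206994", "T206994"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop NA, drop duplicates, join what is left
collapse_unique <- function(x) toString(unique(x[!is.na(x)]))

result <- toy %>%
  group_by(ID) %>%
  summarise(F = collapse_unique(F), M = collapse_unique(M))

print(result)
```

Because the helper always returns a single string, summarise gets exactly one value per group. An ID with several distinct F or M values ends up with a comma-separated string in that column, so whether this layout is acceptable depends on how such IDs should be treated.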