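I have a data frame with 3 columns: ID, F and M. I would like to join the values of F and M into a single row per ID; at the moment most of them sit in two separate rows padded with NA.
Unfortunately, I also have some duplicated rows, and the data are still somewhat messy (see the example below).
I tried the following, but got an error: Expecting a single value: [extent=2].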
test2 <- test %>% mutate(grouped_id = row_number()) %>%
group_by(BroodID) %>%
summarise_each(funs(na.omit))
Here is a reproducible example of my data:
structure(list(ID = c(2010.3, 2010.3, 2010.3, 2010.3, 2010.33,
2010.34, 2010.38, 2010.38, 2010.39, 2010.39, 2010.4, 2010.4,
2010.4, 2010.4, 2010.4, 2010.41, 2010.41, 2010.42, 2010.42, 2010.44,
2010.44, 2010.46, 2010.46), F = structure(c(5L, 5L, 12L, 12L,
11L, 8L, NA, 3L, NA, 1L, NA, 2L, 2L, 6L, 6L, NA, 7L, NA, 9L,
NA, 4L, NA, 10L), .Label = c("T206434", "T206553", "T931169",
"T931286", "T961275", "V470937", "X250041", "X250109", "X250195",
"X250568", "X251067", "X251069"), class = "factor"), M = structure(c(2L,
2L, 11L, 11L, 6L, NA, 9L, NA, 10L, NA, 1L, 1L, 4L, 4L, NA, 8L,
NA, 3L, NA, 7L, NA, 5L, NA), .Label = c("T206824", "T206994",
"T960191", "T961486", "X250567", "X250779", "X250851", "X251046",
"X251066", "X251074", "X251116"), class = "factor")), row.names = c(NA,
23L), class = "data.frame")
I would like the values of F and M, which are currently split across two rows, to be combined into one row based on ID.
Answer 0 (score: 0)
We can use unite with na.rm = TRUE to drop the NA values, and distinct to keep only the unique rows. The factor columns are converted to character first.
library(dplyr)
test %>%
mutate_at(2:3, as.character) %>%
tidyr::unite(combined, F, M, na.rm = TRUE, sep = ",") %>%
distinct()
# ID combined
#1 2010.30 T961275,T206994
#2 2010.30 X251069,X251116
#3 2010.33 X251067,X250779
#4 2010.34 X250109
#5 2010.38 X251066
#6 2010.38 T931169
#7 2010.39 X251074
#8 2010.39 T206434
#9 2010.40 T206824
#10 2010.40 T206553,T206824
#11 2010.40 T206553,T961486
#12 2010.40 V470937,T961486
#13 2010.40 V470937
#14 2010.41 X251046
#15 2010.41 X250041
#16 2010.42 T960191
#17 2010.42 X250195
#18 2010.44 X250851
#19 2010.44 T931286
#20 2010.46 X250567
#21 2010.46 X250568
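To make the effect of na.rm = TRUE easier to see, here is a minimal sketch on a toy data frame. The `toy` data and its values are made up for illustration and are not part of the question's data; the columns are plain character vectors, so no conversion step is needed here.

```r
library(tidyr)

# Hypothetical toy data: each row has F, M, or both filled in
toy <- data.frame(
  ID = c(1, 1, 2),
  F  = c("a", NA, "b"),
  M  = c(NA, "x", "y"),
  stringsAsFactors = FALSE
)

# NA entries are dropped before pasting, so a row with a single
# non-missing value yields just that value, with no separator
res <- unite(toy, combined, F, M, na.rm = TRUE, sep = ",")
res
```

Rows with only one non-missing value end up holding that value alone, while rows with both values get them joined by the separator.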
If we want to summarise further by ID, so that there is one row per ID, we can do
test %>%
mutate_at(2:3, as.character) %>%
tidyr::unite(combined, F, M, na.rm = TRUE, sep = ",") %>%
distinct() %>%
group_by(ID) %>%
summarise(combined = toString(combined))
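If the goal is instead to keep F and M as separate columns with one row per ID, one possible variation is sketched below. It is not part of the original answer: the helper `collapse_unique` and the `toy` data are hypothetical names introduced only for illustration. The attempt in the question most likely failed because na.omit can return more than one value for an ID, whereas summarise there expected exactly one value per group; collapsing with toString avoids that.

```r
library(dplyr)

# Hypothetical toy data with duplicates and more than one value per ID
toy <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  F  = c("a", NA, "b", "b", NA),
  M  = c(NA, "x", "y", "z", "z"),
  stringsAsFactors = FALSE
)

# Illustrative helper: drop NAs, drop duplicates, and collapse what is
# left into a single comma-separated string (always length one)
collapse_unique <- function(x) toString(unique(x[!is.na(x)]))

per_id <- toy %>%
  group_by(ID) %>%
  summarise(F = collapse_unique(F), M = collapse_unique(M))

per_id
```

With the question's data, an ID such as 2010.40 has several distinct F and M values, so its cells would contain comma-separated lists rather than single codes. Whether that is acceptable depends on what the duplicates mean.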