I have a data frame:
levels      counts
1, 2, 2     24
1, 2        20
1, 3, 3, 3  15
1, 3        10
1, 2, 3     25
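I want to treat, for example, "1, 2, 2" and "1, 2" as the same thing. So as long as a row contains a "1" and a "2" and no other value, it should be counted as the level "1, 2". This is the desired data frame: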
levels   counts
1, 2     44
1, 3     25
1, 2, 3  25
Here is the code to reproduce the original data frame:
df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                 counts = c(24, 20, 15, 10, 25))
df$levels <- as.character(df$levels)
Answer 0 (score: 6)
Split df$levels, keep the unique elements of each row, and sort them. Then use the result as the grouping key to sum counts.
df$levels2 = sapply(strsplit(df$levels, ", "), function(x)
    paste(sort(unique(x)), collapse = ", "))  # or: toString(sort(unique(x)))
aggregate(counts ~ levels2, df, sum)
#   levels2 counts
#1     1, 2     44
#2  1, 2, 3     25
#3     1, 3     25
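One caveat with the split/unique/sort approach: strsplit() returns character vectors, so sort() orders them as text, and a level such as "10" would sort before "2". The following is a minimal sketch of a variant that sorts numerically, assuming every level is a number. The helper name normalise_levels is hypothetical and does not appear in the original answers.

```r
# Hypothetical helper (not from the original answers): reduce one levels
# string to its sorted, de-duplicated form. Assumes all levels are numeric.
normalise_levels <- function(x) {
  parts <- strsplit(x, ",\\s*")[[1]]         # split on commas, tolerate spacing
  nums  <- sort(unique(as.numeric(parts)))   # drop repeats, sort numerically
  toString(nums)                             # rejoin with ", "
}

df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                 counts = c(24, 20, 15, 10, 25),
                 stringsAsFactors = FALSE)

# vapply guarantees a character vector with one key per row
df$levels2 <- vapply(df$levels, normalise_levels, character(1), USE.NAMES = FALSE)
res <- aggregate(counts ~ levels2, df, sum)
```

On the example data this gives the same totals as the answer above. The difference only shows up with multi-digit levels, where normalise_levels("10, 2, 2") yields "2, 10" rather than "10, 2".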
Answer 1 (score: 0)
A solution using tidyverse. df2 is the final output.
library(tidyverse)
df2 <- df %>%
  mutate(ID = 1:n()) %>%                               # remember the original row
  mutate(levels = strsplit(levels, split = ", ")) %>%  # list-column of elements
  unnest(levels) %>%                                   # one row per element
  distinct() %>%                                       # drop repeats within a row
  arrange(ID, levels) %>%
  group_by(ID, counts) %>%
  summarise(levels = paste(levels, collapse = ", ")) %>%
  ungroup() %>%
  group_by(levels) %>%
  summarise(counts = sum(counts))
Following the comments below, here is a solution using an idea similar to d.b's:
df2 <- df %>%
  mutate(l2 = map_chr(strsplit(levels, ", "),
                      .f = ~ .x %>% unique %>% sort %>% toString)) %>%
  group_by(l2) %>%
  summarise(counts = sum(counts))