Question

I have a data frame:

levels     counts
1, 2, 2        24
1, 2           20
1, 3, 3, 3     15
1, 3           10
1, 2, 3        25

I want to treat, for example, "1, 2, 2" and "1, 2" as the same thing. So as long as a row contains a "1" and a "2" and no other characters, it should count as the level "1, 2". This is the desired data frame:

levels     counts
  1, 2         44
  1, 3         25
  1, 2, 3      25

Here is the code to reproduce the original data frame:

df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"), 
                 counts = c(24, 20, 15, 10, 25))
df$levels <- as.character(df$levels)

Answer 1

Split df$levels, take the unique elements of each row, and sort them. Then use the result as the grouping key to sum counts.

df$levels2 = sapply(strsplit(df$levels, ", "), function(x)
    paste(sort(unique(x)), collapse = ", "))   #Or toString(sort(unique(x))))
aggregate(counts~levels2, df, sum)
#  levels2 counts
#1    1, 2     44
#2 1, 2, 3     25
#3    1, 3     25
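
The same split, deduplicate, sort steps can be wrapped in a small helper so the normalization is easy to test on its own. This is only a sketch: the name normalize_levels is hypothetical (it does not appear in the answer above), and it assumes the pieces are always separated by exactly ", ".

```r
# Hypothetical helper (name is an assumption, not from the original answer).
# Splits each string on ", ", drops duplicate pieces, sorts them as
# character strings, and joins them back together with ", ".
normalize_levels <- function(x) {
  parts <- strsplit(x, ", ", fixed = TRUE)
  vapply(parts, function(p) toString(sort(unique(p))), character(1))
}

df <- data.frame(levels = c("1, 2, 2", "1, 2", "1, 3, 3, 3", "1, 3", "1, 2, 3"),
                 counts = c(24, 20, 15, 10, 25),
                 stringsAsFactors = FALSE)

# Build the normalized key, then sum counts within each key
df$levels2 <- normalize_levels(df$levels)
result <- aggregate(counts ~ levels2, data = df, FUN = sum)
print(result)
```

Using vapply instead of sapply is a design choice here: it guarantees a character vector even if df has zero rows.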

Answer 2

A solution using tidyverse. df2 is the final output.

library(tidyverse)

df2 <- df %>%
  mutate(ID = 1:n()) %>%
  mutate(levels = strsplit(levels, split = ", ")) %>%
  unnest(levels) %>%
  distinct() %>%
  arrange(ID, levels) %>%
  group_by(ID, counts) %>%
  summarise(levels = paste(levels, collapse = ", ")) %>%
  ungroup() %>%
  group_by(levels) %>%
  summarise(counts = sum(counts))

Update

Following the comments below, here is a solution that uses an idea similar to d.b's:

df2 <- df %>% 
  mutate(l2 = map_chr(strsplit(levels, ", "), 
                      .f = ~ .x %>% unique %>% sort %>% toString)) %>%
  group_by(l2) %>% 
  summarise(counts = sum(counts))
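
One caveat applies to both answers: the split pieces are sorted as character strings, so a multi-digit level such as "10" would typically sort before "2". If the levels are always numeric (an assumption the question does not state), a variant like the following sketch sorts them by value instead. The function name normalize_numeric_levels is hypothetical.

```r
# Hypothetical variant (name and the all-numeric assumption are not from
# the original thread). Converts the pieces to numbers before sorting so
# that 10 sorts after 2, then joins them back into a single string.
normalize_numeric_levels <- function(x) {
  parts <- strsplit(x, ", ", fixed = TRUE)
  vapply(parts,
         function(p) toString(sort(unique(as.numeric(p)))),
         character(1))
}

# Both inputs collapse to the same key
normalize_numeric_levels(c("10, 2, 2", "2, 10"))
```

For the single-digit levels in the question, this gives the same keys as the character-based sort.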
