Merging rows that contain similar strings with dplyr

Posted: 2020-02-20 14:55:03

Tags: r dplyr

I have a table with the following data:

library(tibble)

df <- tibble(
  dose = seq(10, 50, 10), 
  date = c("2007-12-15", "2007-10-13", "2007-10-13", "2007-09-30", "2007-09-30"), 
  response = c(45, 67, 66, 54, 55), 
  name = c("Peter,Martin", "Gale,Rebecca", "Rebecca,Gale", "Jonathan,Smith", "Smith,Jonathan")
)

Printed, df looks like this:

# A tibble: 5 x 4
   dose date       response name
  <dbl> <chr>         <dbl> <chr>
1    10 2007-12-15       45 Peter,Martin
2    20 2007-10-13       67 Gale,Rebecca
3    30 2007-10-13       66 Rebecca,Gale
4    40 2007-09-30       54 Jonathan,Smith
5    50 2007-09-30       55 Smith,Jonathan

One of the columns, name, holds strings of the form "FirstName,LastName" or "LastName,FirstName". I want to merge rows that contain the same name, whichever order it is written in. For example, the rows containing Rebecca,Gale and Gale,Rebecca should be merged.
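To make the matching rule concrete: two strings refer to the same person when splitting them on the comma and sorting the parts gives the same result. A minimal base-R sketch, assuming every entry has exactly two comma-separated parts; name_key is a hypothetical helper, not part of dplyr or any other package:

```r
# Hypothetical helper: builds an order-independent key for a "A,B" name string
# by splitting on the comma, trimming stray spaces, sorting the parts, re-joining.
name_key <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)  # list of character vectors
  vapply(parts, function(p) paste(sort(trimws(p)), collapse = ","), character(1))
}

names_in <- c("Peter,Martin", "Gale,Rebecca", "Rebecca,Gale",
              "Jonathan,Smith", "Smith,Jonathan")
keys <- name_key(names_in)

# Both spellings of the same person now share one key
name_key("Rebecca,Gale") == name_key("Gale,Rebecca")  # TRUE
length(unique(keys))                                  # 3
```

Grouping on such a key, rather than on the raw string, is what lets the two spellings fall into the same group.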

When merging, I want the sums of the dose and response columns, and I want to keep the first entry of the date and name columns.

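Expected result:

# A tibble: 3 x 4
   dose date       response name
  <dbl> <chr>         <dbl> <chr>
1    10 2007-12-15       45 Peter,Martin
2    50 2007-10-13      133 Gale,Rebecca
3    90 2007-09-30      109 Jonathan,Smith

Note that I always want to merge on the name column and not on the date column: although the dates happen to match in this example, my larger table has different dates for the same name.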

1 answer

Answer 0 (score: 3)

Here is an idea.

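library(tidyverse)

df2 <- df %>%
  # store the dates as Date objects instead of character strings
  mutate(date = as.Date(date)) %>%
  # split each name on the comma, sort the two parts and re-join them with
  # toString(), so "Rebecca,Gale" and "Gale,Rebecca" both become "Gale, Rebecca"
  mutate(name = map_chr(name, ~toString(sort(str_split(.x, ",")[[1]])))) %>%
  group_by(name) %>%
  # add up dose and response, keep the first date within each name group
  summarize(dose = sum(dose),
            response = sum(response),
            date = first(date)) %>%
  # restore the original column order
  select(names(df)) %>%
  ungroup()
df2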
# # A tibble: 3 x 4
#    dose date       response name           
#   <dbl> <date>        <dbl> <chr>          
# 1    50 2007-10-13      133 Gale, Rebecca  
# 2    90 2007-09-30      109 Jonathan, Smith
# 3    10 2007-12-15       45 Martin, Peter