Concatenate two strings, removing duplicates

Time: 2019-05-17 13:01:57

Tags: r dplyr

Suppose I have the following data:

data = data.frame(
  name=c("bob", "bob", "mary", "mary", "mary"),
  colour=c("blue", "blue", "blue", "green", "green"),
  number=c(1,1,1,2,3))

data

  name colour number
1  bob   blue      1
2  bob   blue      1
3 mary   blue      1
4 mary  green      2
5 mary  green      3

How can I collapse the above into one row per name, removing any duplicate strings? I tried:

data <- data %>% group_by(`name`) %>%
  summarise_all(funs(paste(na.omit(.), collapse = ", ")))

but the output still contains the duplicates:

  name             colour  number
1  bob         blue, blue    1, 1
2 mary blue, green, green 1, 2, 3

Expected output:

 name      colour number
1  bob        blue      1
2 mary blue, green  1,2,3
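The attempt above keeps the duplicates because `paste()` joins every value in the group. A minimal dplyr sketch, assuming dplyr 0.8 or later (for the `~` lambda syntax that replaces the deprecated `funs()`), is to call `unique()` before collapsing. The object name `result` is only for illustration.

```r
library(dplyr)

data <- data.frame(
  name = c("bob", "bob", "mary", "mary", "mary"),
  colour = c("blue", "blue", "blue", "green", "green"),
  number = c(1, 1, 1, 2, 3))

# Collapse every non-grouping column to one string per name,
# keeping only the first occurrence of each value
result <- data %>%
  group_by(name) %>%
  summarise_all(~ paste(unique(na.omit(.)), collapse = ", "))

result
```

The only change from the original attempt is the `unique()` call; `summarise_all()` is superseded by `across()` in dplyr 1.0 but still works.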

1 answer:

Answer 0 (score: 0)

A data.table one-liner.

Sample data

library(data.table)
DT <-fread("
  name colour number
            bob   blue      1
            bob   blue      1
           mary   blue      1
           mary  green      2
           mary  green      3")

Code

cols <- c("colour", "number")
# For each name, collapse the unique values of every column in `cols` into one string
DT[, lapply(.SD, function(x) paste0(unique(x), collapse = ",")),
   by = .(name), .SDcols = cols][]

Output

#    name     colour number
# 1:  bob       blue      1
# 2: mary blue,green  1,2,3
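If neither package is available, the same idea (unique values per group, then `paste()` with `collapse`) can be sketched in base R with `aggregate()`. This is an illustrative alternative, not part of the original answer; it assumes the columns to collapse are `colour` and `number`, and the helper `collapse_unique` is a hypothetical name.

```r
data <- data.frame(
  name = c("bob", "bob", "mary", "mary", "mary"),
  colour = c("blue", "blue", "blue", "green", "green"),
  number = c(1, 1, 1, 2, 3),
  stringsAsFactors = FALSE)

# Hypothetical helper: keep the first occurrence of each value, then join into one string
collapse_unique <- function(x) paste(unique(x), collapse = ",")

# aggregate() applies the helper to each column separately within each name
result <- aggregate(data[c("colour", "number")],
                    by = list(name = data$name),
                    FUN = collapse_unique)

result
```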