Question

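I have an input data frame like this:

I would like the output to be as follows:

For example, I want to take the first value (mary has life) and scan it against all other rows that have the same COL1 entry. If the COL2 value is repeated there, I need to eliminate only the repeated part while merging the non-repeated parts. In other words, I want to do a pattern search: if the same pattern exists in another row, I want to remove only the duplicated pattern and merge whatever is not duplicated.

I tried using the grepl and gsub functions, but I could not get the desired result.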

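Here is a simpler version of the input dataset:

COL1  COL2
10    mary has life
10    Don mary has life
10    Britto mary has life
20    push them
20    push them fur
30    yell at this
30    this is yell at this
40    year
40    puppy
40    horse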

Answer 1

Updated:

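df <- read.table(
  text = "COL1;    COL2
10;  mary has life
10;  Don mary has life
10;  Britto mary has life
20;  push them
20;  push them fur
30;  yell at this
30;  this is yell at this", 
  sep = ";", header = TRUE, 
  strip.white = TRUE, stringsAsFactors = FALSE)
library(dplyr)
res <- df %>%
  group_by(COL1) %>%
  do(COL2 = {
    # The first value in each group is the pattern to strip from later rows.
    first_value <- .$COL2[[1]]
    paste(unlist(Reduce(function(a, b) {
      # Split on the literal pattern (not a regex), then trim whitespace
      # and drop the empty pieces that strsplit() leaves behind.
      new_values <- trimws(strsplit(b, first_value, fixed = TRUE)[[1]])
      c(a, new_values[nzchar(new_values)])
    }, .$COL2)), collapse = ", ")
  })
# do() returns a list column; flatten it to a character vector.
res$COL2 <- unlist(res$COL2)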
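
For comparison, the same idea can be sketched in base R without dplyr. This is only a minimal sketch, not the method above: the helper name `merge_unique_parts` is hypothetical, it assumes the first row of each COL1 group is the pattern to remove, and it treats that pattern as a literal string (`fixed = TRUE`) rather than a regular expression. The three group 40 values are taken from the question's simplified dataset and their exact wording is an assumption.

```r
# Sample data from the question, including the group 40 rows
# (the wording of those three values is an assumption).
df <- data.frame(
  COL1 = c(10, 10, 10, 20, 20, 30, 30, 40, 40, 40),
  COL2 = c("mary has life", "Don mary has life", "Britto mary has life",
           "push them", "push them fur",
           "yell at this", "this is yell at this",
           "year", "puppy", "horse"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first value, remove it (as a literal string)
# from the remaining values, and merge whatever is left over.
merge_unique_parts <- function(values) {
  first_value <- values[[1]]
  rest <- trimws(gsub(first_value, "", values[-1], fixed = TRUE))
  # Drop values that were exact duplicates of the first one.
  paste(c(first_value, rest[nzchar(rest)]), collapse = ", ")
}

# Apply the helper to COL2 within each COL1 group.
res_base <- aggregate(COL2 ~ COL1, data = df, FUN = merge_unique_parts)
print(res_base)
```

On the sample data this should give "mary has life, Don, Britto" for group 10 and "push them, fur" for group 20. Rows that do not contain the first value, as in group 40, are kept whole and simply joined.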

R - Eliminating duplicate values
