Question

I have an input data frame like this:

I want the output to look like this:

Please find my explanation below. I don't know how to explain it in more detail :(

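Let me explain. In the input data set, for the rows where COL1 is 10, I want to scan the COL2 values and replace any text pattern that repeats another COL2 value in that group with "*". The same logic applies to the COL2 values of every other group of rows that share a COL1 value. I would like to use the gsub function for this.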

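I tried gsub and paste several times without getting the desired output, because I don't know how to match all the patterns that are repeated inside other values.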

I have asked this question before, but since I did not get an answer, I am reposting it.

The dput of the input data frame is attached below:

structure(list(COL1 = c(10L, 10L, 10L, 20L, 20L, 30L, 30L, 40L, 
40L, 40L, 50L, 50L, 50L), COL2 = c("mary has life", "Don mary has life", 
"Britto mary has life", "push them fur", "push them ", "yell at this", 
"this is yell at this", "Year", "Doggy", "Horse", "This is great job", 
"great job", "Donkey")), .Names = c("COL1", "COL2"), row.names = c(NA, 
-13L), class = "data.frame")

Answer 1

You can write a function that runs gsub for each item in the group and picks the shortest replacement (excluding the item itself, of course):

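fun <- function(col){
    # Column j holds every string in col with the pattern col[j] replaced by "*"
    matches <- sapply(col, function(x){gsub(x, '\\*', col)})
    # Ignore the case where a string is replaced by itself
    diag(matches) <- NA
    # For each string (row), keep the shortest result over all patterns
    apply(matches, 1, function(x){x[which.min(nchar(x))]})
}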
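To make the intermediate steps concrete, here is a minimal sketch that repeats the body of the function on the three COL2 values of the COL1 == 10 group. The variable names `grp` and `result` are illustrative assumptions, not part of the original answer.

```r
# Illustrative only: the three COL2 values from the COL1 == 10 group
grp <- c("mary has life", "Don mary has life", "Britto mary has life")

# Column j = every string in grp with the pattern grp[j] replaced by "*"
matches <- sapply(grp, function(x) gsub(x, '\\*', grp))

# Blank out the diagonal so a string is never replaced by itself
diag(matches) <- NA

# Per row, keep the shortest remaining candidate
result <- apply(matches, 1, function(x) x[which.min(nchar(x))])
print(unname(result))
```

The first string contains none of the other two, so it comes back unchanged; the other two each contain "mary has life", so that part collapses to "*".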

Now apply it with your favorite syntax:

library(dplyr)

df %>% group_by(COL1) %>% mutate(COL3 = fun(COL2))

## Source: local data frame [13 x 3]
## Groups: COL1 [5]
## 
##     COL1                 COL2          COL3
##    <int>                <chr>         <chr>
## 1     10        mary has life mary has life
## 2     10    Don mary has life         Don *
## 3     10 Britto mary has life      Britto *
## 4     20        push them fur          *fur
## 5     20           push them     push them 
## 6     30         yell at this  yell at this
## 7     30 this is yell at this     this is *
## 8     40                 Year          Year
## 9     40                Doggy         Doggy
## 10    40                Horse         Horse
## 11    50    This is great job     This is *
## 12    50            great job     great job
## 13    50               Donkey        Donkey

Or keep it all in base R:

df$COL3 <- ave(df$COL2, df$COL1, FUN = fun)

df

##    COL1                 COL2          COL3
## 1    10        mary has life mary has life
## 2    10    Don mary has life         Don *
## 3    10 Britto mary has life      Britto *
## 4    20        push them fur          *fur
## 5    20           push them     push them 
## 6    30         yell at this  yell at this
## 7    30 this is yell at this     this is *
## 8    40                 Year          Year
## 9    40                Doggy         Doggy
## 10   40                Horse         Horse
## 11   50    This is great job     This is *
## 12   50            great job     great job
## 13   50               Donkey        Donkey
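One caveat: `gsub` treats each COL2 value as a regular expression, so a value containing characters such as `.` or `(` may match more than intended. In addition, the matrix step assumes at least two rows per group. The sketch below is a hedged variant, assuming literal matching is what is wanted; `fun_literal` is a hypothetical name and not part of the original answer.

```r
# Hypothetical variant of fun(): literal (non-regex) matching via fixed = TRUE
fun_literal <- function(col) {
    # A single-row group has nothing to compare against; return it unchanged
    if (length(col) < 2) return(col)
    # With fixed = TRUE both pattern and replacement are taken literally
    matches <- sapply(col, function(x) gsub(x, "*", col, fixed = TRUE))
    # Ignore the case where a string is replaced by itself
    diag(matches) <- NA
    # For each string (row), keep the shortest result over all patterns
    apply(matches, 1, function(x) x[which.min(nchar(x))])
}

# "a.c" as a regex would also match "abc"; as a literal it does not
print(fun_literal(c("a.c", "abc x", "x a.c")))
```

It can be plugged into either call above in place of `fun`, for example `ave(df$COL2, df$COL1, FUN = fun_literal)`.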

R - gsub function