Replace multiple patterns with manipulated patterns

Date: 2015-07-01 13:06:14

Tags: regex r stringr

I have a text string that I would like to convert from

text = "end back@drive@o correct back@drive@adjust@cats@do to tok"

to

"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

More generally, I'd like to replace

"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"

and so on. My attempt below uses the stringr package:

library(stringr)

# extract every chain of three or more @-joined tokens
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
# split each chain into its tokens
replacements = strsplit(patterns, "@")
# pair each token with its successor: a@b@c -> "a@b b@c"
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
str_replace_all(text, pattern = patterns, replacement = replacements)

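I don't think str_replace_all is the function I want for the last step: it is vectorised over pattern and replacement, so it (reasonably) returns one string per pattern: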

[1] "end back@drive drive@o correct back@drive@adjust@cats@do to tok"
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

Could someone help me with this?

Many thanks.

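EDIT: The answers so far are very helpful, but I'm parsing a large file and don't know how many times the a@b@c@d... pattern will be chained. Is there a more general solution that doesn't rely on hard-coding the pattern length (which my attempt above tries to avoid)?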
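One general approach that does not depend on chain length, sketched here in base R as an assumption rather than taken from the answers below: find each chain with gregexpr and swap every match in place with regmatches<-, so all replacements land in a single output string instead of one string per pattern. The names expand_chain and expand_all_chains are hypothetical, and tokens are assumed to be [[:alnum:]] characters only.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# Hypothetical helper: "a@b@c@d" -> "a@b b@c c@d" (each token paired with the next)
expand_chain <- function(chain) {
  parts <- strsplit(chain, "@", fixed = TRUE)[[1]]
  n <- length(parts)
  paste(paste0(parts[-n], "@", parts[-1]), collapse = " ")
}

# Hypothetical wrapper: rewrite every chain of three or more @-joined
# [[:alnum:]] tokens in place, however long the chain is
expand_all_chains <- function(x) {
  m <- gregexpr("[[:alnum:]]+(@[[:alnum:]]+){2,}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(chains)
    vapply(chains, expand_chain, character(1), USE.NAMES = FALSE))
  x
}

expand_all_chains(text)
expand_all_chains(c("a@b@c", "a@b@c@d", "a@b only"))
```

Because each match is rewritten at its own position, a short chain that also occurs inside a longer one cannot be replaced twice, which is a risk when replacing pattern by pattern.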

3 Answers:

Answer 0 (score: 3)

> gsub(x = text, pattern = '@(.*?)@', replacement = '@\\1 \\1@')
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"

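You'll need to provide more examples of the cases you expect to encounter, but the solution will follow the same approach as the one above.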

In response to the comment: you may need to keep running gsub(x = text, pattern = '@([[:alnum:]]{1,})@', replacement = '@\\1 \\1@') on your text until it no longer changes. Again, without more test cases we can't be sure.
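The "run it until nothing changes" idea can be written as a small loop. This is a sketch under assumptions: the function name expand_until_stable and its max_iter safety cap are hypothetical additions, not part of the answer. Each gsub pass consumes both @ signs of every match it makes, so neighbouring elements of a long chain are only picked up on a later pass.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# Re-apply the answer's gsub until the string stops changing.
# max_iter is a hypothetical safety cap, not part of the original answer.
expand_until_stable <- function(x, max_iter = 100L) {
  pat <- "@([[:alnum:]]+)@"
  for (i in seq_len(max_iter)) {
    y <- gsub(pat, "@\\1 \\1@", x)
    if (identical(y, x)) return(y)  # fixed point reached: no "@token@" left
    x <- y
  }
  x
}

expand_until_stable(text)
expand_until_stable("a@b@c@d@e@f@g@h")
```

On the question's text the first pass expands "@drive@" (twice) and "@cats@", and the second pass expands the remaining "@adjust@".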

Answer 1 (score: 2)

I'd use gsub:

> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(pattern = "([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)", replacement = "\\1@\\2 \\2@\\3", x = text)
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
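This pattern is fixed at three @-joined parts, so on the question's edited text it only partly expands the longer chain. The sketch below shows that, plus one assumed way (not taken from the answers) to drop the hard-coded group count: PCRE lookarounds that duplicate every token sitting between two @ signs.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"

# Answer 1's pattern as written: exactly three @-joined parts per match, so the
# tail "adjust@cats@do" of the longer chain is left unexpanded
partial <- gsub("([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)",
                "\\1@\\2 \\2@\\3", text)

# Assumed generalization: duplicate every token that sits between two "@"
# signs (PCRE lookbehind/lookahead), whatever the chain length
full <- gsub("(?<=@)([[:alnum:]]+)(?=@)", "\\1 \\1", text, perl = TRUE)

partial
full
```

The lookarounds do not consume the @ signs, so adjacent inner tokens ("drive", "adjust", "cats") are all matched in a single pass.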

Answer 2 (score: 1)

Try

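# 1st alternative: skip any run of non-@ characters ending in whitespace (e.g. "o correct ")
# 2nd alternative: capture (group 2) a token sitting between two @ signs, then write it twice
pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)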
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"

For 'str1'

gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c"                     "a@b b@c c@d"                
#[3] "a@b b@c c@d d@e e@f f@g g@h"

Data

text  <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")