Regular expression: replace the nth occurrence

Date: 2013-05-28 10:21:01

Tags: r

Does anyone know how to find the nth occurrence of a string within an expression and replace it using a regular expression?

For example, I have the following string:

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

I would like to replace the 5th occurrence of "-" with "|" and the 7th occurrence with "||", giving

[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

How can I do this?

Thanks, Florian

3 answers:

Answer 0 (score: 7)

(1) sub. This can be done with a single regular expression in sub:

> sub("(^(.*?-){4}.*?)-(.*?-.*?)-", "\\1|\\3||", txt, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
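The same pattern can be written in PCRE free-spacing mode, which shows how it selects the 5th and 7th dashes. With (?x), whitespace in the pattern is ignored and # starts a comment. This is only an annotated rewrite for illustration. It assumes the PCRE engine that perl = TRUE selects, and it should give the same result as the one-line call above.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Same regex as the one-liner, in free-spacing mode: (?x) makes PCRE
# ignore whitespace and treat "#" up to the end of the line as a comment.
pat <- "(?x)
  (              # group 1: everything before the 5th dash
    ^ (.*?-){4}  #   four lazy 'text then dash' chunks (group 2)
    .*?          #   text between the 4th and 5th dash
  )
  -              # the 5th dash (dropped)
  (.*?-.*?)      # group 3: text, the 6th dash, more text
  -              # the 7th dash (dropped)
"

res <- sub(pat, "\\1|\\3||", txt, perl = TRUE)
res
```

Free-spacing mode is handy once a pattern counts several occurrences, because each quantified piece can be commented separately.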

(2) sub twice. Alternatively, this variant calls sub twice, replacing the 7th "-" first and then the 5th:

> txt2 <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)
> sub("(^(.*?-){4}.*?)-", "\\1|", txt2, perl = TRUE)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
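The order of the two calls matters. Once a dash is replaced, every later dash moves down by one in the count. The base R check below compares both orders. The variable names good and bad are only for illustration.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# 7th first, then 5th: a change to the right of the 5th dash does not
# alter which dash is 5th, so both counts refer to the original dashes.
good <- sub("(^(.*?-){6}.*?)-", "\\1||", txt, perl = TRUE)
good <- sub("(^(.*?-){4}.*?)-", "\\1|", good, perl = TRUE)

# 5th first, then 7th: after the 5th dash has become "|", the original
# 7th dash is only the 6th remaining one, so the "7th dash" pattern now
# hits the original 8th dash.
bad <- sub("(^(.*?-){4}.*?)-", "\\1|", txt, perl = TRUE)
bad <- sub("(^(.*?-){6}.*?)-", "\\1||", bad, perl = TRUE)

good  # "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
bad   # "aaa-aaa-aaa-aaa-aaa|aaa-aaa-aaa||aaa-aaa"
```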

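(3) sub.fun. Or this variant defines a function, sub.fun, that performs the substitution. It uses fn$ from the gsubfn package to interpolate n-1, pat and value into the arguments of sub. First define the function shown, then call it twice: the inner call handles the 7th occurrence and the outer call the 5th.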

library(gsubfn)
# fn$ substitutes $pat, $value and the backquoted `n-1` into the strings
sub.fun <- function(x, pat, n, value) {
   fn$sub( "(^(.*?$pat){`n-1`}.*?)$pat", "\\1$value", x, perl = TRUE)
}

> sub.fun(sub.fun(txt, "-", 7, "||"), "-", 5, "|")
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

(We could instead build the arguments of sub in the body of sub.fun with paste or sprintf. That gives a base R solution, at the cost of some extra verbosity.)
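Below is a minimal base R sketch along those lines. It uses sprintf to build the pattern that fn$ interpolates in sub.fun. The name sub_fun_base is hypothetical. pat is wrapped in (?:...) so that an alternation inside it stays grouped. The sketch assumes value contains no backslashes, because value is spliced directly into the replacement string.

```r
# Hypothetical base R counterpart of sub.fun: replace the n-th match of
# `pat` in `x` with `value`. sprintf does the interpolation that fn$ does.
sub_fun_base <- function(x, pat, n, value) {
  # group 1 = everything up to (not including) the n-th match of pat
  pattern <- sprintf("(^(?:.*?(?:%s)){%d}.*?)(?:%s)", pat, n - 1, pat)
  sub(pattern, paste0("\\1", value), x, perl = TRUE)
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
# Handle the 7th occurrence first so the 5th is counted on the original dashes.
res <- sub_fun_base(sub_fun_base(txt, "-", 7, "||"), "-", 5, "|")
res
```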

This can also be reformulated as a replacement function, which gives a pleasing ordering:

"sub.fun<-" <- sub.fun
tt <- txt # make a copy so that we preserve the input txt
sub.fun(tt, "-", 7) <- "||"
sub.fun(tt, "-", 5) <- "|"

> tt
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"

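(4) gsubfn. Using gsubfn from the gsubfn package, we can use a particularly simple regular expression (just "-"), and the code has a fairly straightforward structure. The substitution is performed by a proto method. Instead of a replacement string, we pass a proto object containing that method. The approach is simple because gsubfn automatically makes a count variable available to such methods: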

library(gsubfn) # gsubfn also pulls in proto
p <- proto(fun = function(this, x) {
     # count is supplied by gsubfn: the number of the current match
     if (count == 5) return("|")
     if (count == 7) return("||")
     x  # leave every other "-" unchanged
 })

> gsubfn("-", p, txt)
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
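The same "act on the count-th match" idea can also be sketched in base R, without gsubfn or proto. Split on "-", then choose position by position which separator to put back. This is a swapped-in technique and not how gsubfn works internally. It assumes txt does not end in "-", because strsplit drops a trailing empty piece.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# Split on the literal "-" (fixed = TRUE, so no regex is involved).
parts <- strsplit(txt, "-", fixed = TRUE)[[1]]

# One separator between each pair of neighbouring parts; change the 5th and 7th.
seps <- rep("-", length(parts) - 1)
seps[5] <- "|"
seps[7] <- "||"

# rbind + c() interleaves part1, sep1, part2, sep2, ...; then add the last part.
res <- paste(c(rbind(parts[-length(parts)], seps), parts[length(parts)]),
             collapse = "")
res
```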

Update: some corrections.

Update 2: added the replacement function approach to (3).

Update 3: added the pat argument to sub.fun.

Answer 1 (score: 4)

Another possibility is to use Hadley's stringr package, on which the function I wrote is based:

require(stringr)

replace.nth <- function(string, pattern, replacement, n) {
    locations <- str_locate_all(string, pattern)
    str_sub(string, locations[[1]][n, 1], locations[[1]][n, 2]) <- replacement
    string
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

# replace the 7th "-" first, so that the 5th is still counted on the original string
txt.new <- replace.nth(txt, "-", "||", 7)
txt.new <- replace.nth(txt.new, "-", "|", 5)
txt.new
# [1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
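If stringr is not available, the same interface can be sketched in base R with gregexpr and substr. The name replace_nth_base is hypothetical. Like replace.nth, it is meant for a single string. When there are fewer than n matches, it returns the string unchanged instead of failing on an out-of-range index.

```r
# Hypothetical base R version of replace.nth for a single string.
replace_nth_base <- function(string, pattern, replacement, n) {
  m <- gregexpr(pattern, string)[[1]]               # start of every match
  if (m[1] == -1 || n > length(m)) return(string)   # fewer than n matches
  start <- m[n]
  end <- start + attr(m, "match.length")[n] - 1     # last character of match n
  paste0(substr(string, 1, start - 1), replacement,
         substr(string, end + 1, nchar(string)))
}

txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"
res <- replace_nth_base(replace_nth_base(txt, "-", "||", 7), "-", "|", 5)
res
```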

Answer 2 (score: 1)

One way to do this is to use gregexpr to find the positions of "-":

posns <- gregexpr("-",txt)[[1]]

Then paste together the relevant pieces and separators:

paste0(substr(txt,1,posns[5]-1),"|",substr(txt,posns[5]+1,posns[7]-1),"||",substr(txt,posns[7]+1,nchar(txt)))
[1] "aaa-aaa-aaa-aaa-aaa|aaa-aaa||aaa-aaa-aaa"
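A variant of the same gregexpr idea avoids the manual substr arithmetic. It uses base R's `regmatches<-`, which writes replacements back into all matched positions at once. All positions come from a single gregexpr call on the original string, so the order of the two replacements does not matter here. The variable names are illustrative.

```r
txt <- "aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa-aaa"

res <- txt                        # work on a copy so txt is preserved
m <- gregexpr("-", res)           # match data for every "-"
reps <- regmatches(res, m)[[1]]   # the matched strings, one per "-"
reps[5] <- "|"
reps[7] <- "||"
regmatches(res, m) <- list(reps)  # write all replacements back at once
res
```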