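Replacing the opening and closing delimiters separately using a regular expression

Time: 2014-12-14 17:09:34

Tags: regex r

I would like to replace the $ delimiters in the following expression.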

s <- "something before stuff $some text$ in between $1$ and after"

The replacements for the opening and the closing delimiter must be different, i.e.

begin <- "<B>"     # replacement for 1st delimiter   
end <- "<E>"       # replacement for 2nd delimiter   

The result should be

str_replace_all(s, SOME-REGEX-MAGIC)    
> [1] "something before stuff <B>some text<E> in between <B>1<E> and after"

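I am not a regex expert and cannot figure out how to handle the opening and closing delimiters separately.

Any ideas? Thanks for your time!

Unsuccessful ideas

Just for the record, here are my unsuccessful attempts at getting close to a solution: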

# Using lookarounds I get the following, but I would need it to be non-greedy
str_extract(s, perl("(?<=\\$).*(?=\\$)"))
"some text$ in between $1"

# also greedy
str_match(s, "(\\$)(.*)(\\$)")
     [,1]                         [,2] [,3]                       [,4]
[1,] "$some text$ in between $1$" "$"  "some text$ in between $1" "$" 

2 answers:

Answer 0 (score: 3)

Use this regex with gsub(). The replacement uses a back-reference (i.e. \\1).

ptn <- "\\$(.*?)\\$" # Non-greedy find between delimiters
replacement <- "<B>\\1<E>"  # \\1 indicates back-reference
gsub(ptn, replacement, s)
[1] "something before stuff <B>some text<E> in between <B>1<E> and after"

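The back-reference \\1 refers to the first capture group in the regex, i.e. the string matched inside the parentheses, (.*?). The ? modifier makes the match non-greedy, so each match stops at the nearest closing $ instead of the last one.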
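
To use the begin and end variables from the question instead of hard-coding the tags, the replacement string can be assembled with paste0(). This is only a minimal sketch in base R; the helper name replace_delims is hypothetical and not part of the original answer.

```r
s <- "something before stuff $some text$ in between $1$ and after"
begin <- "<B>"   # replacement for the opening delimiter
end <- "<E>"     # replacement for the closing delimiter

# Hypothetical helper: wraps the text between each pair of $ delimiters
# in the given begin/end strings, using the non-greedy pattern from above.
replace_delims <- function(x, begin, end) {
  gsub("\\$(.*?)\\$", paste0(begin, "\\1", end), x)
}

result <- replace_delims(s, begin, end)
print(result)
```

Note that begin and end are pasted into the replacement string as-is, so a backslash inside either of them would be interpreted by gsub() as part of a back-reference and would need to be escaped first.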

Answer 1 (score: 1)

Use the non-greedy operator ?:

\\$(.*?)\\$

or a negated character class:

\\$([^$]*)\\$
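
As a quick check, both patterns can be run through gsub() on the string from the question. This is a sketch under the assumption that the replacement "<B>\\1<E>" from the other answer is reused; the variable names lazy and negated are only illustrative.

```r
s <- "something before stuff $some text$ in between $1$ and after"

# Non-greedy quantifier: stop at the nearest closing $
lazy <- gsub("\\$(.*?)\\$", "<B>\\1<E>", s)

# Negated character class: the captured text can never contain a $
negated <- gsub("\\$([^$]*)\\$", "<B>\\1<E>", s)

print(lazy)
print(identical(lazy, negated))
```

The negated character class does not rely on lazy matching at all, because [^$]* cannot run past a $ in the first place.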