Question

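I need a regex that replaces every backslash \\ with \", unless the \\ falls between two dollar signs, as in $\\bar{x}$. I don't know how to express "replace all of these unless they sit between these two characters" in a regular expression.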

Here is a string vector and a gsub call that gets rid of every \\, even the ones inside the dollar-sign pairs:

x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
    "I have $50 to \\spend\\", "$\\frac{4}{5}$ is nice", "$\\30\\ is nice too") 

gsub("\\\\", "\"", x)

## > gsub("\\\\", "\"", x)
## [1] "I like \"the big\" red \"dog\" $\"hat + \"bar$, here it is $\"bar{x}$" 
## [2] "I have $50 to \"spend\""    
## [3] "$\"frac{4}{5}$ is nice"   
## [4] "$\"30\" is nice too"

What I'm after is:

## [1] "I like \"the big\" red \"dog\" $\\hat + \\bar$, here it is $\\bar{x}$" 
## [2] "I have $50 to \"spend\""
## [3] "$\\frac{4}{5}$ is nice"   
## [4] "$\"30\" is nice too"

Answer 1

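You can do the replacement with a PCRE regex, as long as you ignore content-related ambiguity. (If the $ signs that do not mark a region where \ must be preserved have an unambiguous form, you may be able to patch the pattern case by case.)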

Assume that `$` always starts and ends a non-replacement region, except for an unpaired (odd) last `$` in the string.

Pattern (the first line is the raw regex, the second is the quoted string literal):

\G((?:[^$\\]|\$[^$]*+\$|\$(?![^$]*+\$))*+)\\
"\\G((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"

Replacement string:

\1"
"\\1\""
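
In R, the pattern and replacement above could be applied roughly as follows. This is only a sketch: it assumes gsub is called with perl = TRUE (\G and possessive quantifiers such as *+ are PCRE features, not part of R's default regex flavour), and the variable names pattern, replacement and result are chosen here for illustration.

```r
# The question's example vector (each "\\" in the source is one backslash).
x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
       "I have $50 to \\spend\\",
       "$\\frac{4}{5}$ is nice",
       "$\\30\\ is nice too")

# Quoted forms of the pattern and replacement string given above.
pattern     <- "\\G((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"
replacement <- "\\1\""

# perl = TRUE switches gsub to the PCRE engine.
result <- gsub(pattern, replacement, x, perl = TRUE)
cat(result, sep = "\n")
```

Under the stated assumption about `$`, this should reproduce the output the question asks for.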

Explanation

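The idea is to find the next \ in the string that is not enclosed in $. We do this by forcing every match to start exactly where the previous match left off (\G), which ensures that we never skip past a literal $ and then match a \ inside a $...$ region.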
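
To see why the \G anchor matters, the following R sketch runs the same pattern with and without it on one of the question's strings. It assumes gsub with perl = TRUE; the names body, with_G and without_G are made up for this illustration.

```r
s <- "$\\frac{4}{5}$ is nice"

# The pattern from above, minus the leading \G.
body <- "((?:[^$\\\\]|\\$[^$]*+\\$|\\$(?![^$]*+\\$))*+)\\\\"

# Anchored: the only allowed start is offset 0, where the prefix group
# swallows the whole $...$ region and no backslash follows, so nothing matches.
with_G <- gsub(paste0("\\G", body), "\\1\"", s, perl = TRUE)

# Unanchored: after failing at offset 0 the engine retries at offset 1,
# which is inside the $...$ region, and matches the backslash there.
without_G <- gsub(body, "\\1\"", s, perl = TRUE)

cat(with_G, without_G, sep = "\n")
```

The anchored call leaves the string untouched, while the unanchored one replaces the backslash inside the protected region.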

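There are 3 forms of sequence that we skip over without replacing anything:

Any character that is neither a literal $ nor a literal \: [^$\\]
Any text between two $ (this does not take an escape mechanism into account, if there is one): \$[^$]*+\$
An unpaired last $, consumed on its own so that any \ after it can still be replaced: \$(?![^$]*+\$)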

So we simply match any combination of the 3 forms above, then match the nearest \ for replacement.

Same assumption as above, except that `$<digit>` does not start a non-replacement region.

This works even on a string like this one:

I have $50 to \spend\. I just $\bar$ remembered that I have another $30 dollars $\left$ from my last \paycheck\. Lone $ \at the end\

Pattern:

\G((?:[^$\\]|\$\d|\$(?![^$]*\$)|\$[^$]*+\$)*+)\\
"\\G((?:[^$\\\\]|\\$\\d|\\$(?![^$]*\\$)|\\$[^$]*+\\$)*+)\\\\"

\$\d is placed ahead of \$[^$]*+\$ in the alternation so that the engine checks it first.
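
As a rough check of this second pattern in R, the sketch below applies it to the long example string and to a shorter string with two dollar amounts. It assumes gsub with perl = TRUE; the names pattern2, y and result2 are illustrative only.

```r
# Quoted form of the second pattern ($<digit> does not open a region).
pattern2 <- "\\G((?:[^$\\\\]|\\$\\d|\\$(?![^$]*\\$)|\\$[^$]*+\\$)*+)\\\\"

y <- c(paste0("I have $50 to \\spend\\. I just $\\bar$ remembered that ",
              "I have another $30 dollars $\\left$ from my last ",
              "\\paycheck\\. Lone $ \\at the end\\"),
       "I have $20. \\hi\\ Now I have $30")

result2 <- gsub(pattern2, "\\1\"", y, perl = TRUE)
cat(result2, sep = "\n")
```

Here `$50`, `$30` and `$20` are consumed by \$\d as plain text, so only `$\bar$` and `$\left$` are treated as protected regions.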

Answer 2

Using @FrankieTheKneeMan's strsplit approach:

x <- c("I like \\the big\\ red \\dog\\ $\\hat + \\bar$, here it is $\\bar{x}$",
       "I have $50 to \\spend\\",
       "$\\frac{4}{5}$ is nice",
       "$\\30\\ is nice too") 

# > cat(x, sep='\n')
# I like \the big\ red \dog\ $\hat + \bar$, here it is $\bar{x}$
# I have $50 to \spend\
# $\frac{4}{5}$ is nice
# $\30\ is nice too

# Split each string into parts separated by '$'.
# A space is appended first so that a '$' at the very end of a string
# still yields a final part (strsplit('a$', '$', fixed = TRUE) is just
# 'a' in R).
bits <- strsplit(paste(x, ''), '$', fixed = TRUE)

# Apply the replacement to every second part (starting with the first),
# i.e. the parts outside $...$, and always to the last part: because of
# the ' ' we added, the last part is never inside a closed $...$ pair.
out <- sapply(bits, function (parts) {
                   idx <- unique(c(seq(1, length(parts), by = 2), length(parts)))
                   parts[idx] <- gsub('\\', '\"', parts[idx], fixed = TRUE)
                   # join back together
                   joined <- paste(parts, collapse = '$')
                   # remove the trailing ' ' we added
                   substring(joined, 1, nchar(joined) - 1)
               }, USE.NAMES = FALSE)

# > cat(out, sep='\n')
# I like "the big" red "dog" $\hat + \bar$, here it is $\bar{x}$
# I have $50 to "spend"
# $\frac{4}{5}$ is nice
# $"30" is nice too

There will always be cases where this fails (for example "I have $20. \\hi\\ Now I have $30"), so keep that in mind and test against any other string formats you expect.
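
The failure can be reproduced with a small sketch. The wrapper name replace_outside_dollars is hypothetical; it just packages the strsplit steps shown above so they can be run on a single string.

```r
# Hypothetical wrapper around the strsplit approach shown above.
replace_outside_dollars <- function(x) {
  bits <- strsplit(paste(x, ""), "$", fixed = TRUE)
  sapply(bits, function(parts) {
    # odd-numbered parts plus the last part are treated as "outside $...$"
    idx <- unique(c(seq(1, length(parts), by = 2), length(parts)))
    parts[idx] <- gsub("\\", "\"", parts[idx], fixed = TRUE)
    joined <- paste(parts, collapse = "$")
    substring(joined, 1, nchar(joined) - 1)
  }, USE.NAMES = FALSE)
}

s <- "I have $20. \\hi\\ Now I have $30"
out <- replace_outside_dollars(s)
cat(out, "\n")

# The text between "$20" and "$30" is the second part, so it is treated
# as a protected region and \hi\ is left unchanged.
identical(out, s)
```

The two currency `$` signs are paired with each other, so the backslashes between them are never replaced.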
