Question

I have a string like this:

a b c a b " a b " b a " a "

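How do I match every a that is not part of a string delimited by "? I want to match everything marked with asterisks here (bold in the original post):

*a* b c *a* b " a b " b *a* " a "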

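I want to replace those matches (or rather remove them by replacing them with an empty string), so stripping out the quoted parts before matching won't work, because I want them to stay in the string. I'm using Ruby.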

Answer 1

Assuming the quotes are correctly balanced and there are no escaped quotes, it's easy:

result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

This replaces every a with the empty string if and only if an even number of quotes follows the matched a.

Explanation:

a        # Match a
(?=      # only if it's followed by...
 (?:     # ...the following:
  [^"]*" #  any number of non-quotes, followed by one quote
  [^"]*" #  the same again, ensuring an even number
 )*      # any number of times (0, 2, 4 etc. quotes)
 [^"]*   # followed by only non-quotes until
 \Z      # the end of the string.
)        # End of lookahead assertion
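As a quick check, the annotated pattern above can be written as a free-spacing Ruby regex (the x flag) and run against the sample string from the question. This is only a sketch under the same assumptions (balanced quotes, no escaped quotes); the constant name OUTSIDE_QUOTES_A is made up for illustration.

```ruby
# Free-spacing (x flag) form of the lookahead pattern.
# OUTSIDE_QUOTES_A is a hypothetical name, not part of the original answer.
OUTSIDE_QUOTES_A = /
  a            # match a
  (?=          # only if it is followed by...
    (?:
      [^"]*"   # non-quotes, then one quote
      [^"]*"   # the same again, so quotes come in pairs
    )*         # any number of pairs
    [^"]*      # then only non-quotes
    \Z         # until the end of the string
  )
/x

subject = 'a b c a b " a b " b a " a "'
result = subject.gsub(OUTSIDE_QUOTES_A, '')
puts result
```

Each a outside the quotes is followed by an even number of quotes, so it is removed; each a inside a quoted part is followed by an odd number and is left alone.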

If you can have escaped quotes inside quotes (a "length: 2\""), it is still possible but gets more complicated:

result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

This is essentially the same regex as above, only with (?:\\.|[^"\\]) substituted for [^"]:

(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except backslash or quote
)       # End of alternation
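To see the difference the escape handling makes, here is a small sketch that runs the escape-aware regex on a made-up input containing an escaped quote inside a quoted part. The sample string is an assumption chosen for illustration; it is not from the question.

```ruby
# Hypothetical input: the quoted part contains an escaped quote (\").
# In a single-quoted Ruby literal, \" stays as a backslash followed by a quote.
subject = 'a "a \" a" a'

# Escape-aware version: a backslash plus any character counts as one unit,
# so an escaped quote is never counted as a delimiter.
escape_aware = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

result = subject.gsub(escape_aware, '')
puts result
```

Only the first and last a are removed here; both a's between the outer quotes stay, because the escaped quote is consumed by \\. and never counted.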

Answer 2

js-coder, I'm resurrecting this old question because it has a simple solution that wasn't mentioned. (I found your question while doing some research for a regex bounty quest.)

As you can see, the regex is tiny compared with the one in the accepted answer: ("[^"]*")|a

subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
replaced = subject.gsub(regex) { $1 }  # $1 is nil for a bare a, so that match is removed
puts replaced

See this live demo.
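The reason this short regex works is worth spelling out: the left alternative consumes each complete quoted string and captures it in group 1, so the right alternative can only ever match an a outside quotes. The sketch below makes that explicit; the method name strip_unquoted_a is hypothetical, and the second input is a made-up example showing how an unbalanced trailing quote appears to be handled.

```ruby
# Hypothetical helper illustrating the alternation trick.
def strip_unquoted_a(text)
  text.gsub(/("[^"]*")|a/) do
    quoted = Regexp.last_match(1)
    # Group 1 is set only when a whole quoted string matched: put it back.
    # Otherwise the match was a bare a outside quotes: drop it.
    quoted.nil? ? '' : quoted
  end
end

puts strip_unquoted_a('a b c a b " a b " b a " a "')

# With a lone trailing quote, the left alternative cannot match it,
# so text after that quote is treated as outside quotes.
puts strip_unquoted_a('a "a" a "a')
```

Note that this differs from the lookahead approach when quotes are unbalanced, so which one fits depends on how a stray quote should be interpreted.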

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

Answer 3

A full-blown regex solution for regex lovers, with no regard for performance or code readability.

This solution assumes there is no escaping syntax (with escaping syntax, the a in "sbd\"a" would count as being inside the string).

Pseudocode:

processedString =
    inputString.replaceAll("\".*?\"", "")  // Remove all quoted strings
               .replaceFirst("\".*", "")   // Treat text after a lone quote as inside quotes

You can then match the text you want in processedString. If you consider the text after a lone quote to be outside quotes, you can drop the second replacement.

Edit

In Ruby, the regexes in the code above would be

/\".*?\"/

used with gsub, and

/\".*/

used with sub.
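Put together, the two steps might look like the following in Ruby. This is a minimal sketch of the strip-then-match idea, still assuming no escaping syntax; it is useful for finding or counting the a's outside quotes, but because the quoted parts are discarded it does not by itself produce the edited string the question asks for.

```ruby
input = 'a b c a b " a b " b a " a "'

processed = input
  .gsub(/".*?"/, '')  # remove every complete quoted string
  .sub(/".*/, '')     # treat anything after a lone quote as quoted

puts processed

# Every a left in processed was outside quotes in the input.
unquoted_count = processed.scan(/a/).length
puts unquoted_count
```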

As for the replacement problem, I'm not sure whether this will work, but it is worth a try:

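Declare a counter.

Use the regex /(\"|a)/ with gsub and supply a function (a block).

In the function, if the match is ", increment the counter and return " as the replacement (so nothing changes). If the match is a, check whether the counter is even: if it is, return the replacement string; otherwise, return the matched text unchanged.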
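The counter idea described above can be sketched in Ruby as follows. The method name remove_a_outside_quotes and its replacement parameter are hypothetical, and the sketch assumes no escaped quotes: an even quote count means the scan is currently outside a quoted part.

```ruby
# Hypothetical helper implementing the counter approach.
def remove_a_outside_quotes(text, replacement = '')
  quote_count = 0
  text.gsub(/"|a/) do |match|
    if match == '"'
      quote_count += 1   # entering or leaving a quoted part
      match              # put the quote back unchanged
    elsif quote_count.even?
      replacement        # outside quotes: replace the a
    else
      match              # inside quotes: keep the a
    end
  end
end

puts remove_a_outside_quotes('a b c a b " a b " b a " a "')
```

Because gsub calls the block for matches in left-to-right order, the counter always reflects the number of quotes seen before the current match.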

How do I match a regex that is not between two special characters?
