Question

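I work with a lot of regular expressions in my day-to-day job. Odd as it sounds, I sometimes even use regex to edit, fix, or format my regex patterns. One problem keeps bothering me, though: how do I correctly capture escaped characters, and only the characters that are really escaped?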

List of strings:

this is a test
this is a te\st
this is a te\\st
this is a te\\\st
this is a te\\\\st
this is a te\\\\\st
this is a te\\\\\\st

If I only want to match the lines where the 's' is (or is not) a character class (i.e. whitespace), how do I do that?

Illustration:

this is a test       = test
this is a te\st      = te \s t
this is a te\\st     = te \\ st
this is a te\\\st    = te \\ \s t
this is a te\\\\st   = te \\ \\ st
this is a te\\\\\st  = te \\ \\ \s t
this is a te\\\\\\st = te \\ \\ \\ st

You can't simply use [^\\]s or (?<!\\)s. I have tried several combinations without success. How do I capture:

this is a test
this is a te\\st
this is a te\\\\st
this is a te\\\\\\st

And/or the opposite:

this is a te\st
this is a te\\\st
this is a te\\\\\st

Variations I have tried:

.*(?<=(?<!\\)(?<=(\\\\)+))st.*
.*((?<=(?<!\\)(\\\\)+)|(?<!\\))st.*

Edit: this needs to work for any number of backslashes (dynamic length).
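
To illustrate why the simple lookbehind is not enough, here is a minimal JavaScript sketch. It assumes an engine with lookbehind support (for example a recent Node.js or Chrome), and the variable names are only illustrative. The pattern rejects every s that directly follows a backslash, so it cannot tell a real \s escape apart from a literal s that follows an escaped backslash:

```javascript
// Naive pattern from the question: "s" not directly preceded by a backslash.
const naive = /(?<!\\)s/;

// String.raw keeps the backslashes exactly as typed.
const escaped = String.raw`te\st`;   // te\st  : \s is a real escape
const literal = String.raw`te\\st`;  // te\\st : \\ is an escaped backslash, s is literal

console.log(naive.test(escaped)); // false
console.log(naive.test(literal)); // false, although this s is not escaped
```

Both strings are rejected, which is exactly the ambiguity described above: the number of backslashes before the s has to be taken into account, not just the last one.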

Answer 1

I would use something like this to get all the 'real' s characters:

(?<!\\)(?:\\.|[^\\\n])*?(s)

regex101 demo

And this to get all the escaped s:

(?<!\\)(?:\\.|[^\\\n])*?(\\s)

regex101 demo
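
As a rough way to try both patterns outside regex101, the following JavaScript sketch runs them with matchAll. It assumes an engine with lookbehind support; the helper name groupStarts is hypothetical and not part of the answer. Since the capture group is always the last thing matched, its position is derived from the end of each match:

```javascript
// Patterns copied from the answer; the g flag is added so matchAll can scan the whole string.
const realS = /(?<!\\)(?:\\.|[^\\\n])*?(s)/g;
const escapedS = /(?<!\\)(?:\\.|[^\\\n])*?(\\s)/g;

// Hypothetical helper: index at which the captured group starts in each match.
// start of group = (end of match) - (length of group)
function groupStarts(pattern, text) {
  return [...text.matchAll(pattern)].map(
    (m) => m.index + m[0].length - m[1].length
  );
}

console.log(groupStarts(realS, String.raw`te\st`));     // []
console.log(groupStarts(realS, String.raw`te\\st`));    // [ 4 ]
console.log(groupStarts(escapedS, String.raw`te\st`));  // [ 2 ]
console.log(groupStarts(escapedS, String.raw`te\\st`)); // []
```

The `(?:\\.|[^\\\n])*?` part consumes the text before the target one token at a time, where a token is either a backslash plus the character it escapes or a single non-backslash character. That keeps the scan aligned on escape pairs, so an s swallowed by `\\.` is never reported as a literal s.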

Answer 2

If your regex engine supports unbounded lookbehind, you can write:

(?<=(?:^|[^\\])(?:\\\\)*)\\s

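This matches a \s that is preceded by either the start of the string or a non-backslash character, followed by an even number of backslashes.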
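
JavaScript engines that implement ES2018 lookbehind allow variable-length lookbehind, so the pattern can be checked there directly. This is a small sketch against the strings from the question (shortened to the te...st part); the names escapedSLookbehind, samples and results are only illustrative:

```javascript
// Pattern from the answer: \s preceded by start-of-string or a non-backslash,
// then an even number of backslashes.
const escapedSLookbehind = /(?<=(?:^|[^\\])(?:\\\\)*)\\s/;

// String.raw keeps the backslashes exactly as typed (0 to 6 of them).
const samples = [
  String.raw`test`,
  String.raw`te\st`,
  String.raw`te\\st`,
  String.raw`te\\\st`,
  String.raw`te\\\\st`,
  String.raw`te\\\\\st`,
  String.raw`te\\\\\\st`,
];

const results = samples.map((text) => escapedSLookbehind.test(text));
console.log(results);
// [ false, true, false, true, false, true, false ]
```

Only the strings with an odd number of backslashes before the s match, which corresponds to the "opposite" list in the question.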

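But the approach I usually use is to match either \\ or whatever escape sequence I am interested in, and then write a replacement expression that handles both cases. For example, in JavaScript: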

var result =
     input.replace(/\\[\\s]/g, function ($0) {
         if ($0 === '\\\\') {
             // An escaped backslash: put it back unchanged.
             return '\\\\';
         } else {
             // A genuinely escaped \s: handle it here.
             ...
         }
     });
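
One possible way to fill in the elided branch is sketched below. The function name replaceEscapedS and the `<WS>` marker are assumptions made for illustration, not part of the answer; the real replacement depends on what you want to do with the escape:

```javascript
// Hypothetical helper: replace every genuinely escaped \s with `replacement`,
// leaving escaped backslashes (\\) untouched.
function replaceEscapedS(input, replacement) {
  return input.replace(/\\[\\s]/g, (token) => {
    if (token === '\\\\') {
      // An escaped backslash: return it unchanged so it cannot pair with a following s.
      return token;
    }
    // Otherwise the token is \s.
    return replacement;
  });
}

console.log(replaceEscapedS(String.raw`te\st`, '<WS>'));   // te<WS>t
console.log(replaceEscapedS(String.raw`te\\st`, '<WS>'));  // te\\st
console.log(replaceEscapedS(String.raw`te\\\st`, '<WS>')); // te\\<WS>t
```

Because the global scan consumes each \\ pair as a single token before it can reach the following character, a backslash that belongs to a pair is never mistaken for the start of \s, and no lookbehind is needed.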

RegExing RegEx expressions: the escaping problem?
