Regex negative lookahead

Time: 2011-02-07 12:41:57

Tags: regex negative-lookahead regex-lookarounds

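I am doing some regular expression gymnastics. I set myself the task of searching C# code for uses of the as operator that are not followed by a null check within a reasonable distance. I do not want to parse the C# code, though. For example, I want to capture code snippets such as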

    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1.a == y1.a)

however, not capture

    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1 == null)

nor, for that matter,

    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(somethingunrelated == null) {...}
    if(x1.a == y1.a)

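So any random null check counts as a "good check", and the snippet should therefore not be found.

The question is: how do I match something while making sure that something else is not found in its surroundings?

I tried the naive approach: look for 'as', then do a negative lookahead within 150 characters.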

\bas\b.{1,150}(?!\b==\s*null\b)

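The regex above matches all of the examples above. My gut tells me the problem is that consuming up to 150 characters and then doing the negative lookahead finds many positions where the lookahead does not see '== null'.

Negating the whole expression does not help either, because that would match most C# code.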
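The behavior described above can be reproduced with a short Python sketch. The snippet names (`UNCHECKED`, `CHECKED`, `UNRELATED_CHECK`) and the helper `naive_hits` are illustrative and not part of the original question; the pattern itself is copied unchanged. The second compiled pattern uses `re.DOTALL` to suggest that the issue is not only that `.` stops at line breaks.

```python
import re

# The three snippets from the question, with real newlines between the lines.
UNCHECKED = (
    "var x1 = x as SimpleRes;\n"
    "var y1 = y as SimpleRes;\n"
    "if(x1.a == y1.a)"
)
CHECKED = (
    "var x1 = x as SimpleRes;\n"
    "var y1 = y as SimpleRes;\n"
    "if(x1 == null)"
)
UNRELATED_CHECK = (
    "var x1 = x as SimpleRes;\n"
    "var y1 = y as SimpleRes;\n"
    "if(somethingunrelated == null) {...}\n"
    "if(x1.a == y1.a)"
)

# The naive pattern from the question, unchanged.
NAIVE = re.compile(r"\bas\b.{1,150}(?!\b==\s*null\b)")
# The same pattern, but letting "." cross line breaks.
NAIVE_DOTALL = re.compile(r"\bas\b.{1,150}(?!\b==\s*null\b)", re.DOTALL)


def naive_hits(snippet):
    """Return every substring the naive pattern matches in the snippet."""
    return [m.group() for m in NAIVE.finditer(snippet)]


for name, snippet in [
    ("unchecked", UNCHECKED),
    ("checked", CHECKED),
    ("unrelated check", UNRELATED_CHECK),
]:
    # Every snippet is matched, including the ones that do contain "== null".
    print(name, naive_hits(snippet), NAIVE_DOTALL.search(snippet) is not None)
```

In both variants the `.{1,150}` part is free to stop at any position where `== null` does not start, so the negative lookahead is satisfied almost everywhere.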

6 answers:

Answer 0: (score: 11)

Regex gymnastics indeed! Here is a commented PHP regex:

$re = '/# Find every "as" that is not followed by a "== null" check.
    \bas\b               # Match "as"
    (?=                  # But only if...
      (?:                # there exist from 1-150
        [\S\s]           # chars, each of which
        (?!==\s*null)    # is NOT followed by "== null"
      ){1,150}?          # (and do this lazily)
      (?:                # We are done when either
        (?=              # we have reached
          ==\s*(?!null)  # a non-null comparison
        )                #
      | $                # or the end of string.
      )
    )/ix';

And here is the JavaScript flavor:

re = /\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))/ig;

This one did make my head hurt...

Here is the test data I am using:

text = r"""    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1.a == y1.a)

however, not capture
    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(x1 == null)

nor for that matter
    var x1 = x as SimpleRes;
    var y1 = y as SimpleRes;
    if(somethingunrelated == null) {...}
    if(x1.a == y1.a)"""
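As a rough check, the compact pattern can be ported to Python's `re` module and run against the same three blocks. This is only a sketch: the names `AS_WITHOUT_NULL_CHECK`, `count_hits`, and the way the sample string is assembled are assumptions for illustration, and Python's engine is assumed to treat these lookaheads the same way as the PHP and JavaScript flavors.

```python
import re

# Python port of the JavaScript one-liner (the i flag becomes re.IGNORECASE).
AS_WITHOUT_NULL_CHECK = re.compile(
    r"\bas\b(?=(?:[\S\s](?!==\s*null)){1,150}?(?:(?===\s*(?!null))|$))",
    re.IGNORECASE,
)

unchecked = (
    "    var x1 = x as SimpleRes;\n"
    "    var y1 = y as SimpleRes;\n"
    "    if(x1.a == y1.a)\n"
)
checked = (
    "    var x1 = x as SimpleRes;\n"
    "    var y1 = y as SimpleRes;\n"
    "    if(x1 == null)\n"
)
unrelated = (
    "    var x1 = x as SimpleRes;\n"
    "    var y1 = y as SimpleRes;\n"
    "    if(somethingunrelated == null) {...}\n"
    "    if(x1.a == y1.a)"
)
# All three blocks in one string, as in the test data.
sample = (
    unchecked + "\nhowever, not capture\n"
    + checked + "\nnor for that matter\n"
    + unrelated
)


def count_hits(source):
    """Count the "as" keywords the pattern flags as lacking a null check."""
    return len(AS_WITHOUT_NULL_CHECK.findall(source))


print(count_hits(unchecked), count_hits(checked), count_hits(unrelated))
print(count_hits(sample))
```

Only the two `as` keywords in the first block should be flagged; in the other two blocks the character just before `== null` can never be consumed, so the lookahead cannot reach a non-null comparison.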

Answer 1: (score: 2)

Put the .{1,150} inside the lookahead, and replace . with [\s\S] (normally, . does not match newlines). Also, \b can be misleading next to ==.

\bas\b(?![\s\S]{1,150}==\s*null\b)
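A minimal Python sketch of this pattern, assuming the three snippets from the question joined with newlines. The names `UNGUARDED_AS` and `count_unguarded` are made up for the example. The last case illustrates the 150-character window: a null check that sits further away no longer suppresses the match.

```python
import re

# Negative lookahead placed directly after "as": no "== null" may start
# within the next 150 characters (line breaks included).
UNGUARDED_AS = re.compile(r"\bas\b(?![\s\S]{1,150}==\s*null\b)")

unchecked = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
checked = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
unrelated = (
    "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
    "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)"
)
# A null check that starts more than 150 characters after the "as".
far_away = "var x1 = x as Foo;" + " " * 200 + "if (x1 == null)"


def count_unguarded(source):
    """Count "as" keywords with no "== null" in the following 150 chars."""
    return len(UNGUARDED_AS.findall(source))


for label, source in [
    ("unchecked", unchecked),
    ("checked", checked),
    ("unrelated", unrelated),
    ("far away", far_away),
]:
    print(label, count_unguarded(source))
```

Because the lookahead is anchored immediately after `as`, the engine has no freedom to slide to a position where the check happens to be out of sight.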

Answer 2: (score: 2)

I think it would help to put the variable name in () so it can be used as a backreference, as shown below:

\b(\w+)\b\W*=\W*\w*\W*\bas\b[\s\S]{1,150}(?!\b\1\b\W*==\W*\bnull\b)
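One way the backreference idea might be sketched in Python is shown here. Note that this is an adaptation, not the pattern above: the negative lookahead is moved directly after `as` and the assignment is matched with `\s` rather than `\W`. Those choices, and the name `UNCHECKED_VAR`, are assumptions for illustration. It is also stricter than the question asks for, since only a null check on the same variable counts.

```python
import re

# Group 1 captures the assigned variable; the lookahead then requires that
# no "<that variable> == null" starts within the next 150 characters.
UNCHECKED_VAR = re.compile(
    r"\b(\w+)\s*=\s*\w+\s+as\b(?![\s\S]{1,150}?\b\1\s*==\s*null\b)"
)

unchecked = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
checked = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
unrelated = (
    "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
    "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)"
)

# findall returns the captured variable names that lack their own null check.
print(UNCHECKED_VAR.findall(unchecked))
print(UNCHECKED_VAR.findall(checked))
print(UNCHECKED_VAR.findall(unrelated))
```

With this variant the second snippet still reports `y1`, because only `x1` is compared against null there, and the third snippet reports both variables because the null check is on an unrelated name.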

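Answer 3: (score: 2)

The question is unclear. What exactly do you want? I am sorry, but after reading the question and the comments many times, I still do not understand.

Does the code have to be in C#? In Python? Something else? Nothing indicates this.

Do you want a match only when the if(... == ...) line directly follows the var ... = ... lines?

Or may unrelated lines appear between the block and the if(... == ...) line without stopping the match?

My code assumes the second option.

Should an if(... == null) line that comes after the if(... == ...) line stop the match?

Unable to tell whether the answer is yes, I defined two regexes to cover both options.

I hope my code is clear enough and addresses your concern.

It is in Python: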

import re

ch1 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
1618987987849891
'''

ch2 ='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
3213546878'''

ch3='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1 == null)
165478964654456454'''

ch4='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
hgyrtdduihudgug
if(x1 == null)
165489746+54646544'''

ch5='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
1354687897'''

ch6='''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
ifughobviudyhogiuvyhoiuhoiv
if(somethingunrelated == null ) {...}
if(x1.a == y1.a)
2468748874897498749874897'''

ch7 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

ch8 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
iufxresguygo
liygcygfuihoiuguyg
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

ch9 = '''kutgdfxfovuyfuuff
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)
if(somethingunrelated == null ) {...}
oufxsyrtuy
'''

# pat1: an if(... == null) line AFTER the matched if line does not stop the match.
pat1 = re.compile(
    r'('
    r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'   # one or more "var ... as ..." lines
    r'([\s\S](?!==\s*null\b))*?'             # any chars not followed by "== null"
    r'^if *\( *[^\s=]+ *==(?!\s*null).+$'    # an if line whose == is not against null
    r')',
    re.MULTILINE)

# pat2: same, but any further "==" within 150 chars after the if line stops the match.
pat2 = re.compile(
    r'('
    r'(^var +\S+ *= *\S+ +as .+[\r\n]+)+?'
    r'([\s\S](?!==\s*null\b))*?'
    r'^if *\( *[^\s=]+ *==(?!\s*null).+$'
    r')'
    r'(?![\s\S]{0,150}==)',
    re.MULTILINE)


for ch in (ch1, ch2, ch3, ch4, ch5, ch6, ch7, ch8, ch9):
    m1 = pat1.search(ch)
    print(m1.group() if m1 else None)
    print()
    m2 = pat2.search(ch)
    print(m2.group() if m2 else None)
    print('-----------------------------------------')

Result

>>> 
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)

var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
uydtdrdutdutrr
if(x1.a == y1.a)
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
None

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
if(x1.a == y1.a)

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)

None
-----------------------------------------
var x1 = x as SimpleRes;
var y1 = y as SimpleRes;
tfsezfuytfyfy
if(x1.a == y1.a)

None
-----------------------------------------
>>> 

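Answer 4: (score: 2)

Let me try to restate your problem:

  1. Look for an "as" assignment. You may need a better regex to find the actual assignment, and you may want to store the assigned expression, but for now let's use \bas\b.
  2. If you see an if (... == null) within 150 characters, do not match.
  3. If you do not see an if (... == null) within 150 characters, match.
  4. Your expression \bas\b.{1,150}(?!\b==\s*null\b) will not work because of how the negative lookahead behaves. The regex can always step one character forward or backward to dodge the negative lookahead, so you end up matching even when an if (... == null) is present.

    Regexes are really not good at not matching things. In this case, you are better off trying to match an "as" assignment together with an "if == null" check within 150 characters: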

    \bas\b[\s\S]{1,150}?==\s*null\b
    

    and then negate the check: if (!regex.match(text)) ...
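The match-then-negate idea can be sketched in Python as follows. This is an illustration under assumptions: the pattern uses `[\s\S]` so it can span lines and has no `\b` before `==` (a word boundary never holds between a space and `=`), and the names `HAS_NULL_CHECK` and `lacks_null_check` are hypothetical. Note that this classifies a whole snippet rather than each individual `as`.

```python
import re

# Positive pattern: an "as" followed by "== null" within 150 characters.
HAS_NULL_CHECK = re.compile(r"\bas\b[\s\S]{1,150}?==\s*null\b")


def lacks_null_check(snippet):
    """True when no "as" in the snippet is followed by a nearby "== null"."""
    return HAS_NULL_CHECK.search(snippet) is None


unchecked = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1.a == y1.a)"
checked = "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\nif(x1 == null)"
unrelated = (
    "var x1 = x as SimpleRes;\nvar y1 = y as SimpleRes;\n"
    "if(somethingunrelated == null) {...}\nif(x1.a == y1.a)"
)

for label, snippet in [
    ("unchecked", unchecked),
    ("checked", checked),
    ("unrelated", unrelated),
]:
    print(label, lacks_null_check(snippet))
```

Doing the negation in the host language keeps the regex itself purely positive, which tends to be easier to reason about than nested lookaheads.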

Answer 5: (score: 1)

(?s:\s+as\s+(?!.{0,150}==\s*null\b))

I am using ?s: to turn on the Singleline option. If you prefer, you can set it in the regex options instead. I should add that I put \s on both sides of as because I think only whitespace is "legal" around as. You could use \b instead, as in

(?s:\bas\b(?!.{0,150}==\s*null\b))

Note that \s may match whitespace that is not "valid whitespace". It is defined as [\f\n\r\t\v\x85\p{Z}], where \p{Z} covers the Unicode categories 'Separator, Space', 'Separator, Line', and 'Separator, Paragraph'.
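For comparison, here is a small Python sketch of the same two points. It assumes Python 3.6 or later, where scoped inline flags such as `(?s:...)` are accepted; the names `SCOPED`, `WITH_FLAG`, and `flags_as` are illustrative. The answer describes .NET's `\s`; the last lines only show the analogous behavior in Python, where `\s` on `str` patterns also matches Unicode whitespace such as a no-break space unless `re.ASCII` is set.

```python
import re

# Scoped inline flag versus passing the option separately (re.DOTALL is
# Python's counterpart of the Singleline option).
SCOPED = re.compile(r"(?s:\s+as\s+(?!.{0,150}==\s*null\b))")
WITH_FLAG = re.compile(r"\s+as\s+(?!.{0,150}==\s*null\b)", re.DOTALL)

snippet_checked = "var x1 = x as SimpleRes;\nif(x1 == null)"
snippet_unchecked = "var x1 = x as SimpleRes;\nif(x1.a == y1.a)"


def flags_as(pattern, snippet):
    """True when the pattern reports an "as" without a nearby null check."""
    return pattern.search(snippet) is not None


print(flags_as(SCOPED, snippet_checked), flags_as(WITH_FLAG, snippet_checked))
print(flags_as(SCOPED, snippet_unchecked), flags_as(WITH_FLAG, snippet_unchecked))

# "as" surrounded by no-break spaces (U+00A0) instead of ordinary spaces.
nbsp_sample = "x\u00a0as\u00a0SimpleRes"
unicode_ws = re.search(r"\sas\s", nbsp_sample) is not None
ascii_ws = re.search(r"\sas\s", nbsp_sample, re.ASCII) is not None
print(unicode_ws, ascii_ws)
```

If only ordinary source-code whitespace should count, restricting `\s` (here via `re.ASCII`) or spelling out an explicit character class is one option.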