RegEx: match a pattern only when it is not preceded by a different pattern

Time: 2010-11-02 19:17:28

Tags: c# .net regex

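I need a regular expression for text replacement. Example: the text to match is ABC (which may be enclosed in square brackets) and the replacement text is DEF. That much is basic. The complication is that I want to match ABC only when it is NOT preceded by the pattern \[[\d ]+\]\. In other words, it should not match when it is preceded by a word or set of words in brackets, followed by a period.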

Here are some examples of source text to match, together with the result after the regex replacement:

1. [xxx xxx].[ABC] > [xxx xxx].[ABC] (does not match - first part fits the pattern)
2. [xxx xxx].ABC   > [xxx xxx].ABC   (does not match - first part fits the pattern)
3. [xxx.ABC        > [xxx.DEF        (matches - first part has no closing bracket)
4. [ABC]           > [DEF]           (matches - no first part)
5. ABC             > DEF             (matches - no first part)
6. [xxx][ABC]      > [xxx][DEF]      (matches - no period in between)
7. [xxx]. [ABC]    > [xxx]. [DEF]    (matches - space in between)

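It boils down to this: how do I specify that the preceding pattern, when present as described, prevents the match? What would the pattern be in this case? (C# flavor of regex.)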

1 answer:

Answer 0 (score: 14)

You want a negative lookbehind expression. These look like (?<!pattern), so:

(?<!\[[\d ]+\]\.)\[?ABC\]?

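Note that this does not enforce a matched pair of square brackets around ABC; it only allows an optional open bracket and an optional close bracket. If you want to enforce either a matched pair or no brackets at all, you have to use alternation: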

(?<!\[[\d ]+\]\.)(?:ABC|\[ABC\])

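This uses non-capturing parentheses to delimit the alternation. If you actually want to capture ABC, you can turn it into a capturing group.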
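As a rough illustration of the capturing variant, the sketch below wraps ABC in a named group in both branches of the alternation. The group name `word`, the class name `CaptureSketch`, and the sample inputs are assumptions made for this example only; .NET allows the same group name to appear more than once in a pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureSketch {
  // The alternation pattern from above, with ABC wrapped in a named group
  // in both branches. The name "word" is invented for this sketch.
  public const string Pattern =
    @"(?<!\[[\d ]+\]\.)(?:(?<word>ABC)|\[(?<word>ABC)\])";

  // Returns the captured text, or null when the pattern does not match.
  public static string Capture(string text) {
    Match m = Regex.Match(text, Pattern);
    return m.Success ? m.Groups["word"].Value : null;
  }

  public static void Main() {
    foreach (string text in new[] { "ABC", "[ABC]", "[123].ABC" }) {
      Console.WriteLine("{0}: {1}", text, Capture(text) ?? "(no match)");
    }
  }
}
```

For these inputs the group should hold ABC without the brackets for the first two strings, and the third should not match at all, because ABC directly follows the forbidden prefix.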

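ETA: The reason the first expression seems to fail is that it matches at ABC], which is not preceded by the forbidden text. The open bracket [ is optional, so the engine simply leaves it out of the match. The fix is to move the optional open bracket [ into the negative lookbehind assertion, like so: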

(?<!\[[\d ]+\]\.\[?)ABC\]?

Examples of what matches and what does not:

[123].[ABC]: fail (expected: fail)
[123 456].[ABC]: fail (expected: fail)
[123.ABC: match (expected: match)
    matched: ABC
ABC: match (expected: match)
    matched: ABC
[ABC]: match (expected: match)
    matched: ABC]
[ABC[: match (expected: fail)
    matched: ABC
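
To see the difference between the first expression and the revised one, the two can be run side by side on the same input. This is a minimal sketch; the class name `LookbehindCompare` and the helper `Find` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindCompare {
  // First expression: the optional [ sits outside the lookbehind.
  public const string First = @"(?<!\[[\d ]+\]\.)\[?ABC\]?";
  // Revised expression: the optional [ is part of the lookbehind.
  public const string Revised = @"(?<!\[[\d ]+\]\.\[?)ABC\]?";

  // Returns the matched text, or null when there is no match.
  public static string Find(string pattern, string text) {
    Match m = Regex.Match(text, pattern);
    return m.Success ? m.Value : null;
  }

  public static void Main() {
    string text = "[123].[ABC]";
    Console.WriteLine("first:   {0}", Find(First, text) ?? "(no match)");
    Console.WriteLine("revised: {0}", Find(Revised, text) ?? "(no match)");
  }
}
```

With the first expression the engine can start the match at A, where the text immediately before it ends in [ rather than in the forbidden prefix, so it should report ABC]. The revised expression also accounts for that [ inside the lookbehind and should report no match.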

Making the presence of an open bracket [ require a matching close bracket ], as in the second pattern, is trickier, but this seems to work:

(?:(?<!\[[\d ]+\]\.\[)ABC\]|(?<!\[[\d ]+\]\.)(?<!\[)ABC(?!\]))

Examples of what matches and what does not:

[123].[ABC]: fail (expected: fail)
[123 456].[ABC]: fail (expected: fail)
[123.ABC: match (expected: match)
    matched: ABC
ABC: match (expected: match)
    matched: ABC
[ABC]: match (expected: match)
    matched: ABC]
[ABC[: fail (expected: fail)

The examples were generated with the following code:

// Compile and run with: mcs so_regex.cs && mono so_regex.exe
using System;
using System.Text.RegularExpressions;

public class SORegex {
  public static void Main() {
    string[] values = {"[123].[ABC]", "[123 456].[ABC]", "[123.ABC", "ABC", "[ABC]", "[ABC["};
    string[] expected = {"fail", "fail", "match", "match", "match", "fail"};
    string pattern = @"(?<!\[[\d ]+\]\.\[?)ABC\]?";  // Don't force [ to match ].
    //string pattern = @"(?:(?<!\[[\d ]+\]\.\[)ABC\]|(?<!\[[\d ]+\]\.)(?<!\[)ABC(?!\]))";  // Force balanced brackets.
    Console.WriteLine("pattern: {0}", pattern);
    int i = 0;
    foreach (string text in values) {
      Match m = Regex.Match(text, pattern);
      bool isMatch = m.Success;
      Console.WriteLine("{0}: {1} (expected: {2})", text, isMatch? "match" : "fail", expected[i++]);
      if (isMatch) Console.WriteLine("\tmatched: {0}", m.Value);
    }
  }
}
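
Since the question is about replacement rather than matching, here is one possible way to apply the lookbehind with Regex.Replace. It is a sketch under a few assumptions: the trailing \]? is dropped so that only ABC is replaced and any surrounding brackets stay in place (an adaptation for this example), the inputs use digits instead of the question's xxx placeholders because the pattern uses [\d ], and the class name `ReplaceSketch` is invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceSketch {
  // The revised lookbehind, matching only ABC so that brackets around it
  // are not consumed by the replacement.
  public const string Pattern = @"(?<!\[[\d ]+\]\.\[?)ABC";

  // Replaces every ABC that is not preceded by the forbidden prefix.
  public static string ReplaceAbc(string text) {
    return Regex.Replace(text, Pattern, "DEF");
  }

  public static void Main() {
    // Digit-based stand-ins for the seven cases listed in the question.
    string[] inputs = {
      "[123 456].[ABC]", "[123 456].ABC", "[123.ABC", "[ABC]",
      "ABC", "[123][ABC]", "[123]. [ABC]"
    };
    foreach (string text in inputs) {
      Console.WriteLine("{0} > {1}", text, ReplaceAbc(text));
    }
  }
}
```

The first two inputs should come back unchanged, and in the remaining five ABC should become DEF with the brackets preserved.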