Regex results differ between C# and regex101

Time: 2018-06-09 10:13:00

Tags: c# regex text

I created the following regular expression:

^xy_yx_blaa_(\d+)([\s\S]*?)(^[A-D]$|QM)+[\s\S]*?(?:SW|Analyzing)

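The problem is that when I run it against a sample on regex101 it finds 199 matches (which is what I want), but when I use it in my C# program it finds only 55.

After further investigation, I found that the C# program only matches text containing "QM", whereas regex101 matches text containing any of A | B | C | D | QM.

This is my current code: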

TextExtractor extractor = new TextExtractor(path);
string text = extractor.ExtractText();
MatchCollection matches = Regex.Matches(text, pattern, RegexOptions.Multiline);

Thanks in advance.

Here is a sample of the input string:

xy_yx_blaa_184

is the act of composing and sending electronic messages, typically
consisting of alphabetic and numeric characters, between two or more
users of mobile phones, tablets, desktops/laptops, or other devices.
Text messages may be sent over a cellular network, or may also be sent
via an Internet connection.

Derived

QM

SW

xy_yx_blaa_199

is the act of composing and sending electronic messages, typically
consisting of alphabetic and numeric characters, between two or more
users of mobile phones, tablets, desktops/laptops, or other devices.
Text messages may be sent over a cellular network, or may also be sent
via an Internet connection.

Derived

A

SW

In the sample text above, C# captures only the first block (the one containing QM), whereas regex101 captures both.

1 answer:

Answer 0: (score: 1)

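When using RegexOptions.Multiline (or its inline equivalent, (?m)), you should add an optional \r? before any $, because the file may have Windows CRLF line endings and the $ anchor only matches before \n, the LF character. With CRLF input, the character after A is \r, so ^[A-D]$ fails, while the unanchored QM alternative still matches. That is why only the QM blocks are found.

Also, [\s\S] is more of a hack; to match any character, use . together with RegexOptions.Singleline.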

var pattern = @"^xy_yx_blaa_(\d+)(.*?)(^[A-D]\r?$|QM)+.*?(?:SW|Analyzing)";
var results = Regex.Matches(text, pattern, RegexOptions.Multiline | RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
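
To check the CRLF explanation in isolation, here is a minimal sketch that runs the question's pattern and the corrected pattern against the same text with both line-ending styles. The class name CrlfRegexDemo, the helper BuildSample, and the shortened two-record sample are assumptions made for illustration; they are not part of the original program, and the long descriptive paragraphs from the sample are left out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrlfRegexDemo
{
    // Pattern from the question: [\s\S] and a bare $ anchor.
    public const string OriginalPattern =
        @"^xy_yx_blaa_(\d+)([\s\S]*?)(^[A-D]$|QM)+[\s\S]*?(?:SW|Analyzing)";

    // Pattern from the answer: optional \r before $, and . with Singleline.
    public const string FixedPattern =
        @"^xy_yx_blaa_(\d+)(.*?)(^[A-D]\r?$|QM)+.*?(?:SW|Analyzing)";

    // Hypothetical helper: joins a shortened two-record sample
    // (one QM record, one A record) with the given line ending.
    public static string BuildSample(string newline)
    {
        string[] lines =
        {
            "xy_yx_blaa_184", "", "Derived", "", "QM", "", "SW", "",
            "xy_yx_blaa_199", "", "Derived", "", "A", "", "SW"
        };
        return string.Join(newline, lines);
    }

    // Counts matches the way the question's code does (Multiline only).
    public static int CountOriginal(string text) =>
        Regex.Matches(text, OriginalPattern, RegexOptions.Multiline).Count;

    // Counts matches the way the answer's code does (Multiline | Singleline).
    public static int CountFixed(string text) =>
        Regex.Matches(text, FixedPattern,
            RegexOptions.Multiline | RegexOptions.Singleline).Count;

    public static void Main()
    {
        string lf = BuildSample("\n");
        string crlf = BuildSample("\r\n");

        Console.WriteLine($"original, LF:   {CountOriginal(lf)}");   // 2
        Console.WriteLine($"original, CRLF: {CountOriginal(crlf)}"); // 1 (QM record only)
        Console.WriteLine($"fixed, LF:      {CountFixed(lf)}");      // 2
        Console.WriteLine($"fixed, CRLF:    {CountFixed(crlf)}");    // 2
    }
}
```

On this shortened sample the original pattern loses the A record only when the text uses CRLF, which mirrors the 55-versus-199 discrepancy described in the question. Because \r? is optional, the corrected pattern behaves the same on LF-only text.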

See the regex demo and the C# demo.