Question

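I want to extract the number closest to a given section. I am using this regular expression: \d+?[\r\n]+(.*)3.2.P.4.4.\s+Justification\s+of\s+Specifications

Goal: find the block that starts with a number and ends with a given section name. In this case the section name is "3.2.P.4.4. Justification of Specifications".

Actual result: because the pattern starts with a number, the regex matches everything from the first number onward. Expected result: the match should start at 29, which is the closest number before the section. I have tried many options, such as greedy quantifiers, but none of them worked.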

Answer 1

You can use a negative lookahead to assert that the following line does not start with whitespace followed by digits and a newline:

^ \d+[\r\n](?:(?!\s+\d+[\r\n]).*[\r\n])*3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications

See the regex .NET demo | C# demo

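Explanation

^ Start of the string, or of a line when RegexOptions.Multiline is enabled
 \d+[\r\n] Match a space, 1+ digits and a newline character
(?: Non-capturing group
- (?! Negative lookahead, assert that what follows is not
  - \s+\d+[\r\n] Match 1+ whitespace characters, 1+ digits and a newline character
- ) Close the negative lookahead
- .*[\r\n] Match the rest of the line, up to and including a newline character
)* Close the non-capturing group and repeat it 0+ times
3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications Match the section name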
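
As a rough illustration of the pattern above, here is a minimal C# sketch. The sample text is a shortened, hypothetical stand-in for the question's input (with \n line endings), and the variable names are made up for this example. It assumes RegexOptions.Multiline so that ^ can match at the line holding the number.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, hypothetical sample text shaped like the question's input.
var sample = " 26\nfirst block\n 28\nsecond block\n 29\n3.2.P.4.2 Analytical Procedures\n3.2.P.4.4. Justification of Specifications";

// Pattern from this answer. The lookahead stops the repeated group from
// crossing a line that consists of whitespace followed by a number.
var sectionPattern = @"^ \d+[\r\n](?:(?!\s+\d+[\r\n]).*[\r\n])*3\.2\.P\.4\.4\.\sJustification\s+of\s+Specifications";

// Multiline makes ^ match at the start of every line, not only at index 0.
var match = Regex.Match(sample, sectionPattern, RegexOptions.Multiline);

Console.WriteLine(match.Success ? match.Value : "no match");
```

With this sample, the match should begin at the " 29" line, because the attempts that start at " 26" and " 28" are blocked by the next numbered line.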

Answer 2

In .NET, you can use the RegexOptions.RightToLeft option to parse the text from the end to the start, which gets the last match faster and in a simpler way.

Use

var text = " 26\r\nData related to the point SP-WFI-21-Room process fluids  \r\nSampling Date:16/04/2007 \r\n 28\r\nData related to pint SP-WFI-21-Room process fluids  \r\nSampling Date: 20/04/2007 \r\nTEST SPECIFICATIONS RESULTS \r\n 29\r\n3.2.P.4.2 Analytical Procedures \r\nAll the analytical procedures \r\n3.2.P.4.3 Validation of Analytical Procedures \r\nAll the analytical procedures proposed to control the excipients are those reported in Ph. Eur. \r\n− 3AQ13A: Validation of Analytical Procedures: Methodology - EUDRALEX Volume 3A \r\n3.2.P.4.4. Justification of Specifications";
var pattern = @"^\s*\d+\s*[\r\n]+(.*?)3\.2\.P\.4\.4\.\s+Justification\s+of\s+Specifications";
var regEx = new Regex(pattern, RegexOptions.RightToLeft | RegexOptions.Singleline | RegexOptions.Multiline );

var m = regEx.Match(text);
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value);
}

See the C# demo.

See the .NET regex demo.

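I basically just added ^ (which matches the start of a line in multiline mode) and \s* after ^ and after \d+ (in case there is whitespace before the digits or before the line break). Note the escaped dots.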
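
To make the effect of RegexOptions.RightToLeft visible, the following sketch runs the same pattern in both directions. It is only an illustration: the input is a shortened, hypothetical version of the text above, and an extra capturing group around \d+ (not part of the answer's pattern) is added so that the starting number can be printed.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, hypothetical input shaped like the question's text.
var input = " 26\r\nblock A\r\n 28\r\nblock B\r\n 29\r\n3.2.P.4.2 Analytical Procedures\r\n3.2.P.4.4. Justification of Specifications";

// Same pattern as above, except that (\d+) captures the number for display.
var rx = @"^\s*(\d+)\s*[\r\n]+(.*?)3\.2\.P\.4\.4\.\s+Justification\s+of\s+Specifications";
var common = RegexOptions.Singleline | RegexOptions.Multiline;

// Left to right: the first number in the text wins, and the lazy .*? then
// has to stretch all the way to the section name.
var leftToRight = Regex.Match(input, rx, common);

// Right to left: matching starts at the section name and walks backwards,
// so the lazy .*? stops at the nearest preceding number line.
var rightToLeft = Regex.Match(input, rx, common | RegexOptions.RightToLeft);

Console.WriteLine(leftToRight.Groups[1].Value);  // 26
Console.WriteLine(rightToLeft.Groups[1].Value);  // 29
```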

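Note that the .NET regex engine does not support the U (ungreedy) modifier that swaps quantifier greediness, so +? must be converted to + and .* to .*?. In fact, under that modifier the plain + quantifiers in the original regex behave as +?, which may lead to other bugs or unexpected behavior. Do not use the U modifier in PCRE unless you are 100% sure of what it does.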
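
Since .NET has no such modifier, greediness has to be spelled out on each quantifier. A small sketch with a made-up input string shows the difference between the greedy and lazy forms:

```csharp
using System;
using System.Text.RegularExpressions;

var sampleText = "a1b2b";  // hypothetical input

// Greedy: .* takes as much as possible, so the match runs to the last b.
var greedy = Regex.Match(sampleText, @"a.*b").Value;

// Lazy: .*? takes as little as possible, so the match stops at the first b.
var lazy = Regex.Match(sampleText, @"a.*?b").Value;

Console.WriteLine(greedy);  // a1b2b
Console.WriteLine(lazy);    // a1b
```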

How to match the last occurrence of a pattern in a regex with .NET?
