Question

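I am parsing log files to identify and retrieve information about failures. A regular expression seems like the right tool for this.

Here is my initial pattern: \d{4}-\d{2}-\d{2} \d{2}.*

This works for a single-line entry like this: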

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0

It does not work for entries that span multiple lines:

2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |StackLine:0:0

Here are a few lines from the log:

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0 

 2011-02-06 02:19:04.4087|FATAL|ClassName|Message  
Failure data  
Additional message |7th StackLine:0:0  
6th StackLine:0:0  
5th StackLine:0:0  
4th StackLine:0:0  
3rd StackLine:0:0  
2nd StackLine:0:0  
1st StackLine:0:0

The phrase "StackLine" stands for a method signature from the dumped call stack. For example, here are two different "StackLine" examples:

ExecuteCodeWithGuaranteedCleanup at offset 0 in file:line:column <filename unknown>:0:0

and

OnXmlMsgReceived at offset 128 in file:line:column d:\buildserver\source\svnroot\DepotManager\trunk\src\DepotManager.Core\Gating\AutoGate\Wherenet\Zla\EventSink.cs:115:17

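In an ideal world, I would grab everything from the timestamp on the first line through the first line:column notation (usually 0:0).

How can I create a pattern that matches both the single-line and the multi-line case?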

Answer 1

This matches a line that starts with a date, plus all of the following lines that do not start with a date.

^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*

Here is a Rubular example: http://www.rubular.com/r/1BIoLZ5tfs
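For readers who want to try the pattern outside Rubular, here is a minimal Python sketch of the same idea. The sample log is adapted from the question, and the names `LOG` and `ENTRY` are made up for this illustration; `re.MULTILINE` is assumed to play the role of the per-line `^` and `$` anchoring that the pattern relies on.

```python
import re

# Sample log adapted from the question: one single-line entry, one multi-line entry.
LOG = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |StackLine:0:0"
)

# re.MULTILINE lets ^ and $ match at every line boundary.
# The negative lookahead keeps consuming lines until the next line starts with a date.
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*$(?:\n(?!\d{4}-\d{2}-\d{2}).*)*",
    re.MULTILINE,
)

entries = ENTRY.findall(LOG)
for entry in entries:
    print(repr(entry))
```

Each element of `entries` is one complete log entry, continuation lines included.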

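Edit 2: If you want to stop at the first :0:0, you can use the regex in the Rubular below, as long as you enable the multiline option so that the . character also matches newlines.

Here is the new Rubular: http://www.rubular.com/r/rfR1wqDHR8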
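The regex for this edit did not survive on this page, so the following Python sketch is only an assumed reconstruction of the idea, not the answerer's exact pattern: a lazy `.*?` followed by a literal `:0:0`. Rubular's "multiline" option (dot matches newline) corresponds to `re.DOTALL` in Python. The names `LOG` and `FIRST_FRAME` are hypothetical.

```python
import re

# Multi-line entry adapted from the question.
LOG = (
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |7th StackLine:0:0\n"
    "6th StackLine:0:0\n"
    "5th StackLine:0:0"
)

# re.DOTALL lets . cross newlines; re.MULTILINE anchors ^ at each line start.
# The lazy .*? stops as soon as the first literal ":0:0" is reached.
FIRST_FRAME = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*?:0:0",
    re.MULTILINE | re.DOTALL,
)

match = FIRST_FRAME.search(LOG)
first_frame = match.group(0) if match else None
print(repr(first_frame))
```

Because the pattern looks for a literal `:0:0`, a stack line that ends in a real position such as `:115:17` would not terminate the match.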

Answer 2

using System.Text.RegularExpressions;

var log = @"2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0 4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0";
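// The trailing lazy .*? matches as little as possible (here, nothing),
// so each match covers only the "yyyy-MM-dd HH" prefix of an entry.
var regex = @"\d{4}-\d{2}-\d{2}\s\d{2}.*?";
var matches = Regex.Matches(log, regex);
// The count still equals the number of entries that start with a timestamp.
var count = matches.Count; // count = 4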
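To make it concrete what a trailing lazy quantifier actually captures, here is a small Python sketch using the same pattern on a shortened log. It is an illustration only, and the variable names are made up.

```python
import re

LOG = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |7th StackLine:0:0"
)

# Nothing follows the lazy .*?, so it is satisfied by matching zero characters.
prefixes = re.findall(r"\d{4}-\d{2}-\d{2}\s\d{2}.*?", LOG)
print(prefixes)
```

The pattern is therefore enough to count entries, but not to retrieve their contents.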

Answer 3

Here is a regex that matches all of the lines (as one match, from the first timestamp to the end of the input):
\d{4}-\d{2}-\d{2} \d{2}[\S\s]*

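The reason your regex doesn't work is that the dot rarely means "match everything": in most regex flavors it does not match newline characters by default.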
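The difference is easy to see in a short Python sketch (Python's `re` module is assumed here purely for illustration; its default dot also stops at newlines):

```python
import re

# A two-line entry adapted from the question.
TEXT = "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\nFailure data"

# By default, . does not match "\n", so the match ends with the first line.
default_dot = re.search(r"\d{4}-\d{2}-\d{2} \d{2}.*", TEXT).group(0)

# [\S\s] means "whitespace or non-whitespace", i.e. any character at all.
any_char = re.search(r"\d{4}-\d{2}-\d{2} \d{2}[\S\s]*", TEXT).group(0)

print(repr(default_dot))
print(repr(any_char))
```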

Answer 4

PCRE has modifiers; you need PCRE_DOTALL. You didn't specify a language, so here is a PHP example: preg_match('/\d{4}-\d{2}-\d{2} \d{2}.*/s'
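As a rough Python counterpart (an assumption for illustration, since the question names no language), `re.DOTALL` behaves like the `/s` modifier. The sketch also shows a side effect worth knowing: with a greedy `.*`, a single match swallows every entry that follows.

```python
import re

LOG = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0\n"
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message\n"
    "Failure data\n"
    "Additional message |StackLine:0:0"
)

# re.DOTALL is Python's equivalent of PCRE_DOTALL: . also matches newlines.
PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}.*", re.DOTALL)

spans = PATTERN.findall(LOG)
print(len(spans))
```

If one match per entry is needed, the greedy `.*` has to be bounded, for example with a lazy quantifier or a lookahead for the next timestamp.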

Answer 5

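using System.Text.RegularExpressions;

// With Multiline, ^ and $ match at line boundaries, so $^ can only match at an empty line.
var rx = new Regex(@"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$", 
                   RegexOptions.Multiline);

// yourText is a placeholder for the log contents.
var matches = rx.Matches(yourText);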

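Note that \d can also capture non-European digits, but given that your file format is quite "fixed", you shouldn't have any problems (\d captures all of these: Unicode Characters in the 'Number, Decimal Digit' Category).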
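Python's `\d` behaves the same way on `str` patterns, so it can serve as a quick illustration of the point (the answer itself is about .NET; the variable names below are made up):

```python
import re

# "2011" written with Arabic-Indic digits (U+0662 U+0660 U+0661 U+0661).
ARABIC_INDIC_YEAR = "\u0662\u0660\u0661\u0661"

# On str patterns, \d matches any Unicode decimal digit.
unicode_match = re.fullmatch(r"\d{4}", ARABIC_INDIC_YEAR)

# re.ASCII restricts \d to 0-9; an explicit [0-9] class does the same.
ascii_match = re.fullmatch(r"\d{4}", ARABIC_INDIC_YEAR, re.ASCII)
explicit_match = re.fullmatch(r"[0-9]{4}", ARABIC_INDIC_YEAR)

print(unicode_match is not None, ascii_match is not None, explicit_match is not None)
```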

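This only works if there is a blank line at the end of each "log" entry. Even the last entry must be followed by a blank line, so the format must be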

2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine
secondary line of the previous line
(blank)
2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine
(blank)
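The blank-line requirement can be checked with a Python port of the same pattern. This is a sketch under the assumption that Python's `re.MULTILINE` anchors behave like `RegexOptions.Multiline` for `\n` line endings; `WITH_BLANKS` and `NO_BLANK` are hypothetical sample inputs.

```python
import re

ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}[\s\S]*?$^\s*$", re.MULTILINE)

# Every entry, including the last one, is followed by an empty line.
WITH_BLANKS = (
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine\n"
    "secondary line of the previous line\n"
    "\n"
    "2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine\n"
    "\n"
)

# The same kind of entry without a terminating blank line.
NO_BLANK = "2011-02-06 02:17:56.9886|FATAL|ClassName|Failure data|StackLine"

# Matches may carry trailing newlines, so strip them before use.
entries = [m.group(0).rstrip() for m in ENTRY.finditer(WITH_BLANKS)]
unterminated = ENTRY.findall(NO_BLANK)

print(entries)
print(unterminated)
```

The second input yields no match at all, which is the limitation described above.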

RegEx pattern to match one or more lines
