Question

I need to read some log data from a txt file and split it into individual entries.
My sample file looks like this:

11:03:04.234 DEBUG event occurred  
11:03:05.345 INFO another event occurred  
11:03:06.222 ERROR notice that this event
             occupies multiple lines
             as errors can be from multiple sources
             and I have no control over this
11:04:07.222 INFO fourth event has happened

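I decided to use StreamReader, since it seems to be the most efficient approach. I read the entire file with the ReadToEnd() method of StreamReader, which gives me one large string. I then try to split that string with a Regex.
The pattern I have come up with so far works fine until the Regex has to parse a multi-line event.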
I use this tool to test my patterns.
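
For reference, the reading step described above could look roughly like the following. This is only a sketch: the class name LogFileReader, the method name ReadAllText, and the path parameter are hypothetical and do not come from the question.

```csharp
using System;
using System.IO;

public static class LogFileReader
{
    // Hypothetical helper: reads the whole log file into one string,
    // using StreamReader.ReadToEnd() as the question describes.
    public static string ReadAllText(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned string keeps the original line breaks, which is what a line-anchored pattern relies on.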

Answer 1

You can use

Regex.Split(s, @"(?m)^(?!\A)(?=\d{2}:\d{2}:\d{2}\.\d{3})")

See the regex demo.

Details

(?m)^ - the start of a line
(?!\A) - but not the start of the string
(?=\d{2}:\d{2}:\d{2}\.\d{3}) - followed by 2 digits, a colon, 2 digits, a colon, 2 digits, a dot, and 3 digits.
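
A minimal sketch of how this split might be wrapped, assuming the log has already been read into a single string. The class name TimestampSplitter and the method name SplitEntries are hypothetical; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimestampSplitter
{
    // Zero-width split point: the start of a line (but not the start of the
    // string) that is followed by a timestamp of the form HH:mm:ss.fff.
    private static readonly Regex EntryStart =
        new Regex(@"(?m)^(?!\A)(?=\d{2}:\d{2}:\d{2}\.\d{3})");

    // Hypothetical helper: returns one array element per log entry.
    // Continuation lines stay attached to the entry that precedes them.
    public static string[] SplitEntries(string log)
    {
        return EntryStart.Split(log);
    }
}
```

Because the pattern matches a position rather than any characters, nothing is removed from the input: each entry keeps its own line break, and the multi-line ERROR event comes back as a single element.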


Answer 2

You can use this regular expression:

(?=\d{2}:\d{2}:\d{2}\.\d{3})(?:[\s\S](?!\d{2}:\d{2}:\d{2}\.\d{3}))+

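It first looks ahead for 2 digits, a colon, 2 digits, a colon, 2 digits, a dot, and 3 digits.

It then starts a non-capturing group that matches any character (including line breaks), using a negative lookahead for the same pattern as above. The group is repeated one or more times.

Basically, it matches an entry that starts with a time and continues until a new time value (or the end of the input) is reached.

The MatchCollection will contain all of the matches.

Usage: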

using System;
using System.Text.RegularExpressions;

string text = "11:03:04.234 DEBUG event occurred\r\n11:03:05.345 INFO another event occurred\r\n11:03:06.222 ERROR notice that this event\r\noccupies multiple lines\r\nas errors can be from multiple sources\r\nand I have no control over this\r\n11:04:07.222 INFO fourth event has happened";
// Each match starts at a timestamp and runs until just before the next one (or the end of the input).
Regex regex = new Regex(@"(?=\d{2}:\d{2}:\d{2}\.\d{3})(?:[\s\S](?!\d{2}:\d{2}:\d{2}\.\d{3}))+");
foreach (Match match in regex.Matches(text))
{
    Console.WriteLine(match.Value);
}
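
One detail of this pattern is that each match stops one character before the next timestamp, so with "\r\n" line endings the matched value still ends in "\r". A minimal sketch of collecting trimmed entries into a list follows; the names EntryMatcher and GetEntries are hypothetical, and trimming is an added assumption rather than part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EntryMatcher
{
    // Same pattern as in the answer: start at a timestamp, then consume
    // characters as long as the next position does not begin a timestamp.
    private static readonly Regex Entry = new Regex(
        @"(?=\d{2}:\d{2}:\d{2}\.\d{3})(?:[\s\S](?!\d{2}:\d{2}:\d{2}\.\d{3}))+");

    // Hypothetical helper: returns each log entry without the line-break
    // characters left over at the end of the match.
    public static List<string> GetEntries(string log)
    {
        var entries = new List<string>();
        foreach (Match match in Entry.Matches(log))
        {
            entries.Add(match.Value.TrimEnd('\r', '\n'));
        }
        return entries;
    }
}
```

Note that this pattern is not anchored to the start of a line, so a timestamp appearing in the middle of a message would also start a new entry.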

Regex split pattern multiline
