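I'm currently trying to write a program that properly displays WhatsApp chats exported as .txt files (sent by email), and I'm trying to build a parser for them. I've spent the last 3 hours experimenting with Regex without finding a solution, since I've hardly used Regex before.
The messages look like this: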
16.08.2015, 18:30 - Person 1: Some multiline text here
still in the message
16.08.2015, 18:31 - Person 2: some other message which could be multiline
16.08.2015, 18:33 - Person 1: once again
I'm trying to split them correctly by matching with a Regex, like this:
List<string> messages = new List<string>();
messages = Regex.Matches(parseable, @"REGEXHERE").Cast<Match>().Select(m => m.Value).ToList();
so that they end up like this:
messages[0]="16.08.2015, 18:30 - Person 1: Some multiline text here\nstill in the message";
messages[1]="16.08.2015, 18:31 - Person 2: some other message which could be multiline";
messages[2]="16.08.2015, 18:33 - Person 1: once again";
I've been trying a very messy regex that looks something like \d\d\\.\d\d\\. [...]
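For comparison, a single Regex.Matches call like the one above can be made to work by combining a lazy match with a lookahead. This is a sketch under assumptions, not the asker's final code: SingleRegexSplitter and Split are hypothetical names, and the input is assumed to use "\n" line endings (a "\r\n" export would leave a trailing "\r" on lines, so normalize first, for example with parseable.Replace("\r\n", "\n")).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a whole chat export into messages with one regex.
public static class SingleRegexSplitter
{
    // Multiline: ^ matches at the start of every line, so a header must begin a line.
    // Singleline: . also matches '\n', so .*? can run across continuation lines.
    // The lazy .*? stops just before "\n" + the next header, or before
    // any trailing whitespace at the very end of the input.
    static readonly Regex MessageRegex = new Regex(
        @"^\d\d\.\d\d\.\d{4}, \d\d:\d\d - .*?(?=\n\d\d\.\d\d\.\d{4}, \d\d:\d\d - |\s*\z)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Assumes '\n' line endings (see the note above).
    public static List<string> Split(string parseable)
    {
        return MessageRegex.Matches(parseable)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

On the sample chat this yields the three strings listed above, with the continuation line kept inside the first message.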
Answer 0 (score: 0)
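I wouldn't use a single RegEx for this. Instead, I'd just use a StringReader or StreamReader and read line by line: check whether the line currently being processed is a "chat start" line (use a RegEx for that), and if it is, check whether the following lines are "chat start" lines too, keeping track of whether you should append to the current message or yield a new one. I wrote a quick extension method to demonstrate this: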
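using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ChatReader
{
    // Matches the header that starts every message, e.g. "16.08.2015, 18:30 - Person 1:".
    // Anchored with ^ so a date quoted inside a message body does not start a new message.
    static readonly Regex rgx = new Regex(@"^\d\d\.\d\d\.\d\d\d\d, \d\d:\d\d - .*?:");

    public static IEnumerable<string> ReadChatMessages(this TextReader reader)
    {
        // State lives in locals rather than static fields, so separate
        // enumerations (or threads) cannot overwrite each other's lines.
        string current = null; // message being accumulated
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (rgx.IsMatch(line))
            {
                // A new message starts: emit the one collected so far.
                if (current != null)
                    yield return current;
                current = line;
            }
            else
            {
                // Continuation line: append it to the current message.
                current = current == null ? line : current + "\n" + line;
            }
        }
        // Emit the last message; an empty reader yields nothing.
        if (current != null)
            yield return current;
    }
}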
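It can be used like this, where reader is any TextReader (for example new StringReader(parseable), or a StreamReader opened on the exported .txt file), and ToList() needs using System.Linq;:
List<string> chatMessages = reader.ReadChatMessages().ToList();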
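Since the goal is a viewer, each returned message will probably need to be split into timestamp, sender and text. A minimal sketch, assuming the "dd.MM.yyyy, HH:mm - Sender: text" layout shown in the question; ChatMessage and ChatMessageParser are hypothetical names, and a sender name containing ": " would be split at the wrong place.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical type for one parsed message (not part of the answer above).
public sealed class ChatMessage
{
    public DateTime Timestamp { get; }
    public string Sender { get; }
    public string Text { get; }

    public ChatMessage(DateTime timestamp, string sender, string text)
    {
        Timestamp = timestamp;
        Sender = sender;
        Text = text;
    }
}

public static class ChatMessageParser
{
    // Named groups split "16.08.2015, 18:30 - Person 1: text".
    // The sender is taken lazily up to the first ": " on the header line;
    // Singleline lets the text group include continuation lines.
    static readonly Regex HeaderRegex = new Regex(
        @"^(?<date>\d\d\.\d\d\.\d{4}, \d\d:\d\d) - (?<sender>[^\n]*?): (?<text>.*)\z",
        RegexOptions.Singleline);

    // Returns null when the string does not start with a message header.
    public static ChatMessage Parse(string message)
    {
        Match m = HeaderRegex.Match(message);
        if (!m.Success)
            return null;

        DateTime timestamp = DateTime.ParseExact(
            m.Groups["date"].Value, "dd.MM.yyyy, HH:mm", CultureInfo.InvariantCulture);
        return new ChatMessage(timestamp, m.Groups["sender"].Value, m.Groups["text"].Value);
    }
}
```

InvariantCulture is used so that parsing does not depend on the machine's regional settings.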