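Regex chat message detection

Asked: 2015-11-16 21:45:26

Tags: c# regex

I am currently trying to develop a piece of software to properly view WhatsApp messages saved in .txt format (sent by e-mail), and I am trying to write a parser for them. I have been experimenting with Regex for the past 3 hours without finding a solution, since I have hardly used Regex before.

The messages look like this: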

16.08.2015, 18:30 - Person 1: Some multiline text here
still in the message
16.08.2015, 18:31 - Person 2: some other message which could be multiline
16.08.2015, 18:33 - Person 1: once again

I am trying to split them correctly by matching with a Regex, like this:

List<string> messages = new List<string>();
messages = Regex.Matches(parseable, @"REGEXHERE").Cast<Match>().Select(m => m.Value).ToList();

so that they end up like this:

messages[0]="16.08.2015, 18:30 - Person 1: Some multiline text here\nstill in the message";
messages[1]="16.08.2015, 18:31 - Person 2: some other message which could be multiline";
messages[2]="16.08.2015, 18:33 - Person 1: once again";

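I have been trying with a very messy regex that looks like \d\d\\.\d\d\\. [...]

1 Answer:

Answer 0: (score: 0)

I wouldn't use a single regex for this. Instead, I would just use a StreamReader: for each line you process, check whether it is a "chat start" line (use a regex for that), and keep track of whether the line should be appended to the current message or should start a new one. I wrote a quick extension method to demonstrate this: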

using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ChatReader
{
    // Matches the start of a message, e.g. "16.08.2015, 18:30 - Person 1:".
    // Anchored with ^ so a timestamp in the middle of a line does not count.
    private static readonly Regex ChatStart =
        new Regex(@"^\d\d\.\d\d\.\d\d\d\d, \d\d:\d\d - .*?:");

    public static IEnumerable<string> ReadChatMessages(this TextReader reader)
    {
        // Locals instead of static fields, so that two enumerations
        // running at the same time do not overwrite each other's state.
        string message = null;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (message == null)
            {
                // First line of the input.
                message = line;
            }
            else if (ChatStart.IsMatch(line))
            {
                // A new message starts: emit the finished one.
                yield return message;
                message = line;
            }
            else
            {
                // Continuation of a multiline message.
                message += "\n" + line;
            }
        }

        // Emit the last message; nothing is returned for empty input.
        if (message != null)
            yield return message;
    }
}

It can be used like this:

List<string> chatMessages = reader.ReadChatMessages().ToList();
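
For comparison, here is a rough sketch of the single-regex route the question asks about. It is not what I recommend above, just an illustration: it assumes the whole file fits in memory as one string, and it uses Regex.Split with a lookahead instead of the Regex.Matches call from the question. The names SplitDemo and SplitMessages are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits at every line break that is directly followed by a
    // "dd.MM.yyyy, HH:mm - " timestamp. The lookahead (?=...) does not
    // consume the timestamp, so it stays at the start of the next message.
    public static string[] SplitMessages(string text)
    {
        return Regex.Split(text, @"\r?\n(?=\d\d\.\d\d\.\d\d\d\d, \d\d:\d\d - )");
    }

    public static void Main()
    {
        // Sample input taken from the question.
        string parseable =
            "16.08.2015, 18:30 - Person 1: Some multiline text here\n" +
            "still in the message\n" +
            "16.08.2015, 18:31 - Person 2: some other message which could be multiline\n" +
            "16.08.2015, 18:33 - Person 1: once again";

        string[] messages = SplitMessages(parseable);

        Console.WriteLine(messages.Length); // prints 3
        foreach (string message in messages)
        {
            Console.WriteLine("[" + message + "]");
        }
    }
}
```

Line breaks inside a multiline message are left untouched, and a trailing line break at the end of the file stays attached to the last message, so you may want to trim the input first.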