Parsing a text file using Regex

Asked: 2019-06-17 16:27:20

Tags: c# regex parsing

I have a text file like this:

    ;   Message Number
    ;   |         Time Offset (ms)
    ;   |         |        Type
    ;   |         |        |        ID (hex)
    ;   |         |        |        |     Data Length
    ;   |         |        |        |     |   Data Bytes (hex) ...
    ;   |         |        |        |     |   |
    ;---+--   ----+----  --+--  ----+---  +  -+ -- -- -- -- -- -- --
         1)         2.0  Rx         0400  8  01 5A 01 57 01 D2 A6 02 
         2)         8.6  Rx         0500  8  02 C1 02 C9 02 BE 02 C2 
         3)        36.2  Rx         0401  8  01 58 01 59 01 01 01 01 
         4)        41.7  Rx         01C4  8  27 9C 64 8C 00 03 E8 08 
         5)        43.1  Rx         0501  8  02 C0 02 C1 02 C6 02 C0 
         6)        62.7  Rx         01C2  8  27 9C 60 90 00 0F 04 08 

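I am trying to collect only the IDs from this file. I already have the expression and have tested that it works, but when I try to collect the list, it gives me the whole line instead of just the ID.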

        var ofd = new OpenFileDialog
        {
            Filter = "TRC File (*.trc*)|*.trc*",
            Multiselect = true,
        };

        ofd.ShowDialog();

        string path = ofd.FileName;
        List<string> alllinesText = File.ReadAllLines(path).ToList();
        foreach (string id in alllinesText)
        {
            Regex rx = new Regex(@"\d\d[\d|\w][\d|\w]\s\s");
            Console.Write(id.ToString());
            MatchCollection matches1 = rx.Matches(id);
            Console.WriteLine(matches1);

        }

        foreach (string data in alllinesText)
        {
            Regex rx2 = new Regex(@"[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w].[\w\d][\d\w]");
            Console.Write(data.ToString());
            MatchCollection matches2 = rx2.Matches(data);
        }

The output is:

     28817)    347963.1  Rx         01C2  8  01 00 00 00 00 00 00 6F System.Text.RegularExpressions.MatchCollection
     28818)    347966.3  Rx         04E2  8  64 04 10 15 F5 00 00 08 System.Text.RegularExpressions.MatchCollection
     28819)    347967.2  Rx         01C4  8  27 14 63 8C 00 03 E7 08 System.Text.RegularExpressions.MatchCollection
     28820)    348017.0  Rx         03C4  8  7F 8A 7F 80 7F FA 96 0F System.Text.RegularExpressions.MatchCollection
     28821)    348023.1  Rx         0405  8  01 57 01 58 01 DB 93 02 System.Text.RegularExpressions.MatchCollection
     28822)    348029.6  Rx         0505  8  02 BB 02 BC 02 BD 02 BF System.Text.RegularExpressions.MatchCollection

1 Answer:

Answer 0 (score: 0)

My guess is that we might just want to add a capture group around a character class, maybe something like:

([A-Z0-9]{4})


Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"([A-Z0-9]{4})";
        string input = @" ;   Message Number
    ;   |         Time Offset (ms)
    ;   |         |        Type
    ;   |         |        |        ID (hex)
    ;   |         |        |        |     Data Length
    ;   |         |        |        |     |   Data Bytes (hex) ...
    ;   |         |        |        |     |   |
    ;---+--   ----+----  --+--  ----+---  +  -+ -- -- -- -- -- -- --
         1)         2.0  Rx         0400  8  01 5A 01 57 01 D2 A6 02 
         2)         8.6  Rx         0500  8  02 C1 02 C9 02 BE 02 C2 
         3)        36.2  Rx         0401  8  01 58 01 59 01 01 01 01 
         4)        41.7  Rx         01C4  8  27 9C 64 8C 00 03 E8 08 
         5)        43.1  Rx         0501  8  02 C0 02 C1 02 C6 02 C0 
         6)        62.7  Rx         01C2  8  27 9C 60 90 00 0F 04 08 ";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
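
Two things are worth spelling out. First, the output in the question ends each line with `System.Text.RegularExpressions.MatchCollection` because `Console.WriteLine(matches1)` prints the collection object itself; the matched text has to be read from each `Match` (via `Value` or `Groups`). Second, `([A-Z0-9]{4})` matches any run of four uppercase letters or digits, so on a longer file it would also hit message numbers such as `28817` and time offsets such as `347963.1`. The following is only a sketch of one way to handle both, not the asker's or the answerer's code: it assumes every data line follows the layout shown in the question (number, time offset, type, then a four-hex-digit ID), and the names `TrcIdExtractor`, `IdPattern` and `ExtractIds` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrcIdExtractor
{
    // Assumed layout: "<number>)  <time offset>  <type>  <4 hex digit ID>  <length> ...".
    // The named group "id" captures only the ID column.
    private static readonly Regex IdPattern = new Regex(
        @"^\s*\d+\)\s+\d+(?:\.\d+)?\s+\w+\s+(?<id>[0-9A-Fa-f]{4})\s");

    // Returns the ID of every data line; header lines starting with ';' do not match.
    public static List<string> ExtractIds(IEnumerable<string> lines)
    {
        var ids = new List<string>();
        foreach (string line in lines)
        {
            Match m = IdPattern.Match(line);
            if (m.Success)
            {
                // Read the captured group, not the Match or MatchCollection object.
                ids.Add(m.Groups["id"].Value);
            }
        }
        return ids;
    }

    public static void Main()
    {
        // Sample lines copied from the question.
        string[] sample =
        {
            ";   Message Number",
            "         1)         2.0  Rx         0400  8  01 5A 01 57 01 D2 A6 02 ",
            "     28817)    347963.1  Rx         01C2  8  01 00 00 00 00 00 00 6F ",
        };

        foreach (string id in ExtractIds(sample))
        {
            Console.WriteLine(id);
        }
    }
}
```

In the question's code, the same method could be fed with `File.ReadAllLines(path)`. If the file can contain IDs that are not exactly four hex digits, the `{4}` quantifier would need to be adjusted.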
