Regex to parse an Int and a String

Time: 2016-06-27 22:49:41

Tags: c# regex

I created this regex to parse strings like the following (I need the Bytes and Time values):

1463735418    Bytes: 0    Time: 4.297 

Here is the code:

string writePath = @"C:\final.txt";
string[] lines = File.ReadAllLines(@"C:\union.dat");
foreach (string txt in lines)
{

    string re1 = ".*?"; // Non-greedy match on filler
    string re2 = "\\d+";    // Uninteresting: int
    string re3 = ".*?"; // Non-greedy match on filler
    string re4 = "(\\d+)";  // Integer Number 1
    string re5 = ".*?"; // Non-greedy match on filler
    string re6 = "([+-]?\\d*\\.\\d+)(?![-+0-9\\.])";    // Float 1

    Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Match m = r.Match(txt);
    if (m.Success)
    {
        String int1 = m.Groups[1].ToString();
        String float1 = m.Groups[2].ToString();
        Debug.Write("(" + int1.ToString() + ")" + "(" + float1.ToString() + ")" + "\n");
        File.AppendAllText(writePath, int1.ToString() + ", " + float1.ToString() + Environment.NewLine);

    }
}

This works very well when the data is on a single line, but not when I try to read a file like this:

1463735418
Bytes: 0
Time: 4.297
1463735424
Time: 2.205
1466413696
Time: 2.225
1466413699
1466413702
1466413705
1466413708
1466413711
1466413714
1466413717
1466413720
Bytes: 7037
Time: 59.320
... (arbitrary repetition)

I get garbage data.

Expected Output: 
0, 4.297 
7037, 59.320

(Match only when a Bytes/Time pair is present.)

Edit: I am trying something like this, but I still do not get the desired result.

foreach (string txt in lines)
            {

                if (txt.StartsWith("Byte"))
                {
                    string re1 = ".*?"; // Non-greedy match on filler
                    string re2 = "(\\d+)";  // Integer Number 1

                    Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
                    Match m = r.Match(txt);
                    if (m.Success)
                    {
                        String int1 = m.Groups[1].ToString();
                        //Console.Write("(" + int1.ToString() + ")" + "\n");
                        httpTable += int1.ToString() + ",";
                    }
                }
                if (txt.StartsWith("Time"))
                {
                    string re3 = ".*?"; // Non-greedy match on filler
                    string re4 = "([+-]?\\d*\\.\\d+)(?![-+0-9\\.])";    // Float 1

                    Regex r1 = new Regex(re3 + re4, RegexOptions.IgnoreCase | RegexOptions.Singleline);
                    Match m1 = r1.Match(txt);
                    if (m1.Success)
                    {
                        String float1 = m1.Groups[1].ToString();
                        //Console.Write("(" + float1.ToString() + ")" + "\n");
                        httpTable += float1.ToString() + Environment.NewLine;
                    }
                }

            }

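How can I fix this? Thanks.

2 Answers:

Answer 0 (score: 2)

I suggest using lookbehinds to qualify the Time and Bytes values and, if neither is found, defaulting to an integer category. Then use regex named captures to determine what each match is.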

string data = "1463735418 Bytes: 0 Time: 4.297 1463735424 Time: 2.205 1466413696 Time: 2.225 1466413699 1466413702 1466413705 1466413708 1466413711 1466413714 1466413717 1466413720 Bytes: 7037 Time: 59.320";

string pattern = @"
    (?<=Bytes:\s)(?<Bytes>\d+)   # Lookbehind for the bytes
    |                            # Or
    (?<=Time:\s)(?<Time>[\d.]+)  # Lookbehind for time
    |                            # Or
    (?<Integer>\d+)              # most likely it's just an integer.
    ";

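// Requires: using System.Globalization; using System.Linq; using System.Text.RegularExpressions;
// Classify every match by which named group succeeded; -1 marks "not applicable".
var results = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select(mt => new
                   {
                       IsInteger = mt.Groups["Integer"].Success,
                       IsTime = mt.Groups["Time"].Success,
                       IsByte = mt.Groups["Bytes"].Success,
                       strMatch = mt.Groups[0].Value,
                       AsInt  = mt.Groups["Integer"].Success ? int.Parse(mt.Groups["Integer"].Value) : -1,
                       AsByte = mt.Groups["Bytes"].Success ? int.Parse(mt.Groups["Bytes"].Value) : -1,
                       // Invariant culture so "4.297" parses the same on every machine.
                       AsTime = mt.Groups["Time"].Success ? double.Parse(mt.Groups["Time"].Value, CultureInfo.InvariantCulture) : -1.0,
                   });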

The result is an IEnumerable with one entry per match. Each entry is an anonymous-type object with three Is... flags and the corresponding As... converted values, where applicable.
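
To get from the classified matches to the expected output in the question, one option is to walk the matches in order and emit a row only when a Bytes match is directly followed by a Time match. The following is a minimal sketch under that assumption. The class name BytesTimePairs and the method Extract are hypothetical names introduced here; they are not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BytesTimePairs
{
    // Same three-way alternation as in the answer: Bytes, Time, or a plain integer.
    private static readonly Regex Token = new Regex(@"
        (?<=Bytes:\s)(?<Bytes>\d+)   # value preceded by 'Bytes: '
        |
        (?<=Time:\s)(?<Time>[\d.]+)  # value preceded by 'Time: '
        |
        (?<Integer>\d+)              # anything else numeric
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns one "bytes, time" row for each Bytes match that is
    // immediately followed by a Time match.
    public static List<string> Extract(string data)
    {
        var rows = new List<string>();
        string pendingBytes = null; // Bytes value waiting for its Time

        foreach (Match mt in Token.Matches(data))
        {
            if (mt.Groups["Bytes"].Success)
            {
                pendingBytes = mt.Groups["Bytes"].Value;
            }
            else
            {
                if (mt.Groups["Time"].Success && pendingBytes != null)
                {
                    rows.Add(pendingBytes + ", " + mt.Groups["Time"].Value);
                }
                // A plain integer or an unpaired Time ends any pending pair.
                pendingBytes = null;
            }
        }
        return rows;
    }
}
```

Because `\s` also matches line breaks, the same method should work on the multi-line file if it is read as one string, for example with `BytesTimePairs.Extract(File.ReadAllText(@"C:\union.dat"))`.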

Answer 1 (score: 0)

Since you only need the values of Time: ... and Bytes: ..., use the exact strings rather than filler patterns:

To capture Bytes:

Bytes: (\d+)

To capture Time:

Time: ([-+]?\d*\.\d+)

The two patterns can also be combined into a single generic pattern that captures both.
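
As a sketch of how the exact-string patterns could be used with the line-by-line loop from the question, the code below remembers the most recent Bytes value and writes a row only when the very next line is a Time line. The class name ExactPatternParser, the method Parse, the `^` anchors, and the optional sign in the Time pattern are assumptions made for this illustration (the sample values are unsigned); none of them come from the answer itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExactPatternParser
{
    // Exact-string patterns, anchored to the start of the line (assumption).
    private static readonly Regex BytesLine = new Regex(@"^Bytes: (\d+)");
    private static readonly Regex TimeLine = new Regex(@"^Time: ([-+]?\d*\.\d+)");

    // Returns "bytes, time" rows for Bytes lines directly followed by a Time line.
    public static List<string> Parse(IEnumerable<string> lines)
    {
        var rows = new List<string>();
        string pendingBytes = null; // Bytes value from the previous line, if any

        foreach (string line in lines)
        {
            Match b = BytesLine.Match(line);
            if (b.Success)
            {
                pendingBytes = b.Groups[1].Value;
                continue;
            }

            Match t = TimeLine.Match(line);
            if (t.Success && pendingBytes != null)
            {
                rows.Add(pendingBytes + ", " + t.Groups[1].Value);
            }
            // Any non-Bytes line (timestamp or Time) clears the pending value.
            pendingBytes = null;
        }
        return rows;
    }
}
```

With the question's setup this could be called as `ExactPatternParser.Parse(File.ReadAllLines(@"C:\union.dat"))`, and each returned row appended to the output file. Time lines that have no preceding Bytes line are skipped, which is the difference from the second attempt in the question.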