Match lines before an empty new line

Time: 2014-07-20 07:50:14

Tags: c# regex

The input looks like this:

0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!

Using a clear and fast regular expression, I want to split it into:

Match 1: `0`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!`

Match 1: `1`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello world!
This is my new world.`

Match 1: `2`
Match 2: `00:00:00,000 --> 00:00:00,000`
Match 3: `Hello guys!`

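I am matching with `(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n\r].+`, but the problem is that it does not match text that spans two or more lines (Match 3 of the second group in the example above).

Note: if you know a way to do this without a regular expression that is readable and performs better, feel free to suggest it.

Thanks,
Ali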

3 answers:

Answer 0: (score: 2)

Well, here is a non-regex approach:

// Requires: using System.Collections.Generic; using System.IO;
public IEnumerable<List<string>> ReadSeparatedLines(string file)
{
    List<string> lines = new List<string>();
    foreach (var line in File.ReadLines(file))
    {
        if (line == "")
        {
            // Only take action if we've actually got something to return. This
            // handles files starting with blank lines, and also files with
            // multiple consecutive blank lines.
            if (lines.Count > 0)
            {
                yield return lines;
                lines = new List<string>();
            }
        }
        else
        {
            lines.Add(line);
        }
    }
    // Check whether we had any trailing lines to return
    if (lines.Count > 0)
    {
        yield return lines;
    }
}

Personally I find that easier to understand than a regular expression, but of course your tastes may differ.
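
To get from the grouped lines to the three parts asked for in the question, something like the following could work. It is only a sketch: `SubtitleBlocks`, `SplitOnBlankLines`, and `ToEntry` are names invented here, the grouping is a variant of the method above that takes any sequence of lines rather than a file path, and it assumes every block has its index on the first line and its timing on the second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubtitleBlocks
{
    // Same grouping idea as the answer's method, but over any line sequence
    // (for example File.ReadLines(path)) so it can be tried without a file.
    public static IEnumerable<List<string>> SplitOnBlankLines(IEnumerable<string> lines)
    {
        var block = new List<string>();
        foreach (var line in lines)
        {
            if (line.Length == 0)
            {
                // Ignore leading and repeated blank lines.
                if (block.Count > 0)
                {
                    yield return block;
                    block = new List<string>();
                }
            }
            else
            {
                block.Add(line);
            }
        }
        // Return the last block if the input does not end with a blank line.
        if (block.Count > 0)
        {
            yield return block;
        }
    }

    // Assumption: line 1 is the index, line 2 is the timing, the rest is the text.
    public static (string Index, string Timing, string Text) ToEntry(IReadOnlyList<string> block)
    {
        if (block.Count < 2)
        {
            throw new FormatException("Expected at least an index line and a timing line.");
        }
        return (block[0], block[1], string.Join("\n", block.Skip(2)));
    }
}
```

Calling `SplitOnBlankLines` on the lines of the file and passing each block to `ToEntry` yields one tuple per entry, with multi-line text joined by `\n`.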

Answer 1: (score: 1)

You can use the following regular expression:

/(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)/sg

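
The pattern above is written with `/.../sg` delimiters and flags, which C# does not use. A possible .NET equivalent is sketched below: `RegexOptions.Singleline` stands in for the `s` flag and `Regex.Matches` for the `g` flag. The class and method names are invented for illustration, and the sketch assumes the input uses `\n` line endings. Because group 3 begins with the newline that follows the timing line, the text is trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SubtitleRegex
{
    // Singleline lets '.' match '\n', so group 3 can span several lines.
    private static readonly Regex EntryPattern = new Regex(
        @"(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)(.*?)(?=\n\n|$)",
        RegexOptions.Singleline);

    // Returns one array per entry: { index, timing, text }.
    public static List<string[]> Parse(string input)
    {
        var entries = new List<string[]>();
        foreach (Match match in EntryPattern.Matches(input))
        {
            entries.Add(new[]
            {
                match.Groups[1].Value,
                match.Groups[2].Value,
                // Group 3 starts with the newline after the timing line.
                match.Groups[3].Value.Trim()
            });
        }
        return entries;
    }
}
```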

Answer 2: (score: 0)

Here you go:

(\d+)[\n\r]([\d:,]+\s-->\s[\d:,]+)[\n](.+(?:[\n]*[^\d|^\n]+)*)

Result

MATCH 1

  1. [0-1] 0
  2. [2-31] 00:00:00,000 --> 00:00:00,000
  3. [32-44] Hello world!

MATCH 2

  1. [46-47] 1
  2. [48-77] 00:00:00,000 --> 00:00:00,000
  3. [78-112] Hello world! This is my new world.

MATCH 3

  1. [114-115] 2
  2. [116-145] 00:00:00,000 --> 00:00:00,000
  3. [146-157] Hello guys!

Try it at regex101.com

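Edit

I updated the regex to cope with digits, so it now matches multiple lines of text even when those lines contain numbers. It also ends up a bit shorter:

(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)

This regex matches the following input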

0
00:00:00,000 --> 00:00:00,000
Hello world!

1
00:00:00,000 --> 00:00:00,000
Hello world!
This is my new world.

2
00:00:00,000 --> 00:00:00,000
Hello guys!
This line contains 123457!
This is third line!
And more lines!

as

MATCH 1

  1. [0-1] 0
  2. [2-31] 00:00:00,000 --> 00:00:00,000
  3. [32-44] Hello world!

MATCH 2

  1. [46-47] 1
  2. [48-77] 00:00:00,000 --> 00:00:00,000
  3. [78-112] Hello world! This is my new world.

MATCH 3

  1. [114-115] 2
  2. [116-145] 00:00:00,000 --> 00:00:00,000
  3. [146-220] Hello guys! This line contains 123457! This is third line! And more lines!

Try it at regex101.com
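
For completeness, the edited pattern should also be usable from C# as written, since .NET supports the inline `(?s)` option and the `\Z` anchor. The sketch below is one way to try it. `EditedSubtitleRegex` and `Parse` are names made up for this example, and `\n` line endings are assumed (with `\r\n` input, group 2 would pick up a trailing `\r`).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EditedSubtitleRegex
{
    // (?s) inside group 3 turns on single-line mode for that group only,
    // so only the text part is allowed to run across line breaks.
    private static readonly Regex EntryPattern = new Regex(
        @"(\d+)[\n](.*?)\n((?s).*?)(?=\n\n\d|\Z)");

    // Returns one tuple per entry found in the input.
    public static List<(string Index, string Timing, string Text)> Parse(string input)
    {
        var entries = new List<(string Index, string Timing, string Text)>();
        foreach (Match match in EntryPattern.Matches(input))
        {
            entries.Add((
                match.Groups[1].Value,
                match.Groups[2].Value,
                match.Groups[3].Value));
        }
        return entries;
    }
}
```

Note that the lookahead `\n\n\d` treats any blank line followed by a digit as the start of the next entry, so subtitle text that itself contains a blank line followed by a number would be split early.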