Regex to capture lines before keywords that may or may not exist

Time: 2017-07-06 15:26:15

Tags: c# .net regex string

Changeset: 8675309
User: DOMAIN\JohnG
Date: 01/21/2004 21:03:45
Comment:  This check-in fixes issues in several features.  I also refactored some items in buf.c into a new file named bif.c because buf.c was too hard to parse.
Items:
   $/baz/proj/bif.c           Added
   $/baz/proj/buf.c          Modified, Renamed
Work Items:
   34527     The "Access Denied" message is not descriptive enough.
   35628     The UI flickers when I press the '8', 'y', 'Ctrl', and 'End' buttons at the same time.
Check-in Notes:
   Code Reviewer:  ShellM
   Performance Reviewer: ShellM
   Security Reviewer: ShellM

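I want to capture the two lines under "Items:". But "Work Items:" may sometimes be missing, in which case the match should stop at "Check-in Notes:". Sometimes that is missing too, so then I need to stop at the end of the string.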

(?s)(?<=Items:).*(?(?=Work Items:)|(?=Check-in Notes:))

This is what I have, but it captures the wrong part of the check-in record.
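
A quick way to see what goes wrong, sketched against the .NET Regex class with a shortened copy of the sample record (AttemptDiagnosis, Record and Lazy are illustrative names, not from the question). Because .* is greedy, it runs to the end of the input and backtracks only to the last position where the conditional succeeds, which is just before "Check-in Notes:", so the Work Items block ends up in the match. When neither header is present, the conditional never succeeds and nothing matches at all. The Lazy pattern is one possible fix under those assumptions: a lazy .*? stops at the first terminator, \z covers the case where both headers are missing, and anchoring with ^ in multiline mode matters because (?<=Items:) on its own also succeeds right after "Work Items:".

```csharp
using System.Text.RegularExpressions;

// Illustrative helper (not from the question): compares the original pattern
// with a hypothetical lazy variant on a shortened copy of the sample record.
public static class AttemptDiagnosis
{
    public static readonly string Record = string.Join("\n",
        "Comment:  This check-in fixes issues in several features.",
        "Items:",
        "   $/baz/proj/bif.c           Added",
        "   $/baz/proj/buf.c          Modified, Renamed",
        "Work Items:",
        "   34527     The \"Access Denied\" message is not descriptive enough.",
        "Check-in Notes:",
        "   Code Reviewer:  ShellM");

    // The question's pattern: greedy .* backtracks only to the LAST position
    // where the conditional succeeds (just before "Check-in Notes:").
    public const string Original =
        @"(?s)(?<=Items:).*(?(?=Work Items:)|(?=Check-in Notes:))";

    // Hypothetical fix: lazy .*? stops at the FIRST terminator, \z handles the
    // "no headers" case, and (?m) lets ^ anchor each header to a line start.
    public const string Lazy =
        @"(?sm)(?<=^Items:).*?(?=^Work Items:|^Check-in Notes:|\z)";

    // Returns the first match, or "" when the pattern does not match at all.
    public static string FirstMatch(string pattern, string input) =>
        Regex.Match(input, pattern).Value;
}
```

The lazy match still begins with the line break after "Items:", so trimming or splitting the result into lines is left to the caller.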

2 Answers:

Answer 0 (score: 1)

([\s\S]*\nItems:\n)([\s\S]*?)(\nWork Items:\n[\s\S]*)?\z

This seems to work. Your items should end up in group 2.

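  1. ([\s\S]*\nItems:\n) tells the regex to start at "Items:" (everything up to and including that line goes into group 1).
  2. ([\s\S]*?) means take characters, but as few as possible (non-greedy).
  3. (\nWork Items:\n[\s\S]*)?\z tells the regex to put "Work Items:" and everything after it into group 3, if possible.
  4. This leaves your second group as

    • everything from "Items:" to EOF, or
    • everything from "Items:" up to "Work Items:" (exclusive).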

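The key point is that the second group (your items) is non-greedy and the third group is optional. At each position the engine first tries to match the third group followed by the end of the string; only when that fails does the second group take one more character, so it falls back to taking everything that is left.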

Edit:

After trying this in .NET, the regex above did not work. With small tweaks (such as allowing both Windows and *nix line endings), it works:

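    using System.Text.RegularExpressions;

    // YOUR_FILE_HERE: the check-in record as a single string.
    // \r\n is tried first so a Windows line ending is consumed as one unit.
    var pattern = @"((\r\n|\n|\r)Items:(\r\n|\n|\r))(?<Items>[\s\S]*?)((\r\n|\n|\r)Work Items:(\r\n|\n|\r)[\s\S]*)?\z";
    var regex = new Regex(pattern);

    var match = regex.Match(YOUR_FILE_HERE);
    var items = match.Groups["Items"].Value;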
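
The edited pattern stops only at "Work Items:", while the question also needs to stop at "Check-in Notes:" when Work Items is missing. A sketch of that extension, assuming the header names are spelled exactly as in the sample (the match is case-sensitive) and that "Items:" is preceded by a line break; ItemsExtractor and GetItems are hypothetical names. Only the optional trailing group changes: it accepts either header, so the lazy Items group stops at whichever header comes first and still falls back to \z when neither is present.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper extending the answer's pattern (not part of the answer).
// Assumes "Items:" follows a line break and headers are spelled as in the sample.
public static class ItemsExtractor
{
    // (?:\r\n|\n|\r) tries CRLF first so Windows line endings are consumed whole.
    // The trailing group is optional, so the lazy Items group can run to \z.
    const string Pattern =
        @"(?:\r\n|\n|\r)Items:(?:\r\n|\n|\r)(?<Items>[\s\S]*?)" +
        @"(?:(?:\r\n|\n|\r)(?:Work Items|Check-in Notes):[\s\S]*)?\z";

    static readonly Regex ItemsRegex = new Regex(Pattern);

    // Returns the text under "Items:", or "" when there is no Items header.
    public static string GetItems(string record)
    {
        Match match = ItemsRegex.Match(record);
        return match.Success ? match.Groups["Items"].Value : string.Empty;
    }
}
```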
    

Answer 1 (score: 0)

Try the following, which walks the file line by line instead of using a single regex:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication64
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";
        static List<string> sections = null;
        public enum State
        {
            NONE = -1,
            CHANGESET = 0,
            USER,
            DATE,
            COMMENT,
            ITEMS,
            WORK_ITEMS,
            CHECK_IN_NOTES
        }

        static void Main(string[] args)
        {
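            // Header names must match the input exactly (List<string>.IndexOf is case-sensitive),
            // so the last entry is "Check-in Notes", not "Check-In Notes".
            sections = new List<string>() { "Changeset", "User", "Date", "Comment", "Items", "Work Items", "Check-in Notes" };
            // Captures the text before the first ':' as a candidate section header.
            string pattern = "^(?'section'[^:]+)";
            string inputLine = "";
            using StreamReader reader = new StreamReader(FILENAME);
            State state = State.NONE;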
            while ((inputLine = reader.ReadLine()) != null)
            {
                inputLine = inputLine.Trim();
                Match match = Regex.Match(inputLine, pattern);
                if (match.Success)
                {
                    // Only known headers change state; item and note lines keep the current one.
                    int index = sections.IndexOf(match.Groups["section"].Value);
                    if (index >= 0) state = (State)index;
                }

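                // Print every line (header included) of the sections of interest.
                // COMMENT is printed too; remove that case to output only the Items section.
                switch (state)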
                {
                    case State.COMMENT :
                        Console.WriteLine(inputLine);
                        break;
                    case State.ITEMS :
                        Console.WriteLine(inputLine);
                        break;
                }

            }
            Console.ReadLine();
        }
    }

}
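
The same state-machine idea can be packaged as a method that returns only the lines under "Items:" and compares headers case-insensitively, so "Check-in Notes" and "Check-In Notes" both end the section. This is a sketch, assuming one header per line written as "Name:"; CheckinParser and ReadItems are hypothetical names, and it uses string.IndexOf instead of the answer's regex to find the text before the first colon.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the answer): collects the lines under "Items:".
public static class CheckinParser
{
    static readonly string[] Sections =
        { "Changeset", "User", "Date", "Comment", "Items", "Work Items", "Check-in Notes" };

    public static List<string> ReadItems(IEnumerable<string> lines)
    {
        var items = new List<string>();
        bool inItems = false;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            int colon = line.IndexOf(':');
            if (colon > 0)
            {
                string header = line.Substring(0, colon);
                // Any known header ends the previous section; only "Items" starts collecting.
                if (Array.Exists(Sections, s => s.Equals(header, StringComparison.OrdinalIgnoreCase)))
                {
                    inItems = header.Equals("Items", StringComparison.OrdinalIgnoreCase);
                    continue; // skip the header line itself
                }
            }
            // Non-header lines belong to whatever section is current.
            if (inItems && line.Length > 0) items.Add(line);
        }
        return items;
    }
}
```

To read from disk, pass File.ReadLines(FILENAME) from System.IO, which enumerates the file lazily. A missing "Work Items:" or "Check-in Notes:" section needs no special handling: collection simply stops at the next known header or at the end of the input.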