Regex - capture every line based on a condition

Time: 2017-07-10 20:17:04

Tags: c# regex

Revisiting a solution I got here over a year ago:

/* ----------------- jobnameA ----------------- */ 

insert_job: jobnameA   job_type: CMD 
date_conditions: 0
alarm_if_fail: 1


/* ----------------- jobnameB ----------------- */ 

insert_job: jobnameB   job_type: CMD 
date_conditions: 1
days_of_week: tu,we,th,fr,sa
condition: s(job1) & s(job2) & (v(variable1) = "Y" | s(job1)) & (v(variable2) = "Y" 
alarm_if_fail: 1
job_load: 1
priority: 10


/* ----------------- jobnameC ----------------- */ 
...

I use the following regular expression to capture every job that uses a variable v(x) in its condition parameter (only jobnameB matches here):

(?ms)(^[ \t]*/\*[\s-]*([\w-]*)[\s-]*\*/)((?:(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?condition\: ([^\n\r]*v\([^\n\r]*)[ \t]*\))+(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*)

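I now need to capture each line as a parameter group and a value group, while still satisfying the same condition.

This regex captures each line's parameter and value as separate capture groups, but it does not take the presence of a variable v(x) into account, so it grabs every job: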

(?:^([\w_]*\:) ([^\n]+))

And the following expression gets me the first line (insert_job) of each qualifying job, but it stops there instead of grabbing all the parameters:

(?:^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/)(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?(?:^([\w_]*\:) ([^\n]+))

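Any further help would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

I have been parsing text files for over 40 years. If I can't do it, nobody can. I tried for a while to use Regex to split your 'name:value' input but was unsuccessful, so I finally wrote my own method. Note how I handled days_of_week.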

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";
        static void Main(string[] args)
        {
            Job.Load(FILENAME);
        }
    }
    public class Job
    {
        public static List<Job> jobs = new List<Job>();

        public string name { get;set;}
        public string job_type { get;set;}
        public int date_conditions { get; set;}

        public DayOfWeek[] days_of_week { get; set; }
        public string condition { get; set; }

        public int alarm_if_fail { get; set; }
        public int job_load { get; set; }
        public int priority { get; set;}


        public static void Load(string filename)
        {
            Job newJob = null;
            StreamReader reader = new StreamReader(filename);
            string inputLine = "";
            while ((inputLine = reader.ReadLine()) != null)
            {
                inputLine = inputLine.Trim();
                if ((inputLine.Length > 0) && (!inputLine.StartsWith("/*")))
                {
                    List<KeyValuePair<string, string>> groups = GetGroups(inputLine);

                    foreach (KeyValuePair<string, string> group in groups)
                    {
                        switch (group.Key)
                        {
                            case "insert_job" :
                                newJob = new Job();
                                Job.jobs.Add(newJob);
                                newJob.name = group.Value;
                                break;

                            case "job_type":
                                newJob.job_type = group.Value;
                                break;

                            case "date_conditions":
                                newJob.date_conditions = int.Parse(group.Value);
                                break;

                            case "days_of_week":
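                                // Positions in this list match the DayOfWeek enum (Sunday = 0 ... Saturday = 6),
                                // so the index of each two-letter code can be cast directly.
                                List<string> d_of_w = new List<string>() { "su", "mo", "tu", "we", "th", "fr", "sa" };
                                newJob.days_of_week = group.Value.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(x => (DayOfWeek)d_of_w.IndexOf(x.Trim())).ToArray();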
                                break;

                            case "condition":
                                newJob.condition = group.Value;
                                break;

                            case "alarm_if_fail":
                                newJob.alarm_if_fail = int.Parse(group.Value);
                                break;

                            case "job_load":
                                newJob.job_load = int.Parse(group.Value);
                                break;

                            case "priority":
                                newJob.priority = int.Parse(group.Value);
                                break;

                        }
                    }
                }
            }

            reader.Close();
        }
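        // Splits one line into name/value pairs, scanning from right to left so that
        // "insert_job: jobnameA   job_type: CMD" yields two pairs in file order.
        public static List<KeyValuePair<string, string>> GetGroups(string input)
        {
            List<KeyValuePair<string, string>> groups = new List<KeyValuePair<string, string>>();
            string inputLine = input;
            while(inputLine.Length > 0)
            {
                int lastColon = inputLine.LastIndexOf(":");
                // No name in front of a colon (for example a "..." line): stop instead of throwing.
                if (lastColon <= 0) break;
                string value = inputLine.Substring(lastColon + 1).Trim();
                int lastWordStart = inputLine.Substring(0, lastColon - 1).LastIndexOf(" ") + 1;
                string name = inputLine.Substring(lastWordStart, lastColon - lastWordStart);

                // Insert at the front because the line is consumed from its end.
                groups.Insert(0, new KeyValuePair<string,string>(name,value));
                inputLine = inputLine.Substring(0, lastWordStart).Trim();
            }
            return groups;
        }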

    }
}
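
A possible follow-up, not part of the code above: once the jobs are loaded, the question's condition (the job uses a variable v(x) in its condition) can be applied as a plain filter. This is a minimal sketch. JobInfo is a cut-down stand-in for the Job class so the snippet compiles on its own, and the helper name JobsUsingVariables is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in for the Job class above; only the two properties used here.
public class JobInfo
{
    public string name { get; set; }
    public string condition { get; set; }
}

public static class JobFilter
{
    // Hypothetical helper: keeps only jobs whose condition references a variable.
    // \bv\( requires "v(" to start at a word boundary, so "dev(" does not count.
    public static List<JobInfo> JobsUsingVariables(IEnumerable<JobInfo> jobs)
    {
        return jobs
            .Where(j => j.condition != null && Regex.IsMatch(j.condition, @"\bv\("))
            .ToList();
    }
}
```

With the real class, the same Where clause could be applied to Job.jobs after Job.Load(FILENAME) has run.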

Answer 1: (score: 1)

I think it is easier if you break it into steps. I'm using LINQ:

var jobsWithVx = Regex.Matches(src, @"(?ms)(^[ \t]*/\*[\s-]*([\w-]*)[\s-]*\*/)((?:(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?condition\: ([^\n\r]*v\([^\n\r]*)[ \t]*\))+(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*)").Cast<Match>().Select(m => m.Value);

var jobParameters = jobsWithVx.Select(j => Regex.Matches(j, @"(?ms)^([\w_]+\:) (.+?)$")).Select(m => m.Cast<Match>().Select(am => am.Groups));

Then you can work with the job parameters:

foreach (var aJobsParms in jobParameters) {
    foreach (var jobParm in aJobsParms) {
        // work with job and parm
    }
    // alternatively, convert to a Dictionary (keys keep the trailing ':' captured by group 1)
    var jobDict = aJobsParms.ToDictionary(jpgc => jpgc[1].Value, jpgc => jpgc[2].Value);
    // then work with the dictionary
}
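
One thing to watch with the per-line pattern: a line such as "insert_job: jobnameB   job_type: CMD" produces a single pair whose value still contains "job_type: CMD". If each name should become its own entry, one option is a second pattern with a lookahead. The following is a sketch rather than a drop-in replacement. JobLineParser, SplitLine and ParseJob are hypothetical names, and it assumes no value contains whitespace followed by a word and a colon.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JobLineParser
{
    // A name is a run of word characters followed by ':'. The value is matched lazily
    // up to the next " name:" on the same line, or up to the end of the line.
    private static readonly Regex Pair =
        new Regex(@"(\w+):\s*(.*?)(?=\s+\w+:|\s*$)");

    // Returns every name/value pair found on a single line, in order.
    public static List<KeyValuePair<string, string>> SplitLine(string line)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(line))
        {
            pairs.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }

    // Builds a name -> value dictionary from the text of one job block,
    // for example one element of jobsWithVx.
    public static Dictionary<string, string> ParseJob(string jobText)
    {
        var dict = new Dictionary<string, string>();
        var lines = jobText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            // Skip the "/* ----- name ----- */" header line.
            if (line.TrimStart().StartsWith("/*")) continue;
            foreach (var pair in SplitLine(line))
            {
                dict[pair.Key] = pair.Value; // a repeated name overwrites the earlier value
            }
        }
        return dict;
    }
}
```

Here the keys come out without the trailing colon, and splitting on both "\r\n" and "\n" keeps a stray carriage return out of the values.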

Example run in LINQPad:

var src = @"/* ----------------- jobnameA ----------------- */ 

insert_job: jobnameA   job_type: CMD 
date_conditions: 0
alarm_if_fail: 1


/* ----------------- jobnameB ----------------- */ 

insert_job: jobnameB   job_type: CMD 
date_conditions: 1
days_of_week: tu,we,th,fr,sa
condition: s(job1) & s(job2) & (v(variable1) = ""Y"" | s(job1)) & (v(variable2) = ""Y"" 
alarm_if_fail: 1
job_load: 1
priority: 10


/* ----------------- jobnameC ----------------- */
";

var jobsWithVx = Regex.Matches(src, @"(?ms)(^[ \t]*/\*[\s-]*([\w-]*)[\s-]*\*/)((?:(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*?condition\: ([^\n\r]*v\([^\n\r]*)[ \t]*\))+(?:(?!^[ \t]*/\*[\s-]*[\w-]*[\s-]*\*/).)*)").Cast<Match>().Select(m => m.Value);

var jobParameters = jobsWithVx.Select(j => Regex.Matches(j, @"(?ms)^([\w_]+\:) (.+?)$")).Select(m => m.Cast<Match>().Select(am => am.Groups));
jobParameters.Dump();