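Regex for a search term plus key-value pairs

Time: 2017-11-28 14:53:01

Tags: c# .net regex

I have found plenty of regexes for key-value pairs (KVPs), but I am looking for one that handles both a search term and KVPs at the same time. For example, if the search string is: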

John Hennesey POLICY_NUMBER="POL-1-2345-6-780" EXPIRATION_DATE="2017-01-01T00:00:00" business_name="Hennesey Hen Houses" PREMIUM="between 100 and 400"

and I use the regex (?<term>.?)(((?<key>\w+)(?<operator>(=|:|))"\s*(?<value>.*?)\s*")){0,}

I would like the regex to return:

The term, which is "John Hennesey"

A set of key-value pairs: policy_number with the value "POL-1-2345-6-780", expiration_date with "2017-01-01T00:00:00", business_name with "Hennesey Hen Houses", and premium with "between 100 and 400".

If necessary, I can split this into two regex matches/searches; it does not have to be a single one.

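Update: @ctwheels, silly question time: the first example you gave produces a match, but I cannot get at the key matches in C#. Judging by the results, it also looks like the "term" is required. What am I doing wrong?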

string ctWheels1 = @"(?<term>^.*?(?=\s*\w+[=:]?""))|(((?<key>\w +)(?<operator>([=:]?))""\s*(?<value>[^""]*)\s*""))";

string input = "john hennesey BUSINESS_NAME=\"Hennesey Hen Houses*\" policy_year:\"2017\" MINIMUM_PREMIUM_flag:\"y\"";
Regex c1 = new Regex(ctWheels1);
bool ismatch = c1.IsMatch(input);      // returns true
var x = c1.Matches(input);
int dummy = x.Count;                    // returns 1

@ctwheels - silly me, I was doing it wrong.

        string pattern = @"(?<term>^.*?(?=\s*\w+[=:]?""))|(((?<key>\w+)(?<operator>([=:]?))""\s*(?<value>[^""]*)\s*""))";
        string input = @"John Hennesey POLICY_NUMBER=""POL-1-2345-6-780"" EXPIRATION_DATE=""2017-01-01T00:00:00"" business_name=""Hennesey Hen Houses"" PREMIUM=""between 100 and 400""";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }

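1 answer:

Answer 0 (score: 0)

You can use a regex that matches the term and captures it into the "term" group, and captures all the key-value pairs into two groups, "key" and "value". You can then retrieve every captured substring through each group's CaptureCollection: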

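// Requires: using System; using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
var rx = @"(?<term>\w+(?:\s+\w+)*)(?:\s+(?<key>\w+)=""(?<value>[^""]*)"")+";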
var s = "John Hennesey POLICY_NUMBER=\"POL-1-2345-6-780\" EXPIRATION_DATE=\"2017-01-01T00:00:00\" business_name=\"Hennesey Hen Houses\" PREMIUM=\"between 100 and 400\"  Mike Ramsey POLICY_NUMBER=\"POL-2-2346-8-080\" EXPIRATION_DATE=\"2017-02-08T01:04:50\" business_name=\"Mike Ramsey Igloos\" PREMIUM=\"between 200 and 500\"";
var ms = Regex.Matches(s, rx);
foreach (Match m in ms)
{
    var term = m.Groups["term"].Value;
    Dictionary<string, string> dct = m.Groups["key"].Captures // Get the capture collection of group "key"
            .Cast<Capture>()
            .Select(x=>x.Value)           // Convert to a list of values
            .ToList()
            .Zip(                         // Zip with the Group "value" substrings to get a dictionary
                m.Groups["value"].Captures.Cast<Capture>().Select(x=>x.Value).ToList(), 
                (k, v) => new { k, v }
            )
            .ToDictionary(x => x.k, x => x.v);
    Console.WriteLine("---- NEXT MATCH ----\nTerm: {0}", term); // Demo output
    foreach (var kvp in dct) 
    {
        Console.WriteLine("KVP: {0}:{1}", kvp.Key, kvp.Value);
    }
}

Output of the C# demo:

---- NEXT MATCH ----
Term: John Hennesey
KVP: POLICY_NUMBER:POL-1-2345-6-780
KVP: EXPIRATION_DATE:2017-01-01T00:00:00
KVP: business_name:Hennesey Hen Houses
KVP: PREMIUM:between 100 and 400
---- NEXT MATCH ----
Term: Mike Ramsey
KVP: POLICY_NUMBER:POL-2-2346-8-080
KVP: EXPIRATION_DATE:2017-02-08T01:04:50
KVP: business_name:Mike Ramsey Igloos
KVP: PREMIUM:between 200 and 500

The regex is

(?<term>\w+(?:\s+\w+)*)(?:\s+(?<key>\w+)="(?<value>[^"]*)")+

See the regex demo.

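Details

  • (?<term>\w+(?:\s+\w+)*) - Group "term": 1+ word chars, followed by 0+ sequences of 1+ whitespace chars and 1+ word chars
  • (?:\s+(?<key>\w+)="(?<value>[^"]*)")+ - 1 or more consecutive occurrences of:
    • \s+ - 1+ whitespace chars
    • (?<key>\w+) - Group "key": 1 or more word chars
    • =" - a literal substring
    • (?<value>[^"]*) - Group "value": 0+ chars other than "
    • " - a " char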
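
Because the key/value part of the pattern repeats, each group is captured several times within a single match. The sketch below illustrates the difference between Group.Value (only the last capture) and Group.Captures (every capture, in order). The class name CaptureDemo, the helper ReadPairs, and the shortened input are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Same pattern as in the answer.
    public const string Pattern =
        @"(?<term>\w+(?:\s+\w+)*)(?:\s+(?<key>\w+)=""(?<value>[^""]*)"")+";

    // Hypothetical helper: pairs the i-th "key" capture with the i-th "value" capture.
    // Both groups are captured once per repetition, so the counts are equal.
    public static List<KeyValuePair<string, string>> ReadPairs(Match m)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        CaptureCollection keys = m.Groups["key"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            pairs.Add(new KeyValuePair<string, string>(keys[i].Value, values[i].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        // Shortened, made-up input for illustration.
        string input = "John Hennesey POLICY_NUMBER=\"POL-1\" PREMIUM=\"between 100 and 400\"";
        Match m = Regex.Match(input, Pattern);

        // Group.Value exposes only the last capture of a repeated group.
        Console.WriteLine("Last key only: {0}", m.Groups["key"].Value);

        // Group.Captures keeps every capture, in the order it was matched.
        foreach (var pair in ReadPairs(m))
        {
            Console.WriteLine("{0} => {1}", pair.Key, pair.Value);
        }
    }
}
```

An index-based loop like this keeps duplicate keys, whereas ToDictionary in the answer's code would throw if the same key appeared twice in one match.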

If the values can also be plain non-whitespace substrings (without double quotes), the pattern can be enhanced further:

(?<term>\w+(?:\s+\w+)*)(?:\s+(?<key>\w+)=(?:"(?<value>[^"]*)"|(?<value>\S+)))+

See this regex demo.
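
As a rough sketch of how the enhanced pattern behaves in C#: .NET lets the same group name appear in both branches of the alternation, so quoted and unquoted values land in the same "value" capture collection. The class name UnquotedValueDemo, the helper Parse, and the sample input (with an unquoted POLICY_YEAR) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnquotedValueDemo
{
    // Enhanced pattern: a value is either "quoted text" or a run of non-whitespace chars.
    public const string Pattern =
        @"(?<term>\w+(?:\s+\w+)*)(?:\s+(?<key>\w+)=(?:""(?<value>[^""]*)""|(?<value>\S+)))+";

    // Hypothetical helper: returns the key/value pairs of the first match,
    // or an empty dictionary when nothing matches.
    public static Dictionary<string, string> Parse(string input, out string term)
    {
        var result = new Dictionary<string, string>();
        Match m = Regex.Match(input, Pattern);
        term = m.Success ? m.Groups["term"].Value : string.Empty;
        if (!m.Success) return result;

        CaptureCollection keys = m.Groups["key"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            result[keys[i].Value] = values[i].Value; // a repeated key overwrites the earlier value
        }
        return result;
    }

    public static void Main()
    {
        // Made-up input mixing an unquoted and a quoted value.
        string input = "John Hennesey POLICY_YEAR=2017 business_name=\"Hennesey Hen Houses\"";
        string term;
        Dictionary<string, string> pairs = Parse(input, out term);

        Console.WriteLine("Term: {0}", term);
        foreach (var kvp in pairs)
        {
            Console.WriteLine("KVP: {0}:{1}", kvp.Key, kvp.Value);
        }
    }
}
```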

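If the term may contain any characters other than =, you can replace the term group pattern with (?<term>[^=]*) (see this regex demo). If the keys may contain characters other than word characters, but no whitespace, replace (?<key>\w+) with (?<key>[^\s=]+) (see this regex demo).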
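
A minimal sketch combining both replacements, assuming quoted values only. The class name LooseTermDemo, the helper Parse, and the input (a term with punctuation and keys containing - and .) are invented for illustration. Since [^=]* also matches whitespace, the term is trimmed before use.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LooseTermDemo
{
    // Term: any chars except '='. Key: any chars except whitespace and '='.
    public const string Pattern =
        @"(?<term>[^=]*)(?:\s+(?<key>[^\s=]+)=""(?<value>[^""]*)"")+";

    // Hypothetical helper: returns the key/value pairs of the first match.
    public static Dictionary<string, string> Parse(string input, out string term)
    {
        var result = new Dictionary<string, string>();
        Match m = Regex.Match(input, Pattern);
        // [^=]* can pick up surrounding whitespace, so trim the term.
        term = m.Success ? m.Groups["term"].Value.Trim() : string.Empty;
        if (!m.Success) return result;

        CaptureCollection keys = m.Groups["key"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            result[keys[i].Value] = values[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Made-up input: the term and the keys contain non-word characters.
        string input = "O'Brien, John policy-number=\"POL-1\" premium.range=\"100-400\"";
        string term;
        Dictionary<string, string> pairs = Parse(input, out term);

        Console.WriteLine("Term: {0}", term);
        foreach (var kvp in pairs)
        {
            Console.WriteLine("KVP: {0}:{1}", kvp.Key, kvp.Value);
        }
    }
}
```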