Matching "tight" repeated key/value pairs

Time: 2018-04-12 14:59:27

Tags: c# regex

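A string that follows a "tight" pattern of repeated key/value pairs (in this example the key is "name" and each value should be a single lowercase word)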

string text = "name: abc name: def name: ghi name: jkl";

should be converted to the output

  

abc,def,ghi,jkl,

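whereas any disturbance in the pattern (a "non-tight" sequence), as in

string text = "name: abc x name: def name: ghi name: jkl";

should cause the match to fail, with output similar to

abc, ## exception occurred: x cannot be matched to the pattern ##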

I tried

string text = "name: abc name: def name: ghi name: jkl";
string pattern = @"name:\s*([a-z])*\s*";

MatchCollection ms = Regex.Matches(text, pattern);

foreach (Match m in ms)
{
    Console.Write(m.Groups[1].Value+", ");
}

but it returns

  

c,f,i,l,

What causes this strange behavior, and how can I fix it?

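4 answers:

Answer 0 (score: 4)

You just need to move the * inside the parentheses to capture the full string. With ([a-z])* the group matches a single letter and is then repeated, so Groups[1].Value holds only the last letter. If you want to guard against invalid input, you don't necessarily need a regular expression. This assumes your values cannot contain spaces, since that would be a harder problem to solve.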

void Main()
{
    string validText = "name: abc name: def name: ghi name: jkl";
    string invalidText = "name: abc x name: def name: ghi name: jkl";
    string validPattern = @"name:\s*([a-z]*)\s*";

    if (!Validate(invalidText))
    {
        try
        {
            throw new Exception("invalid input");
        }
        catch (Exception exception)
        {
            Console.WriteLine($"Input '{invalidText}' produces: {exception.Message}");
        }
    }

    MatchCollection ms = Regex.Matches(validText, validPattern);

    Console.Write($"Input '{validText}' produces: ");
    foreach (Match m in ms)
    {
        Console.Write(m.Groups[1].Value + ", ");
    }
}

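// Splits on spaces and checks that no odd-indexed token (a value position) looks like a key.
// An extra token such as "x" in the middle shifts a later "name:" into a value position.
public static bool Validate(string input)
{
    var pairs = input.Split(' ');
    return !pairs.Where((item, index) => index % 2 != 0).Any(item => item.EndsWith(":"));
}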

// Input 'name: abc x name: def name: ghi name: jkl' produces: invalid input
// Input 'name: abc name: def name: ghi name: jkl' produces: abc, def, ghi, jkl, 

https://regex101.com/r/qsQNr1/1
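
To make the effect of the quantifier placement concrete, here is a minimal sketch that runs both patterns from this thread against the same input. The class name QuantifierPlacementDemo and the helper FirstGroup are made up for illustration; only Regex.Match and Groups[1].Value come from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierPlacementDemo
{
    // Hypothetical helper: value of group 1 for the first match of `pattern` in `text`.
    public static string FirstGroup(string text, string pattern)
    {
        return Regex.Match(text, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        string text = "name: abc name: def";

        // Quantifier outside the group: the group matches one letter per
        // iteration, so Groups[1].Value keeps only the last iteration.
        Console.WriteLine(FirstGroup(text, @"name:\s*([a-z])*\s*"));   // c

        // Quantifier inside the group: the group matches the whole run of letters.
        Console.WriteLine(FirstGroup(text, @"name:\s*([a-z]*)\s*"));   // abc
    }
}
```

The first call is the pattern from the question, the second is the corrected one.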

Answer 1 (score: 1)

Couldn't you just use

var result = "name: abc name: def name: ghi name: jkl".Split(new [] { "name: " }, StringSplitOptions.None).Where(a=>!String.IsNullOrEmpty(a)).ToArray();
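
Note that Split on its own does not reject the "non-tight" input; the stray x simply ends up inside one of the pieces. A rough sketch of how the split result could be checked afterwards is shown below. SplitDemo, SplitValues and IsValidValue are hypothetical names, and the rule "a value is a non-empty run of a-z" is taken from the question.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: splits on the key and trims each remaining piece.
    public static string[] SplitValues(string text)
    {
        return text.Split(new[] { "name:" }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(piece => piece.Trim())
                   .ToArray();
    }

    // A piece counts as a valid value only if it is a non-empty run of a-z.
    public static bool IsValidValue(string value)
    {
        return value.Length > 0 && value.All(c => c >= 'a' && c <= 'z');
    }

    public static void Main()
    {
        string[] values = SplitValues("name: abc x name: def name: ghi name: jkl");
        foreach (string value in values)
        {
            // Flags "abc x", the piece that contains the interfering token.
            Console.WriteLine(value + (IsValidValue(value) ? "" : "  <- invalid"));
        }
    }
}
```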

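Answer 2 (score: 1)

Unlike most other regex engines, the C# (.NET) engine actually keeps track of repeated captures, and exposes them through the Captures property of the Group class.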

  

Group.Captures Property

     

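Gets a collection of all the captures matched by the capturing group, in innermost-leftmost-first order (or innermost-rightmost-first order if the regular expression is modified with the RegexOptions.RightToLeft option).

This means that by accessing Groups[1] (as in the code below) and then its Captures property, we effectively get the value of each repeated capture.

Code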


using System;
using System.Linq;
using System.Text.RegularExpressions;

class Example {

    static void Main() {
        string[] strings = new string[]{
            "name: abc name: def name: ghi name: jkl",
            "name: abc x name: def name: ghi name: jkl"
        };
        Regex regex = new Regex(@"^(?:name: *([a-z]+) *)+$");
        foreach(string s in strings) {
            if(regex.IsMatch(s)) {
                Match match = regex.Match(s);
                Console.WriteLine(string.Join(", ", from Capture c in match.Groups[1].Captures select c.Value));
            } else {
                Console.WriteLine("Invalid input");
            }
        }
    }
}

Result

name: abc name: def name: ghi name: jkl     # abc, def, ghi, jkl
name: abc x name: def name: ghi name: jkl   # Invalid input

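Answer 3 (score: 0)

I finally found a solution that reports the "error position", i.e. the position where the repeated match first fails. I hope anyone who runs into the same problem finds this answer useful: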

string s = "name: abc x name: def name: ghi name: jkl";
string pattern = @"\Gname:\s*([a-z]+)\s*";

int endOfLastMatch = 0;

// by using the \G anchor we can manage pattern repetition in a for-loop
for (Match m = Regex.Match(s, pattern); m.Success; m = m.NextMatch())
{
    Console.WriteLine(m.Groups[1].Value+ ", ");

    // keep track of until where matches were successful
    endOfLastMatch = m.Index + m.Length;
}

// in case that the match has failed, report where it has happened
if (endOfLastMatch != s.Length) 
{
    int reportSize = Math.Min(s.Length-endOfLastMatch, 10);
    string remainder = s.Substring(endOfLastMatch, reportSize);
    Console.WriteLine("Error: RegEx match failed at index "
        +endOfLastMatch+" (\""+remainder+"...\")");
}

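Output:

abc, 
Error: RegEx match failed at index 10 ("x name: de...")

The magic happens with the \G anchor, which lets us use NextMatch for pattern repetition while forcing each match to start immediately after the previous one. At a match boundary it behaves like a ^ (start of line or string) anchor, so to speak.

The price is that we don't parse the text in one go (as in the accepted answer) but in a for loop. Since the loop is relatively small, I think that is acceptable.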
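
As a sketch of how the \G loop might be packaged for reuse, the helper below collects the values and returns the index at which contiguous matching stopped. TightParser and ParseTight are hypothetical names, not part of any library; the pattern and the NextMatch loop are the same as in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TightParser
{
    // Hypothetical helper: adds each value to `values` while matches are
    // contiguous, and returns the index where matching stopped
    // (equal to text.Length when the whole input was consumed).
    public static int ParseTight(string text, List<string> values)
    {
        Regex regex = new Regex(@"\Gname:\s*([a-z]+)\s*");
        int endOfLastMatch = 0;
        for (Match m = regex.Match(text); m.Success; m = m.NextMatch())
        {
            values.Add(m.Groups[1].Value);
            endOfLastMatch = m.Index + m.Length;
        }
        return endOfLastMatch;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "name: abc name: def name: ghi name: jkl",
            "name: abc x name: def name: ghi name: jkl"
        };
        foreach (string text in inputs)
        {
            var values = new List<string>();
            int stoppedAt = ParseTight(text, values);
            string status = stoppedAt == text.Length ? "ok" : "failed at index " + stoppedAt;
            Console.WriteLine(string.Join(",", values) + " -> " + status);
        }
    }
}
```

Returning the stop index instead of throwing leaves it to the caller to decide whether a partial parse is an error.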