Question

I have a string with several HTML comments in it. I need to count the unique matches of an expression.

For example, the string might be:

var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";

I currently use this to get the matches:

var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);

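This gives 3 matches. However, I would like it to give only 2, because there are only two unique matches.

I know I can loop through the resulting MatchCollection and remove the extra Match, but I'm hoping there is a more elegant solution.

Clarification: the sample string is greatly simplified from what is actually used. There could easily be an X8 or X9, and there are likely dozens of each in the string.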

Answer 1

I would just use the Enumerable.Distinct method, as in this example:

string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
var regex = new Regex(@"<!--X\d-->");
var matches = regex.Matches(subjectString);
var uniqueMatches = matches
    .OfType<Match>()
    .Select(m => m.Value)
    .Distinct();

uniqueMatches.ToList().ForEach(Console.WriteLine);

Output:

<!--X1-->  
<!--X2-->

As for a pure regex approach, maybe you could use this?

(<!--X\d-->)(?!.*\1.*)

It seems to work on your test string in RegexBuddy, at least =)

// (<!--X\d-->)(?!.*\1.*)
// 
// Options: dot matches newline
// 
// Match the regular expression below and capture its match into backreference number 1 «(<!--X\d-->)»
//    Match the characters “<!--X” literally «<!--X»
//    Match a single digit 0..9 «\d»
//    Match the characters “-->” literally «-->»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1.*)»
//    Match any single character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Match the same text as most recently matched by capturing group number 1 «\1»
//    Match any single character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
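
To try the lookahead pattern above from C#, here is a minimal sketch. It assumes that RegexBuddy's "dot matches newline" option corresponds to RegexOptions.Singleline in .NET; the class and method names (LookaheadDemo, LastOccurrences) are made up for illustration. Because the negative lookahead rejects any marker that appears again later in the string, each marker is reported once, at its last position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns each marker once, taken at its LAST occurrence.
    public static List<string> LastOccurrences(string input)
    {
        // Singleline makes '.' match newlines too (assumed equivalent of
        // RegexBuddy's "dot matches newline" option).
        var regex = new Regex(@"(<!--X\d-->)(?!.*\1.*)", RegexOptions.Singleline);
        return regex.Matches(input)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        var found = LastOccurrences("<!--X1-->Hi<!--X1-->there<!--X2-->");
        Console.WriteLine(found.Count); // prints 2
        foreach (var marker in found)
        {
            Console.WriteLine(marker);
        }
    }
}
```

Note that the lookahead rescans the rest of the string for every candidate match, so on long inputs with dozens of markers the Distinct approach is likely to be cheaper.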

Answer 2

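It appears you are doing two different things:

Matching comments like /<!--X.-->/
Finding the set of unique comments

So it is logical to handle these as two separate steps: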

var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);

var uniqueMatches = matches.Cast<Match>().Distinct(new MatchComparer());

class MatchComparer : IEqualityComparer<Match>
{
    public bool Equals(Match a, Match b)
    {
        return a.Value == b.Value;
    }

    public int GetHashCode(Match match)
    {
        return match.Value.GetHashCode();
    }
}

Answer 3

Extract the comments and store them in an array. Then you can filter out the unique values.

But I don't know how to implement this in C#.
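
One possible way to do this in C#, as a rough sketch rather than a definitive implementation: collect all matches into an array, then keep only the first occurrence of each value with a HashSet. The names ExtractThenFilter and UniqueComments are hypothetical, and the pattern <!--X\d--> is borrowed from Answer 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtractThenFilter
{
    // Hypothetical helper: extract every comment, then drop repeats.
    public static string[] UniqueComments(string input)
    {
        // Step 1: extract all comments into an array (duplicates included).
        string[] all = Regex.Matches(input, @"<!--X\d-->")
                            .Cast<Match>()
                            .Select(m => m.Value)
                            .ToArray();

        // Step 2: filter. HashSet.Add returns false when the value is
        // already present, so only the first occurrence of each survives.
        var seen = new HashSet<string>();
        return all.Where(c => seen.Add(c)).ToArray();
    }

    public static void Main()
    {
        var unique = UniqueComments("<!--X1-->Hi<!--X1-->there<!--X2-->");
        Console.WriteLine(unique.Length); // prints 2
    }
}
```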

Answer 4

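Capture the inner part of the comment as a group. Then put those strings into a hashtable (dictionary). Then ask the dictionary for its count, since it weeds out the repeats by itself.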

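var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
var tokens = new Dictionary<string, string>();
// Lazy (.*?) so each comment is matched on its own; a greedy (.*) would
// span from the first "<!--" to the last "-->" and yield a single match.
Regex.Replace(teststring, @"<!--(.*?)-->",
    match => {
        tokens[match.Groups[1].Value] = match.Groups[1].Value;
        return "";
    });
var uniques = tokens.Keys.Count;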

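By using the Regex.Replace construct, you get a lambda called on every match. Since you are not interested in the replacement, you don't assign the result to anything.

You must use Groups[1] because Groups[0] is the entire match. I'm only repeating the same value on both sides so that it is easy to put into the dictionary, which stores only unique keys.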

Answer 5

Depending on how many Xn markers you might have, you could use:

(\<!--X1--\>){1}.*(\<!--X2--\>){1}

This will match each of X1, X2, etc. only once, provided they appear in order.

Answer 6

If you want a list of distinct matches from a MatchCollection without converting them to strings, you can use something like this:

 var distinctMatches = matchList.OfType<Match>().GroupBy(x => x.Value).Select(x =>x.First()).ToList();

I know it has been 12 years, but sometimes we still need this kind of solution, so I wanted to share it. C# evolves, .NET evolves, so this is easier now.

How do I add regex matches to a MatchCollection only once?
