Question

Given this collection:

var list = new [] {
    "1.one",
    "2. two",
    "no number",
    "2.duplicate",
    "300. three hundred",
    "4-ignore this"};

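How can I get the subset of items that start with a number followed by a dot (regex @"^\d+(?=\.)"), keeping only one item per distinct number? That is: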

{"1.one", "2. two", "300. three hundred"}

Update

My attempt was to pass an IEqualityComparer to the Distinct method. I borrowed this GenericCompare class and tried the following code, but it did not work:

var pattern = @"^\d+(?=\.)";
var comparer = new GenericCompare<string>(s => Regex.Match(s, pattern).Value);
list.Where(f => Regex.IsMatch(f, pattern)).Distinct(comparer);

Answer 1

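If you prefer a LINQ approach, you can add a named capture group to the regex, filter the items that match it, group them by the captured number, and finally take only the first string in each group. I like the readability of this solution, but I wouldn't be surprised if there is a more efficient way to eliminate the duplicates. Let's see whether someone else comes up with a different approach.

Something like this: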

list.Where(s => regex.IsMatch(s))
    .GroupBy(s => regex.Match(s).Groups["num"].Value)
    .Select(g => g.First())

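You can try it with this example:

using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    // Named group "num" captures the leading digits; the dot must follow immediately.
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    public static void Main()
    {
        var list = new [] {
            "1.one",
            "2. two",
            "no number",
            "2.duplicate",
            "300. three hundred",
            "4-ignore this"
        };

        // Filter matching items, group by the captured number, keep the first of each group.
        var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
                                      .GroupBy(s => regex.Match(s).Groups["num"].Value)
                                      .Select(g => g.First());

        distinctWithNumbers.ToList().ForEach(Console.WriteLine);
        Console.ReadKey();
    }
}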

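You can try this approach in this fiddle.

As @orad pointed out in the comments, MoreLinq has a LINQ extension, DistinctBy(), that can be used instead of grouping and then taking the first item of each group to eliminate the duplicates: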

var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
                              .DistinctBy(s => regex.Match(s).Groups["num"].Value);

Try it in this fiddle.
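For readers without MoreLinq, here is a minimal sketch of what a DistinctBy-style operator does, assuming a HashSet of keys that have already been seen. The names DistinctByKey, DistinctSketch and DistinctNumbered are hypothetical and not part of any library; on .NET 6 and later, the built-in Enumerable.DistinctBy could be used instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistinctSketch
{
    // Hypothetical helper, named DistinctByKey so it does not clash with
    // MoreLinq's DistinctBy or with Enumerable.DistinctBy on .NET 6+.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key was already present,
            // so only the first item for each key is yielded.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }

    private static readonly Regex NumberDot = new Regex(@"^(?<num>\d+)\.");

    // Keeps the first item for each distinct leading number.
    public static List<string> DistinctNumbered(IEnumerable<string> items)
    {
        return items.Where(s => NumberDot.IsMatch(s))
                    .DistinctByKey(s => NumberDot.Match(s).Groups["num"].Value)
                    .ToList();
    }
}
```

Because the helper streams items and only remembers keys, it avoids building the intermediate groups that GroupBy creates.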

Edit

If you want to use your comparer, you need to implement GetHashCode so that it also uses the expression:

public int GetHashCode(T obj) { return _expr.Invoke(obj).GetHashCode(); }

Then you can construct the comparer with a lambda that takes a string and extracts the number using the regex:

var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);
var distinctWithNumbers = list.Where(s => regex.IsMatch(s)).Distinct(comparer);

I created another fiddle with this approach.
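The borrowed GenericCompare class is not shown here, so the following is only a hypothetical reconstruction of a key-based comparer, assuming it wraps a Func<T, object> key selector. It illustrates why GetHashCode matters: Distinct stores items in a hash-based set, so two items whose hash codes differ are never passed to Equals. The names KeyComparer and ComparerDemo are made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the GenericCompare class, which is not shown in the thread.
public class KeyComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _expr;

    public KeyComparer(Func<T, object> expr) { _expr = expr; }

    // Two items are equal when their extracted keys are equal.
    public bool Equals(T x, T y)
    {
        return object.Equals(_expr(x), _expr(y));
    }

    // The hash must be derived from the same key, otherwise items with
    // equal keys land in different buckets and Equals is never consulted.
    public int GetHashCode(T obj)
    {
        var key = _expr(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class ComparerDemo
{
    public static List<string> Run(IEnumerable<string> items)
    {
        var regex = new Regex(@"^\d+(?=\.)");
        var comparer = new KeyComparer<string>(s => regex.Match(s).Value);
        return items.Where(s => regex.IsMatch(s)).Distinct(comparer).ToList();
    }
}
```

With the sample list from the question, ComparerDemo.Run should keep "1.one", "2. two" and "300. three hundred".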

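Using the lookahead regex

You can use either of these two approaches with the regex @"^\d+(?=\.)".

Just change the lambda that reads the "num" group, s => regex.Match(s).Groups["num"].Value, to one that reads the whole regex match: s => regex.Match(s).Value

Updated fiddle here.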
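The reason the whole match can serve as the key with the lookahead pattern is that (?=\.) checks for the dot without consuming it. A small sketch of the difference, where PatternComparison and its methods are hypothetical names used only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Returns the full text matched by the pattern (Match.Value).
    public static string WholeMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    // Returns only the text captured by the named group "num".
    public static string NumGroup(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["num"].Value;
    }
}
```

For the input "300. three hundred", WholeMatch with @"^(?<num>\d+)\." gives "300." (dot included), NumGroup with the same pattern gives "300", and WholeMatch with @"^\d+(?=\.)" also gives "300".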

Answer 2

(I might as well mark this as an answer too.)

This solution works without running the regex more than once per item:

var regex = new Regex(@"^\d+(?=\.)", RegexOptions.Compiled);
list.Select(i => {
    var m = regex.Match(i);
    return new KeyValuePair<int, string>( m.Success ? Int32.Parse(m.Value) : -1, i );
})
.Where(i => i.Key > -1)
.GroupBy(i => i.Key)
.Select(g => g.First().Value);

Run it in this fiddle.
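One design choice in this approach is that the key is parsed to an int rather than kept as a string. The sketch below, using the hypothetical names KeyTypeDemo, DistinctByIntKey and DistinctByStringKey, shows where the two differ: with an int key, "2." and "02." count as the same number, while with a string key they do not. Note also that int.Parse throws an OverflowException for digit runs that do not fit in an int.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyTypeDemo
{
    private static readonly Regex Number = new Regex(@"^\d+(?=\.)");

    // Groups by the numeric value, so "2" and "02" share a key.
    public static List<string> DistinctByIntKey(IEnumerable<string> items)
    {
        return items.Select(s => new { Item = s, Match = Number.Match(s) })
                    .Where(x => x.Match.Success)
                    .GroupBy(x => int.Parse(x.Match.Value))
                    .Select(g => g.First().Item)
                    .ToList();
    }

    // Groups by the matched text, so "2" and "02" are different keys.
    public static List<string> DistinctByStringKey(IEnumerable<string> items)
    {
        return items.Select(s => new { Item = s, Match = Number.Match(s) })
                    .Where(x => x.Match.Success)
                    .GroupBy(x => x.Match.Value)
                    .Select(g => g.First().Item)
                    .ToList();
    }
}
```

For the input { "2. two", "02.leading zero" }, DistinctByIntKey keeps only "2. two", whereas DistinctByStringKey keeps both items.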

Answer 3

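Your solution is good enough.

You can also use LINQ query syntax, with the let keyword to avoid re-running the regex, like this:

var result =
        from kvp in
        (
            from s in source
            let m = regex.Match(s)   // run the regex once per item
            where m.Success
            select new KeyValuePair<int, string>(int.Parse(m.Value), s)
        )
        group kvp by kvp.Key into gr
        select gr.First().Value;     // first string seen for each number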

Answer 4

Something like this should work:

List<string> c = new List<string>()
{
    "1.one",
    "2. two",
    "no number",
    "2.duplicate",
    "300. three hundred",
    "4-ignore this"
};

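// Keep only the items that start with digits followed by a dot.
// Note: this filters but does not remove duplicates, so "2.duplicate" is still included.
var matches = c.Where(i =>
{
    var match = Regex.Match(i, @"^\d+(?=\.)");
    return match.Success;
});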

Distinct by part of a string in LINQ
