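How can I hierarchically group data using LINQ?

Time: 2010-02-09 15:32:27

Tags: linq c#-3.0 grouping group-by

I have some data with various properties, and I want to group that data hierarchically. For example: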

public class Data
{
   public string A { get; set; }
   public string B { get; set; }
   public string C { get; set; }
}

I would want this grouped as:

A1
 - B1
    - C1
    - C2
    - C3
    - ...
 - B2
    - ...
A2
 - B1
    - ...
...

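Currently, I have been able to group this using LINQ so that the top-level group divides the data by A, each subgroup then divides by B, each B subgroup contains subgroups by C, and so on. The LINQ looks like this (assuming an IEnumerable<Data> sequence called data):

var hierarchicalGrouping =
    from x in data
    group x by x.A
        into byA
        let subgroupB = from x in byA
                        group x by x.B
                            into byB
                            let subgroupC = from x in byB
                                            group x by x.C
                            select new
                            {
                                B = byB.Key,
                                SubgroupC = subgroupC
                            }
        select new
        {
            A = byA.Key,
            SubgroupB = subgroupB
        };

As you can see, this gets messier the more levels of subgrouping are required. Is there a nicer way to perform this type of grouping? It seems like there should be, and I'm just not seeing it.

Update
So far, I have found that expressing this hierarchical grouping with the fluent LINQ API rather than query syntax arguably improves readability, but it doesn't feel very DRY.

There were two ways I did this: one using GroupBy with a result selector, the other using GroupBy followed by a Select call. Both can be formatted to be more readable than the query syntax, but they still don't scale well.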

var withResultSelector =
    data.GroupBy(a => a.A, (aKey, aData) =>
        new
        {
            A = aKey,
            SubgroupB = aData.GroupBy(b => b.B, (bKey, bData) =>
                new
                {
                    B = bKey,
                    SubgroupC = bData.GroupBy(c => c.C, (cKey, cData) =>
                    new
                    {
                        C = cKey,
                        SubgroupD = cData.GroupBy(d => d.D)
                    })
                })
        });

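var withSelectCall =
    data.GroupBy(a => a.A)
        .Select(aG =>
        new
        {
            A = aG.Key,
            SubgroupB = aG
                .GroupBy(b => b.B)
                .Select(bG =>
            new
            {
                B = bG.Key,
                SubgroupC = bG
                    .GroupBy(c => c.C)
                    .Select(cG =>
                new
                {
                    C = cG.Key,
                    SubgroupD = cG.GroupBy(d => d.D)
                })
            })
        });

What I'd like...
I can envisage a couple of ways this could be expressed (assuming the language and framework supported it). The first would be a GroupBy extension that takes a series of function pairs for key selection and result selection, Func<TElement, TKey> and Func<TElement, TResult>. Each pair describes the next subgroup. This option falls down because each pair would potentially require TKey and TResult to be different from the others, which would mean GroupBy would need a finite number of parameters and a complex declaration.

The second option would be a SubGroupBy extension method that could be chained to produce subgroups. SubGroupBy would be the same as GroupBy, but the result would be the previous grouping further partitioned. For example: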

var groupings = data
    .GroupBy(x=>x.A)
    .SubGroupBy(y=>y.B)
    .SubGroupBy(z=>z.C)

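// This version has a custom result type that would be the grouping data.
// The element data at each stage would be the custom data at this point
// as the original data would be lost when projected to the results type.
var groupingsWithCustomResultType = data
    .GroupBy(a=>a.A, x=>new { ... })
    .SubGroupBy(b=>b.B, y=>new { ... })
    .SubGroupBy(c=>c.C, c=>new { ... })

The difficulty with this is how to implement the methods efficiently. As I currently understand it, each level would re-create new objects in order to extend the previous ones. The first iteration would create groupings of A, the second would create objects that have a key of A and groupings of B, and the third would redo all of that and add the groupings of C. This seems terribly inefficient (though I suspect my current options actually do this anyway). It would be nice if the calls passed around a meta-description of what was required and the instances were only created on the last pass, but that sounds difficult too. Note that this is similar to what can be done with GroupBy, but without the nested method calls.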
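To make the chained idea concrete, here is a minimal sketch, assuming every level's key is boxed to object so that one node type can serve all depths. Node<T>, GroupTree and SubGroupBy are hypothetical names, not framework APIs. Rather than rebuilding the objects at each level, this version walks down to the existing leaves and partitions them in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical uniform node: a key, the items under it, and optional children.
public sealed class Node<T>
{
    public object Key { get; set; }
    public List<T> Items { get; set; }
    public List<Node<T>> Children { get; set; } = new List<Node<T>>();
}

public static class SubGroupExtensions
{
    // First level: one node per distinct key.
    public static List<Node<T>> GroupTree<T>(this IEnumerable<T> source, Func<T, object> key)
    {
        return source.GroupBy(key)
                     .Select(g => new Node<T> { Key = g.Key, Items = g.ToList() })
                     .ToList();
    }

    // Later levels: descend to the current leaves and split each by the new key.
    public static List<Node<T>> SubGroupBy<T>(this List<Node<T>> roots, Func<T, object> key)
    {
        foreach (var node in roots)
        {
            if (node.Children.Count == 0)
                node.Children = node.Items.GroupTree(key);   // leaf: partition its items
            else
                node.Children.SubGroupBy(key);                // inner node: recurse
        }
        return roots;
    }
}
```

A call would then read data.GroupTree(x => x.A).SubGroupBy(x => x.B).SubGroupBy(x => x.C). The trade-off is that key types are erased to object and the tree is mutated eagerly, so this gives up both static typing per level and deferred execution.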

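Hopefully all that makes sense. I expect I am chasing rainbows here, but maybe not.

Update - another option
Another possibility, which I think is more elegant than my previous suggestions, relies on each parent group being just a key and a sequence of child items (as in the examples), much like IGrouping provides now. That means one option for constructing this grouping would be a series of key selectors and a single result selector.

If the keys were all limited to a single, fixed type, which is not unreasonable, then this could be expressed as a sequence of key selectors and a result selector, or as a result selector and a params array of key selectors. Of course, if the keys had to be of different types at different levels, this becomes difficult again, except for a finite depth of hierarchy, because of the way generic parameterization works.

Here are some illustrative examples of what I mean:

For example: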

public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence, // 'this' must be the first parameter
    IEnumerable<Func<TElement, TKey>> keySelectors,
    Func<TElement, TResult> resultSelector)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    new Func<Data, string>[] {
                        x => x.A,
                        y => y.B,
                        z => z.C },
                    a => new { /*custom projection here for leaf items*/ });
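Or:

public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    Func<TElement, TResult> resultSelector,
    params Func<TElement, TKey>[] keySelectors)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    a => new { /*custom projection here for leaf items*/ },
                    x => x.A,
                    y => y.B,
                    z => z.C);

This does not solve the inefficiencies of the implementation, but it should solve the complex nesting. However, what would the return type of this grouping be? Would I need my own interface, or can I use IGrouping somehow? How much do I need to define, or does the variable depth of the hierarchy still make this impossible?

My guess is that this should be the same as the return type from any IGrouping call, but how does the type system infer that type if it isn't involved in any of the parameters that are passed?

This problem is stretching my understanding, which is great, but my brain hurts.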

3 answers:

Answer 0: (score: 9)

Here is a description of how you can implement a hierarchical grouping mechanism.

From this description:

Result class:

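using System;
using System.Collections;          // non-generic IEnumerable used by Items
using System.Collections.Generic;
using System.Linq;

public class GroupResult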
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable Items { get; set; }
    public IEnumerable<GroupResult> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

Extension method:

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult
                    {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        }
        else
            return null;
    }
}

Usage:

var result = customers.GroupByMany(c => c.Country, c => c.City);

Edit

Here is an improved and properly typed version of the code.

public class GroupResult<TItem>
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable<TItem> Items { get; set; }
    public IEnumerable<GroupResult<TItem>> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult<TElement>> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult<TElement> {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        } else {
            return null;
        }
    }
}
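Because GroupByMany returns null once the selectors run out, the deepest groups have SubGroups set to null, and any code that walks the tree has to check for that. As a sketch of one alternative (GroupNode, GroupTreeDemo, Build and Render are hypothetical names, not part of the answer above), the variant below returns empty lists at the leaf level so that a recursive printer needs no null checks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type, similar in shape to GroupResult<TItem>.
public class GroupNode<T>
{
    public object Key { get; set; }
    public List<T> Items { get; set; }
    public List<GroupNode<T>> SubGroups { get; set; }
}

public static class GroupTreeDemo
{
    // Same recursion as GroupByMany, but the base case is an empty list, not null.
    public static List<GroupNode<T>> Build<T>(IEnumerable<T> source, params Func<T, object>[] selectors)
    {
        if (selectors.Length == 0)
            return new List<GroupNode<T>>();

        var rest = selectors.Skip(1).ToArray();
        return source.GroupBy(selectors[0])
                     .Select(g => new GroupNode<T>
                     {
                         Key = g.Key,
                         Items = g.ToList(),
                         SubGroups = Build(g, rest)
                     })
                     .ToList();
    }

    // Depth-first walk producing one indented "Key (count)" line per group.
    public static IEnumerable<string> Render<T>(IEnumerable<GroupNode<T>> nodes, int depth = 0)
    {
        foreach (var node in nodes)
        {
            yield return new string(' ', depth * 2) + node.Key + " (" + node.Items.Count + ")";
            foreach (var line in Render(node.SubGroups, depth + 1))
                yield return line;
        }
    }
}
```

With the customers example, GroupTreeDemo.Render(GroupTreeDemo.Build(customers, c => c.Country, c => c.City)) would yield one line per country followed by indented lines for its cities.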

Answer 1: (score: 4)

You need a recursive function. A recursive function calls itself for each node in the tree.

To do this in LINQ, you can use a Y-combinator.
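As a rough illustration of that idea, here is a sketch in which a lambda recurses through a fixed-point helper. Fix, Nest and RecursiveGrouping are hypothetical names. Fix relies on ordinary named recursion, so strictly speaking it is a fixed-point helper rather than a pure Y-combinator. The sketch also assumes the key selectors never return null, since the keys end up in a Dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecursiveGrouping
{
    // Fixed-point helper: the lambda passed in receives "itself" as its first argument.
    public static Func<A, B, R> Fix<A, B, R>(Func<Func<A, B, R>, Func<A, B, R>> f)
    {
        return (a, b) => f(Fix(f))(a, b);
    }

    // Builds nested Dictionary<object, object> levels, one per selector;
    // the innermost values are List<T> holding the leaf items.
    public static object Nest<T>(IEnumerable<T> source, params Func<T, object>[] selectors)
    {
        var group = Fix<IEnumerable<T>, int, object>(self => (items, level) =>
            level == selectors.Length
                ? (object)items.ToList()
                : items.GroupBy(selectors[level])
                       .ToDictionary(g => g.Key, g => self(g, level + 1)));

        return group(source, 0);
    }
}
```

The result is untyped (object all the way down), which is the price of letting the depth vary at run time; callers have to cast each level back to Dictionary<object, object> or List<T>.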

Answer 2: (score: 0)

Here is my attempt at creating nested groupings. Someone may find it useful.

// extension method
public static IEnumerable<TResult> GroupMany<TElement, TResult>(this IEnumerable<TElement> seq, Func<GroupingBuilder<TElement>, IGroupingStage<TElement, TResult>> configure)
{
    var builder = new GroupingBuilder<TElement>();
    return configure(builder).ApplyTo(seq);
}

// builder classes

public class GroupingBuilder<TElement>
{
    public GroupingBuilder<TKeyNext, Group<TKeyNext, TElement>, TElement, TElement> By<TKeyNext>(Func<TElement, TKeyNext> keySelector)
        => By(keySelector, (k, s, nested) => Group.Of(k, nested(s)));

    public GroupingBuilder<TKeyNext, TElementNext, TElement, TElement> By<TKeyNext, TElementNext>(
        Func<TElement, TKeyNext> keySelector,
        Func<TKeyNext, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElement>>, TElementNext> elementSelector)
        => new GroupingBuilder<TKeyNext, TElementNext, TElement, TElement>(keySelector, elementSelector, new IdentityStage());


    // preventing writing GroupMany(g => g), i.e. mentioned call will not compile
    private class IdentityStage : IGroupingStage<TElement, TElement>
    {
        public IEnumerable<TElement> ApplyTo(IEnumerable<TElement> seq) => seq;
    }
}

public class GroupingBuilder<TKeyCurrent, TElementCurrent, TElementPrev, TElement> : IGroupingStage<TElement, TElementCurrent>
{
    private Func<TElement, TKeyCurrent> _keySelector;
    private IGroupingStage<TElement, TElementPrev> _prevStage;
    private Func<TKeyCurrent, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElementPrev>>, TElementCurrent> _elementSelector;

    public GroupingBuilder(
        Func<TElement, TKeyCurrent> keySelector,
        Func<TKeyCurrent, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElementPrev>>, TElementCurrent> elementSelector,
        IGroupingStage<TElement, TElementPrev> prevStage)
    {
        _keySelector = keySelector;
        _prevStage = prevStage;
        _elementSelector = elementSelector;
    }

    public GroupingBuilder<TKeyNext, Group<TKeyNext, TElementCurrent>, TElementCurrent, TElement> By<TKeyNext>(
        Func<TElement, TKeyNext> keySelector)
        => By(keySelector, (k, s, nested) => Group.Of(k, nested(s)));

    public GroupingBuilder<TKeyNext, TElementNext, TElementCurrent, TElement> By<TKeyNext, TElementNext>(
        Func<TElement, TKeyNext> keySelector,
        Func<TKeyNext, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElementCurrent>>, TElementNext> elementSelector)
        => new GroupingBuilder<TKeyNext, TElementNext, TElementCurrent, TElement>(keySelector, elementSelector, this);

    IEnumerable<TElementCurrent> IGroupingStage<TElement, TElementCurrent>.ApplyTo(IEnumerable<TElement> seq)
        => seq.GroupBy(_keySelector, (k, s) => _elementSelector(k, s, _prevStage.ApplyTo));
}

public interface IGroupingStage<TElement, TResultElement>
{
    IEnumerable<TResultElement> ApplyTo(IEnumerable<TElement> seq);
}

// Group data structure
public class Group<TKey, TElement>
{
    public TKey Key { get; set; }
    public ICollection<TElement> Items { get; set; }
}

public static class Group
{
    public static Group<TKey, TElement> Of<TKey, TElement>(TKey key, IEnumerable<TElement> elements)
        => new Group<TKey, TElement> { Key = key, Items = elements.ToList() };
}

Basic usage:

var items = new[]{
    new SomeEntity{NonUniqueId = 1, Name = "John", Surname = "Doe", DoB = new DateTime(1900, 01, 03)},
    new SomeEntity{NonUniqueId = 1, Name = "John", Surname = "Doe", DoB = new DateTime(1980, 01, 03)},
    new SomeEntity{NonUniqueId = 2, Name = "Jane", Surname = "Doe", DoB = new DateTime(1902, 01, 03)},
    new SomeEntity{NonUniqueId = 1, Name = "Jane", Surname = "Smith", DoB = new DateTime(1999, 01, 03)},
};

IEnumerable<Group<int, Group<DateTime, Group<string, SomeEntity>>>> result = items
    .GroupMany(c => c
        .By(x => x.Surname)
        .By(x => x.DoB)
        .By(x => x.NonUniqueId));

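Note that the grouping properties must be specified in reverse order. This is caused by a limitation of generics: GroupingBuilder<TKeyCurrent, TElementCurrent, TElementPrev, TElement> wraps the previous grouping type with the new one, so nesting is only possible in reverse order.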
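The reversed order is easier to see when the same grouping is written out by hand. The sketch below uses plain GroupBy and ToDictionary rather than the builder; Person and ReverseOrderDemo are hypothetical stand-ins for SomeEntity and the calling code. The last By call (NonUniqueId) corresponds to the outermost level, and the first (Surname) to the innermost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SomeEntity, reduced to the grouped properties.
public class Person
{
    public int NonUniqueId { get; set; }
    public string Surname { get; set; }
    public DateTime DoB { get; set; }
}

public static class ReverseOrderDemo
{
    // Hand-written analogue of .By(Surname).By(DoB).By(NonUniqueId):
    // outermost key is NonUniqueId, then DoB, then Surname.
    public static Dictionary<int, Dictionary<DateTime, Dictionary<string, List<Person>>>> Group(
        IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.NonUniqueId)
            .ToDictionary(
                byId => byId.Key,
                byId => byId
                    .GroupBy(p => p.DoB)
                    .ToDictionary(
                        byDoB => byDoB.Key,
                        byDoB => byDoB
                            .GroupBy(p => p.Surname)
                            .ToDictionary(
                                bySurname => bySurname.Key,
                                bySurname => bySurname.ToList())));
    }
}
```

This uses dictionaries instead of the Group<TKey, TElement> type, so it mirrors only the nesting order, not the exact result type of GroupMany.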

Usage with custom result selectors:

var result = items
    .GroupMany(c => c
        .By(x => x.Surname, (key, seq, nested) => new { Surname = key, ChildItems = nested(seq).ToList() })
        .By(x => x.DoB, (key, seq, nested) => new { DoB = key, Children = nested(seq).ToList() })
        .By(x => x.NonUniqueId));