Question

I ran into the following problem.

For example, I have a collection of items:

    List<int> exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };

and another collection of items that is a subgroup of the first one:

    List<int> customSelection = new List<int> { 1, 5, 6, 6, 8 };

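What I want is the difference between them, i.e. a collection containing the items { 3, 7, 5, 6, 6 }. In other words, an IEnumerable<int> resultingCollection such that customSelection.Concat(resultingCollection) is equivalent to exampleList (ignoring item order).

I can't use the .Except() extension method, because it excludes every item of the first collection that is present in the second one, which is not what I'm looking for. The only solution I have come up with is the following: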

        // count item occurrences in the first collection
        var countedItemsInFirst = exampleList.GroupBy(item => item)
            .ToDictionary(group => group.Key, group => group.Count());
        // count item occurrences in the second collection
        var countedItemsInSecond = customSelection.GroupBy(item => item)
            .ToDictionary(group => group.Key, group => group.Count());

        List<int> resultingCollection = new List<int>();

        int itemsCountDifference;
        int itemsCountInSecond;
        foreach (var kvp in countedItemsInFirst)
        {
            // when an item occurs more often in the first collection than in the second one,
            // add it to the resulting collection "count difference" times
            if (!countedItemsInSecond.TryGetValue(kvp.Key, out itemsCountInSecond))
                itemsCountInSecond = 0;
            itemsCountDifference = kvp.Value - itemsCountInSecond;
            for (int i = 0; i < itemsCountDifference; i++)
                resultingCollection.Add(kvp.Key);
        }

        // string.Join also copes with an empty result, where Aggregate would throw
        Console.WriteLine(string.Join(",", resultingCollection));

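That is a lot of code just to perform a selection. What worries me more is performance, because in real-world cases both collections can contain many items.

Can this be done in a better way? Maybe I am missing something in LINQ that could help me?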
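To illustrate the point about .Except(), here is a minimal sketch using the two lists from above. Except works as a set difference, so duplicates are collapsed instead of being cancelled one for one. The class and method names (ExceptDemo, Run) exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Returns what Except yields for the two lists from the question.
    public static List<int> Run()
    {
        var exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
        var customSelection = new List<int> { 1, 5, 6, 6, 8 };

        // Except is a set difference: every value present in customSelection
        // is dropped entirely, and the remaining values are de-duplicated.
        return exampleList.Except(customSelection).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints 3,7
    }
}
```

The wanted result is { 3, 7, 5, 6, 6 }, so the extra 5 and the two extra 6s are lost.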

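Edit

The best solution so far is the last algorithm suggested by Ulugbek Umirov. It preserves the order of the original collection, it is much faster than any of the other algorithms when we select 1/2 of the original collection, and it is faster still when the selection is smaller. Many thanks to Ulugbek Umirov! I have turned it into a generic extension method that works with any generic collection: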

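    public static IEnumerable<T> Subtract<T>(this IEnumerable<T> minuend, IEnumerable<T> subtrahend)
    {
        // No capacity hint: Count() would enumerate both sequences an extra time,
        // and a negative difference would make the List<T> constructor throw.
        var diffList = new List<T>();
        // Number of occurrences of each item that still has to be removed.
        var diffDict = subtrahend.GroupBy(n => n)
                                 .ToDictionary(g => g.Key, g => g.Count());
        // IEnumerable<T> has no ForEach method, so use a plain foreach loop.
        foreach (var n in minuend)
        {
            int count;
            if (diffDict.TryGetValue(n, out count))
            {
                if (count == 1)
                    diffDict.Remove(n);
                else
                    diffDict[n] = count - 1;
            }
            else
                diffList.Add(n);
        }

        return diffList;
    }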
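One possible variation, sketched here rather than measured: the same idea written as a deferred iterator that also accepts an optional IEqualityComparer<T>. The names CollectionExtensions and SubtractLazy are hypothetical, and the sketch assumes the sequences contain no null items (a null key would make the dictionary throw).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionExtensions
{
    // Hypothetical name; a deferred variant of the Subtract idea.
    public static IEnumerable<T> SubtractLazy<T>(
        this IEnumerable<T> minuend,
        IEnumerable<T> subtrahend,
        IEqualityComparer<T> comparer = null)
    {
        // Count how many times each value still has to be removed.
        var remaining = subtrahend
            .GroupBy(x => x, comparer)
            .ToDictionary(g => g.Key, g => g.Count(), comparer);

        foreach (var item in minuend)
        {
            int count;
            if (remaining.TryGetValue(item, out count) && count > 0)
                remaining[item] = count - 1; // consume one pending removal
            else
                yield return item;           // nothing left to cancel it out
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new[] { "a", "B", "b", "c", "A" };
        var toRemove = new[] { "b", "a" };

        // Case-insensitive matching: "a" and "B" are cancelled first.
        var result = words.SubtractLazy(toRemove, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(",", result)); // prints b,c,A
    }
}
```

Because the method uses yield return, nothing is counted or compared until the result is enumerated. Whether that is preferable to returning a filled list depends on how the result is consumed.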

Answer 1

I wouldn't group the second list.

List<int> exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
List<int> customSelection = new List<int> { 1, 5, 6, 6, 8 };

var diffDic = exampleList.GroupBy(n => n)
                         .ToDictionary(g => g.Key, g => g.Count());
customSelection.ForEach(n =>
{
    if (diffDic.ContainsKey(n))
        diffDic[n]--;
});
var diffList = diffDic.Where(p => p.Value > 0)
                      .SelectMany(p => Enumerable.Repeat(p.Key, p.Value))
                      .ToList();

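The following code can improve performance, since TryGetValue avoids the repeated lookups of ContainsKey followed by the indexer, and exhausted keys are removed from the dictionary: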

customSelection.ForEach(n =>
{
    int count = 0;
    if (diffDic.TryGetValue(n, out count))
    {
        if (count == 1)
            diffDic.Remove(n);
        else
            diffDic[n] = count - 1;
    }
});

Update

If you want to preserve the original order of the items, you can use the following code:

List<int> exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
List<int> customSelection = new List<int> { 1, 5, 6, 6, 8 };

var diffList = new List<int>(exampleList.Count);
var customSelectionDic = customSelection.GroupBy(n => n)
                                        .ToDictionary(g => g.Key, g => g.Count());
exampleList.ForEach(n =>
    {
        int count = 0;
        if (customSelectionDic.TryGetValue(n, out count))
        {
            if (count == 1)
                customSelectionDic.Remove(n);
            else
                customSelectionDic[n] = count - 1;
        }
        else
            diffList.Add(n);
    });

// diffList: { 3, 7, 5, 6, 6 }
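As a quick sanity check of the requirement from the question (customSelection followed by the difference should contain the same items as exampleList, ignoring order), something like the following sketch could be used. The helper name SameItems and the class name MultisetCheck are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultisetCheck
{
    // True when both sequences contain the same items with the same
    // multiplicities, regardless of order.
    public static bool SameItems(IEnumerable<int> left, IEnumerable<int> right)
    {
        return left.OrderBy(x => x).SequenceEqual(right.OrderBy(x => x));
    }

    public static void Main()
    {
        var exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
        var customSelection = new List<int> { 1, 5, 6, 6, 8 };
        var diffList = new List<int> { 3, 7, 5, 6, 6 }; // the result shown above

        Console.WriteLine(SameItems(customSelection.Concat(diffList), exampleList)); // prints True
    }
}
```

Sorting keeps the check short; it is only meant for verification, not as part of the algorithm.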

Answer 2

This won't be the fastest, and it modifies the original list, but I think it is the shortest way:

customSelection.ForEach(x => exampleList.Remove(x));

Now exampleList will contain 3, 7, 5, 6, 6.

Answer 3

The simple solution is to take a copy of the first list and remove the items of the second list from it, one at a time:

var exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
var customSelection = new List<int> {1, 5, 6, 6, 8};

var result = new List<int>(exampleList);

foreach (var item in customSelection)
{
    result.Remove(item);
}

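However, the list has to be adjusted internally every time an item is removed, and you mentioned in the OP that performance is a concern, so this is not very efficient. Test it first; if the performance is not good enough, I would use List.RemoveAll. It takes a predicate, which means it can capture local variables: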

    public static void Main()
    {
        var exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
        var customSelection = new List<int> {1, 5, 6, 6, 8};

        var counts = customSelection.GroupBy(x => x)
                     .ToDictionary(i => i.Key, i => i.Count());
        var removedCounts = new Dictionary<int, int>();

        var result = new List<int>(exampleList);

        result.RemoveAll(x => RemovalCheck(counts, removedCounts, x));
    }

    private static bool RemovalCheck(Dictionary<int, int> counts, Dictionary<int, int> removed, int item)
    {
        if (!counts.ContainsKey(item))
            return false;
        if (!removed.ContainsKey(item))
            removed[item] = 0;
        if (removed[item] >= counts[item])
            return false;
        removed[item]++;
        return true;
    }

(You could do all of this with a lambda instead of defining a separate method, but I don't see any reason to.)

Both of these return the desired result.
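For completeness, the lambda form mentioned above might look roughly like this sketch. It uses a single dictionary of pending removals instead of two, and it assumes that RemoveAll invokes the predicate once per element in list order, which is how the current implementation behaves. The names RemoveAllLambda, Run and pending are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemoveAllLambda
{
    public static List<int> Run()
    {
        var exampleList = new List<int> { 1, 3, 5, 6, 7, 8, 6, 5, 6, 6 };
        var customSelection = new List<int> { 1, 5, 6, 6, 8 };

        // How many occurrences of each value still have to be removed.
        var pending = customSelection.GroupBy(x => x)
                                     .ToDictionary(g => g.Key, g => g.Count());

        var result = new List<int>(exampleList);
        result.RemoveAll(x =>
        {
            int left;
            if (!pending.TryGetValue(x, out left) || left == 0)
                return false;       // keep: nothing left to remove for this value
            pending[x] = left - 1;  // one occurrence accounted for
            return true;            // remove this occurrence
        });
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // prints 3,7,5,6,6
    }
}
```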

Get the items that differ between a collection and its subgroup
