Question

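Suppose I have a List<List<Integer>> that contains lists of numbers from 1 to n. What is a good way to remove lists that have the same members at different indices?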

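If I have [[1,2,3], [2,1,3], [4,5,6]], I consider the first and second members duplicates, and I want to remove one of them (it does not matter which) to get [[2,1,3], [4,5,6]] or [[1,2,3], [4,5,6]].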

There is an O(n^2) solution that loops over all members and uses list.contains(x), but I would like to know whether there is a better way to do this.

Answer 1

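One way to do this is to hash each list and then check the lists that share a hash more carefully. There are several ways to build the hash:

If you build the hash from the XOR of the list elements, the hash is weak but cheap to build, because it does not depend on the order of the elements. With n lists of k items each, building the hashes takes only Θ(nk). Lists with equal hashes still have to be compared, and such a weak hash is likely to produce more collisions.
If you sort each list and build the hash from the sorted result, the hash is stronger, but building the hashes takes Θ(nk log(k)).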

Which approach works better depends on the setting.
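
As a rough illustration of the first variant, here is a minimal Java sketch (Java being the language of the question). The class and method names (XorDedup, xorHash, samePermutation, dedup) are made up for this example. Lists are bucketed by the XOR of their elements, and only lists that land in the same bucket are compared element by element, so a collision costs extra comparisons but cannot produce a wrong result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class XorDedup {
    // Weak, order-independent hash: XOR of all elements.
    static int xorHash(List<Integer> list) {
        int h = 0;
        for (int x : list) {
            h ^= x;
        }
        return h;
    }

    // True if a and b contain the same elements with the same multiplicities.
    static boolean samePermutation(List<Integer> a, List<Integer> b) {
        if (a.size() != b.size()) {
            return false;
        }
        List<Integer> sa = new ArrayList<>(a);
        List<Integer> sb = new ArrayList<>(b);
        Collections.sort(sa);
        Collections.sort(sb);
        return sa.equals(sb);
    }

    // Keeps the first list of every group of permutations, in input order.
    static List<List<Integer>> dedup(List<List<Integer>> lists) {
        Map<Integer, List<List<Integer>>> buckets = new HashMap<>();
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> list : lists) {
            List<List<Integer>> bucket =
                buckets.computeIfAbsent(xorHash(list), k -> new ArrayList<>());
            boolean seen = false;
            // Only lists with the same XOR hash are compared in detail.
            for (List<Integer> kept : bucket) {
                if (samePermutation(kept, list)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                bucket.add(list);
                result.add(list);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
            Arrays.asList(1, 2, 3), Arrays.asList(2, 1, 3), Arrays.asList(4, 5, 6));
        System.out.println(dedup(lists)); // [[1, 2, 3], [4, 5, 6]]
    }
}
```

The detailed comparison here sorts copies of both lists, which is simple but repeats work; caching the sorted copy per kept list would be one way to reduce that.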

Answer 2

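The algorithm in a nutshell:

Project each element of the outer list into a tuple of (hash, index).
Sort the list of tuples by the first element (the hash).
Extract the indices from the tuples, keeping one index per distinct hash value.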

The following code implements this algorithm:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Program
{
    // Computes the hash of an array.
    // All input arrays are assumed to have the same length.
    // The hash is order-sensitive, so it is applied to sorted copies below.
    static int array_hash(int[] array)
    {
        int hc = array.Length;
        for (int i = 0; i < array.Length; ++i)
        {
            hc = unchecked(hc * 314159 + array[i]);
        }
        return hc;
    }

    static void Main(string[] args)
    {
        var lists = new List<List<int>>();
        lists.Add(new List<int>() { 1, 2, 3 });
        lists.Add(new List<int>() { 3, 2, 1 });
        lists.Add(new List<int>() { 4, 5, 6 });

        // Step 1: project every inner list to (hash of its sorted copy, index).
        var hashs = new List<Tuple<int, int>>(lists.Count);
        for (int i = 0; i < lists.Count; ++i)
        {
            var inner_list_copy = lists[i].ToArray();
            Array.Sort(inner_list_copy);
            hashs.Add(Tuple.Create(array_hash(inner_list_copy), i));
        }

        // Step 2: sort by hash so that equal hashes become adjacent.
        hashs.Sort((tuple1, tuple2) => tuple1.Item1.CompareTo(tuple2.Item1));

        // Step 3: keep one index per distinct hash.
        // Note: lists are compared by hash only, so a hash collision would
        // drop a distinct list; compare the sorted copies too if that matters.
        var indices = new List<int>();
        var last_hash = 0;
        if (hashs.Count != 0)
        {
            last_hash = hashs[0].Item1;
            indices.Add(hashs[0].Item2);
        }
        for (int i = 1; i < hashs.Count; ++i)
        {
            var new_hash = hashs[i].Item1;
            if (new_hash != last_hash)
            {
                last_hash = new_hash;
                indices.Add(hashs[i].Item2);
            }
        }

        // Print the indices of the lists that survive deduplication.
        Console.WriteLine("Indices");
        for (int i = 0; i < indices.Count; ++i)
        {
            Console.WriteLine(indices[i]);
        }

        Console.ReadLine();
    }
}

Note: you may want to explore other hash functions. See "C# hashcode for array of ints".
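
Since the question is about Java, here is a hedged sketch of the same sort-then-hash idea in that language. It is not a port of the C# program above: instead of a hand-written hash it relies on the hashCode and equals that java.util.List already defines, using a sorted copy of each inner list as the key in a HashSet. The class name SortedKeyDedup and the method dedup are hypothetical. Because HashSet falls back on equals when hashes match, a collision cannot cause a distinct list to be dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SortedKeyDedup {
    // Keeps the first list of every group of permutations, in input order.
    static List<List<Integer>> dedup(List<List<Integer>> lists) {
        Set<List<Integer>> seen = new HashSet<>();
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> list : lists) {
            // The sorted copy is a canonical key: all permutations map to it.
            List<Integer> key = new ArrayList<>(list);
            Collections.sort(key);
            // add() returns false if an equal key is already present.
            if (seen.add(key)) {
                result.add(list);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
            Arrays.asList(1, 2, 3), Arrays.asList(3, 2, 1), Arrays.asList(4, 5, 6));
        System.out.println(dedup(lists)); // [[1, 2, 3], [4, 5, 6]]
    }
}
```

With n lists of k items each, the sorting dominates at O(nk log(k)), and the set operations take expected O(k) per list.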

P.S. Just for fun, a solution in Haskell:

import Data.List (group, sort)
-- f removes duplicates from a list of lists: sort each list, sort the outer list, then group
f :: Ord a => [[a]] -> [[a]]
f = map head . group . sort . map sort

Removing lists that have the same members at different indices
