Question

A few days ago, I answered an interesting question about HashSet<T>. One possible solution involved cloning the hashset, and in my answer I suggested doing something like this:

HashSet<int> original = ...
HashSet<int> clone = new HashSet<int>(original);

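Although this approach is quite straightforward, I suspect it is very inefficient: the constructor of the new HashSet<T> has to add each item from the original hashset individually and check whether it is already present. This is clearly a waste of time: since the source collection is an ISet<T>, it is guaranteed not to contain duplicates. There should be a way to take advantage of that knowledge...

Ideally, HashSet<T> should implement ICloneable, but unfortunately that is not the case. I also checked with Reflector to see whether the HashSet<T> constructor does something specific when the source collection is a hashset, but it doesn't. It could probably be done by using reflection on private fields, but that would be an ugly hack...

So, has anyone come up with a clever way to clone a hashset more efficiently?

(Note that this question is purely theoretical; I don't need to do this in a real program.)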

Answer 1

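If you really want the most efficient way to clone a HashSet<T>, you could do the following (though possibly at the cost of maintainability):

1. Use Reflector or the debugger to figure out exactly which fields in HashSet<T> need to be copied. You may need to do this recursively for each field.
2. Use Reflection.Emit or expression trees to generate a method that does the necessary copying of all of the fields. It may need to call other generated methods that copy the value of each field. We're using runtime code generation because it's the only way to directly access private fields.
3. Use FormatterServices.GetUninitializedObject(...) to instantiate a blank object. Use the method generated in step 2 to copy the original object to the new blank object.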
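The steps above could look roughly like the following sketch, which uses expression trees. FieldCopier<T> and Sample are hypothetical names made up for illustration; the sketch assumes every field to copy is declared on T itself, is not readonly, and that one level of array duplication is enough. It is demonstrated on a small stand-in class rather than on HashSet<T>, because the private layout of HashSet<T> differs between runtimes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical helper (not from the original answer): copies the instance fields
// declared on T, duplicating array fields so the copy does not share them.
public static class FieldCopier<T> where T : class
{
    // Built once per closed type T and then reused for every copy.
    private static readonly Action<T, T> CopyFields = Build();

    private static Action<T, T> Build()
    {
        ParameterExpression source = Expression.Parameter(typeof(T), "source");
        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        MethodInfo arrayClone = typeof(Array).GetMethod("Clone", Type.EmptyTypes);
        var body = new List<Expression>();

        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public
            | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        foreach (FieldInfo field in typeof(T).GetFields(flags))
        {
            // This sketch does not handle readonly fields.
            if (field.IsInitOnly)
                throw new NotSupportedException("Readonly field: " + field.Name);

            Expression value = Expression.Field(source, field);
            if (field.FieldType.IsArray)
            {
                // source.f == null ? null : (TField)source.f.Clone()
                value = Expression.Condition(
                    Expression.ReferenceEqual(value, Expression.Constant(null, field.FieldType)),
                    Expression.Constant(null, field.FieldType),
                    Expression.Convert(Expression.Call(value, arrayClone), field.FieldType));
            }
            body.Add(Expression.Assign(Expression.Field(target, field), value));
        }
        body.Add(Expression.Empty()); // keeps the block non-empty and of type void
        return Expression.Lambda<Action<T, T>>(Expression.Block(body), source, target).Compile();
    }

    public static T Copy(T original)
    {
        // Allocates the object without running any constructor (step 3).
        // Newer runtimes also offer RuntimeHelpers.GetUninitializedObject.
        var clone = (T)FormatterServices.GetUninitializedObject(typeof(T));
        CopyFields(original, clone);
        return clone;
    }
}

// Hypothetical type used only to exercise the copier.
public sealed class Sample
{
    private int count;
    private int[] items;

    public Sample(params int[] values) { items = values; count = values.Length; }

    public int Count { get { return count; } }
    public int First { get { return items[0]; } set { items[0] = value; } }
}
```

With this in place, FieldCopier<Sample>.Copy(sample) returns an object whose array field is a separate copy, so changing First on the copy leaves the original untouched. Whether the same generator works on HashSet<T> depends on the fields of the runtime in use.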

Answer 2

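Edit: On closer inspection this does not seem to be a good idea. With fewer than 60 elements in the original hashset, the method below appears to be slower than simply creating a new hashset.

Disclaimer: this seems to work, but use it at your own risk. If you are going to serialize the cloned hashsets, you probably want to copy SerializationInfo m_siInfo as well.

I also faced this problem and took a stab at it. Below you will find an extension method that uses FieldInfo.GetValue and SetValue to copy the required fields. It is faster than using HashSet<T>(IEnumerable<T>); how much faster depends on the number of elements in the original hashset. For 1,000 elements the difference is about a factor of 7. For 100,000 elements it is about a factor of 3.

There are other approaches that might be even faster, but this has removed the bottleneck for me for now. I tried using expression trees and Reflection.Emit but hit a roadblock; if I get those to work I'll update this post.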

using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

public static class HashSetExtensions
{
    public static HashSet<T> Clone<T>(this HashSet<T> original)
    {
        var clone = (HashSet<T>)FormatterServices.GetUninitializedObject(typeof(HashSet<T>));
        Copy(Fields<T>.comparer, original, clone);

        if (original.Count == 0)
        {
            Fields<T>.freeList.SetValue(clone, -1);
        }
        else
        {
            Fields<T>.count.SetValue(clone, original.Count);
            Clone(Fields<T>.buckets, original, clone);
            Clone(Fields<T>.slots, original, clone);
            Copy(Fields<T>.freeList, original, clone);
            Copy(Fields<T>.lastIndex, original, clone);
            Copy(Fields<T>.version, original, clone);
        }

        return clone;
    }

    static void Copy<T>(FieldInfo field, HashSet<T> source, HashSet<T> target)
    {
        field.SetValue(target, field.GetValue(source));
    }

    static void Clone<T>(FieldInfo field, HashSet<T> source, HashSet<T> target)
    {
        field.SetValue(target, ((Array)field.GetValue(source)).Clone());
    }

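    // Private field names as found in the .NET Framework implementation of HashSet<T>.
    // Other runtimes may name them differently, in which case GetField returns null.
    static class Fields<T>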
    {
        public static readonly FieldInfo freeList = GetFieldInfo("m_freeList");
        public static readonly FieldInfo buckets = GetFieldInfo("m_buckets");
        public static readonly FieldInfo slots = GetFieldInfo("m_slots");
        public static readonly FieldInfo count = GetFieldInfo("m_count");
        public static readonly FieldInfo lastIndex = GetFieldInfo("m_lastIndex");
        public static readonly FieldInfo version = GetFieldInfo("m_version");
        public static readonly FieldInfo comparer = GetFieldInfo("m_comparer");

        static FieldInfo GetFieldInfo(string name)
        {
            return typeof(HashSet<T>).GetField(name, BindingFlags.Instance | BindingFlags.NonPublic);
        }
    }
}
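The speed-ups quoted above will vary with the runtime and the set size, so it may be worth measuring them yourself. The following is a rough timing sketch, not a rigorous benchmark; CloneTiming, Measure and Run are hypothetical names, and the iteration count of 100 is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing helper, for illustration only.
public static class CloneTiming
{
    // Runs the given clone strategy `iterations` times and returns the total elapsed time.
    public static TimeSpan Measure(
        HashSet<int> source, Func<HashSet<int>, HashSet<int>> clone, int iterations)
    {
        clone(source); // warm-up call so JIT compilation is not part of the measurement
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            clone(source);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints timings for a few set sizes using the copy constructor.
    public static void Run()
    {
        foreach (int size in new[] { 10, 1000, 100000 })
        {
            var original = new HashSet<int>(Enumerable.Range(0, size));
            TimeSpan elapsed = Measure(original, s => new HashSet<int>(s), 100);
            Console.WriteLine(size + " elements, copy constructor: " + elapsed.TotalMilliseconds + " ms");
        }
    }
}
```

To compare against the reflection-based extension method, pass s => s.Clone() as the strategy instead.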

Answer 3

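I checked the .NET Framework source code for both version 4.5.2 and version 4.7.2. Version 4.7.2 does have an optimization in the constructor that uses some internal cloning logic when the passed-in collection is of type HashSet<T>. You also need to pass the comparer into the constructor for this logic to work. Version 4.5.2 does not appear to have this optimization.

Example: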

var clonedSet = new HashSet<T>(set, set.Comparer);
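As a slightly fuller sketch of the same call, the helper below wraps it in a generic method. ComparerPreservingClone and CloneSet are hypothetical names; whether the constructor actually takes the internal fast path depends on the runtime version, as described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ComparerPreservingClone
{
    // Passes the source set's own comparer so the copy uses the same equality rules
    // and, on runtimes that have the optimization, can use the internal cloning logic.
    public static HashSet<T> CloneSet<T>(HashSet<T> set)
    {
        return new HashSet<T>(set, set.Comparer);
    }
}
```

For example, a set built with StringComparer.OrdinalIgnoreCase yields a copy that still treats "a" and "A" as the same element, and adding to the copy does not affect the original.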

Answer 4

A simple pattern that should, but won't, work for many collections:

Class cloneableDictionary(Of T, U)
    Inherits Dictionary(Of T, U)
    Function clone() As Dictionary(Of T, U)
        Return CType(Me.MemberwiseClone, cloneableDictionary(Of T, U))
    End Function
End Class

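Unfortunately, I don't know whether Microsoft did anything to prevent MemberwiseClone from being called in places where it shouldn't be (for example, by declaring something other than a method, such as a class, with the name MemberwiseClone), so I don't know how to tell whether this approach is likely to work.

I think there is a fair reason for a standard collection to support only a protected cloning method rather than a public one: a class that derives from a collection might break severely if cloned, and if the base class's cloning method is public, there is no way to prevent an object of a derived class from being handed to code that expects to clone it.

That said, it would have been nice if .NET had included cloneableDictionary and other such classes as standard types (though obviously not implemented essentially as above).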

Answer 5

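In theory, O(n) is the best a clone can do when the two sets do not share the same underlying data structure.

Checking whether an element is in a HashSet should be a constant-time (i.e. O(1)) operation.

So you could create a wrapper that simply wraps an existing HashSet and holds on to any new additions, but that seems rather awkward.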
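A minimal sketch of such a wrapper follows. OverlaySet<T> is a hypothetical name, and the sketch goes slightly beyond the text by also tracking removals. It assumes the wrapped set is never modified after it has been wrapped; it is an illustration of the idea, not a recommendation.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: an O(1) "clone" that shares the wrapped set and records
// only the changes made afterwards. Assumes the wrapped set is not modified later.
public sealed class OverlaySet<T>
{
    private readonly HashSet<T> baseSet;  // shared, treated as read-only
    private readonly HashSet<T> added;    // items not in baseSet that were added
    private readonly HashSet<T> removed;  // items in baseSet that were removed

    public OverlaySet(HashSet<T> baseSet)
    {
        this.baseSet = baseSet;
        added = new HashSet<T>(baseSet.Comparer);
        removed = new HashSet<T>(baseSet.Comparer);
    }

    public int Count
    {
        get { return baseSet.Count - removed.Count + added.Count; }
    }

    public bool Contains(T item)
    {
        return baseSet.Contains(item) ? !removed.Contains(item) : added.Contains(item);
    }

    // Returns true if the item was not present before the call.
    public bool Add(T item)
    {
        // An item of the shared set is only "new" if it had been removed earlier.
        return baseSet.Contains(item) ? removed.Remove(item) : added.Add(item);
    }

    // Returns true if the item was present before the call.
    public bool Remove(T item)
    {
        return baseSet.Contains(item) ? removed.Add(item) : added.Remove(item);
    }
}
```

Creating the wrapper is constant time, but every lookup now touches up to two sets, and the wrapper is not a HashSet<T>, which is part of why this stretches what "clone" means.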

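When you say "efficient", you mean "more efficient than the existing O(n) method". I posit that you can't actually do better than O(n) without playing some pretty serious semantic games about what "clone" means.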

Answer 6

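Just a random thought. It might be silly.

Since they did not implement ICloneable, and the constructor does not use the knowledge that the source is of the same type, I guess we are left with one option: implement an optimized version and add it to the type as an extension method.

Something like: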

using System.Collections.Generic;

namespace ExtensionMethods
{
    public static class MyExtensions
    {
        public static HashSet<int> Clone(this HashSet<int> original)
        {
            HashSet<int> clone = new HashSet<int>();
            //your optimized code here 
            return clone;
        }
    }   
}

Then the code from the question would look like this:

HashSet<int> original = ...
HashSet<int> clone = original.Clone();

Efficient way to clone a HashSet<T>?
