Question

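I am designing a data structure for a utility of mine, and I am tempted to use a hash table where the key is a really long string, specifically a file path. There are a number of reasons why this makes sense from a data standpoint, mainly that the path is guaranteed to be unique. That said, every hash table example I have seen has very short keys and potentially longer values. So I am wondering whether that is just a function of simple examples, or whether there is a performance or technical reason not to use long keys. If it makes any difference, I will be using $variable = New-Object Collections.Specialized.OrderedDictionary for version-agnostic ordering.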

Answer 1

I think you are fine using long strings as keys.

Under the hood, the key lookup in OrderedDictionary does this:

    if (objectsTable.Contains(key)) {

where objectsTable is of type Hashtable.
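To make the lookup concrete, here is a minimal PowerShell sketch of an OrderedDictionary keyed by long file paths. The paths and values are hypothetical, padded past 100 characters purely for illustration; only the OrderedDictionary members used (Add, Contains, the indexer, Keys, Count) come from the class itself.

```powershell
# Hypothetical paths, padded well past 100 characters for illustration only.
$longDir = 'C:\Projects\' + ('nested_folder\' * 10)
$pathA   = $longDir + 'report.txt'
$pathB   = $longDir + 'summary.txt'

$files = New-Object Collections.Specialized.OrderedDictionary
$files.Add($pathA, 'first value')
$files.Add($pathB, 'second value')

# Lookup by the full path; this is the call that ends up in the internal Hashtable.
$found = $files.Contains($pathA)
$value = $files[$pathA]

# Adding the same path a second time throws, which is what enforces uniqueness.
$duplicateRejected = $false
try {
    $files.Add($pathA, 'duplicate')
} catch {
    $duplicateRejected = $true
}

# Keys come back in insertion order.
$orderedKeys = @($files.Keys)
```

The key length plays no role in how the dictionary is used; it only affects how long each hash computation takes.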

If you follow the chain of how the Hashtable class obtains the hash, you arrive at this: https://referencesource.microsoft.com/#mscorlib/system/collections/hashtable.cs,4f6addb8551463cf

    // Internal method to get the hash code for an Object.  This will call
    // GetHashCode() on each object if you haven't provided an IHashCodeProvider
    // instance.  Otherwise, it calls hcp.GetHashCode(obj).
    protected virtual int GetHash(Object key)
    {
        if (_keycomparer != null)
            return _keycomparer.GetHashCode(key);
        return key.GetHashCode();
    }

So the question becomes: what does it cost to get the hash code of a string? https://referencesource.microsoft.com/#mscorlib/system/string.cs

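You will see that GetHashCode is a loop, but it is only an O(n) function, since its cost grows only with the length of the string. You will notice that the hash calculation on 32-bit machines differs a bit from other machines, but in both branches O(n) is the worst case for how the algorithm scales, so a key of a few hundred characters costs only a few hundred simple integer operations per hash.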

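There are other parts to the function, but I think this is the key part, since it is the part that can grow (src is a char* pointing at the characters of the string).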

#if WIN32
                    // 32 bit machines.
                    int* pint = (int *)src;
                    int len = this.Length;
                    while (len > 2)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
                        pint += 2;
                        len  -= 4;
                    }

                    if (len > 0)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                    }
#else
                    int     c;
                    char *s = src;
                    while ((c = s[0]) != 0) {
                        hash1 = ((hash1 << 5) + hash1) ^ c;
                        c = s[1];
                        if (c == 0)
                            break;
                        hash2 = ((hash2 << 5) + hash2) ^ c;
                        s += 2;
                    }
#endif
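If you would rather measure than reason from the source, the rough PowerShell sketch below times GetHashCode on a short key and a long, path-like key. The key values and the iteration count are arbitrary assumptions, and PowerShell's own loop overhead will account for much of the elapsed time, so treat the numbers as a sanity check rather than a benchmark.

```powershell
# Hypothetical keys: a short file name and a path-like string of about 150 characters.
$shortKey = 'a.txt'
$longKey  = 'C:\Projects\' + ('nested_folder\' * 10) + 'a.txt'

# Arbitrary iteration count, chosen only to make the timing measurable.
$iterations = 200000

$shortTime = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) { $null = $shortKey.GetHashCode() }
}
$longTime = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) { $null = $longKey.GetHashCode() }
}

# Report the key length next to the total time for all iterations.
'{0} chars: {1:N0} ms' -f $shortKey.Length, $shortTime.TotalMilliseconds
'{0} chars: {1:N0} ms' -f $longKey.Length,  $longTime.TotalMilliseconds
```

Comparing the two totals on your own machine shows how much of the cost is attributable to key length in your scenario.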

Hashtable with long (100+ character) key names
