Question

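Below is my implementation of a hash table that uses "buckets" for collision detection. I'm trying to make sure I fully understand the logic behind hash tables and can visualize it. In my case, the hash table looks like this: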

    [[[] <---- tuple] <--- bucket, [] <---- bucket] <- storage

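Each key-value pair is stored as a tuple and placed in a bucket according to the output of the hash function, i.e., the bucket index. Once you have hashed to the bucket index, you put the key-value pair in that bucket. If something inside the bucket matches that exact key, my implementation overwrites it. Collision detection happens when you find the same key as before, and its value gets overwritten.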

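Could I do something different, such as adding keys with different values to the end of the tuple (instead of overwriting the value), or must keys always be unique? Is there ever a case where only the values need to be unique?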

    var makeHashTable = function() {
      var max = 4; // number of buckets

      return {
        _storage: [],

        retrieve: function(key) {
          // Access the bucket the same way insert does. We don't need to store it here,
          // since insert already takes care of that. If no bucket exists at this index
          // (nothing was ever inserted there), treat it as empty so the loop doesn't run.
          // This function returns null by default.
          var bucketIndex = hashFn(key, max);
          var bucket = this._storage[bucketIndex] || [];

          for (var i = 0; i < bucket.length; i++) {
            var tuple = bucket[i];
            if (tuple[0] === key) {
              return tuple[1];
            }
          }
          return null;
        },

        insert: function(key, value) {
          // The hash function gives you the right index.
          var bucketIndex = hashFn(key, max);
          // Find the bucket at that index; if there's no bucket, initialize it.
          var bucket = this._storage[bucketIndex] || [];
          // Now actually store the bucket there.
          this._storage[bucketIndex] = bucket;
          // Collision detection scheme: overwrite the value if the key matches,
          // otherwise add the key-value pair to the end of the bucket.
          // If the bucket is empty, the loop doesn't run and we go straight to the push below.
          for (var i = 0; i < bucket.length; i++) {
            var tuple = bucket[i];
            if (tuple[0] === key) {
              // Same key as an earlier insert: replace its old value.
              tuple[1] = value;
              return;
            }
          }

          bucket.push([key, value]);
        },

        remove: function(key) {
          var bucketIndex = hashFn(key, max);
          var bucket = this._storage[bucketIndex] || [];

          for (var i = 0; i < bucket.length; i++) {
            var tuple = bucket[i];
            if (tuple[0] === key) {
              // Keys are unique within a bucket, so stop after removing the match.
              bucket.splice(i, 1);
              return;
            }
          }
        }
      };
    };

    // don't worry about this generic hashing function please, not the point of my question
    var hashFn = function(str, max) {
      var hash = 0;
      for (var i = 0; i < str.length; i++) {
        var letter = str[i];
        hash = (hash << 5) + letter.charCodeAt(0);
        hash = (hash & hash) % max;
      }
      return hash;
    };

Answer 1

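Your hash table implementation is correct. I should point out, though, that what you describe in your question is not collision detection but the operation of updating a key with a new value. A collision is when two different keys map to the same bucket, not when you insert a key and discover that there is a previous entry with the same key. You are already taking care of collisions by chaining entries in the same bucket.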
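To make the distinction concrete, here is a small sketch that reuses the `hashFn` from the question with `max = 4`. The keys 'a' (char code 97) and 'q' (char code 113) both reduce to bucket 1, which is a genuine collision: two different keys, one bucket. The inline chaining loop is written only for illustration and is not the question's `insert`.

```javascript
// hashFn copied from the question (max = number of buckets).
function hashFn(str, max) {
  var hash = 0;
  for (var i = 0; i < str.length; i++) {
    var letter = str[i];
    hash = (hash << 5) + letter.charCodeAt(0);
    hash = (hash & hash) % max;
  }
  return hash;
}

var max = 4;
console.log(hashFn('a', max), hashFn('q', max)); // 1 1  (different keys, same bucket: a collision)
console.log(hashFn('b', max));                   // 2

// Chaining: both tuples end up side by side in bucket 1,
// and a lookup tells them apart by comparing tuple[0] with the key.
var storage = [];
[['a', 'ant'], ['q', 'quail']].forEach(function (pair) {
  var index = hashFn(pair[0], max);
  (storage[index] = storage[index] || []).push(pair);
});
console.log(JSON.stringify(storage[1])); // [["a","ant"],["q","quail"]]
```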

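In any case, you are updating entries correctly. Suppose you have inserted the (key, value) pair ('a', 'ant') into the hash table. This means that 'a' maps to 'ant'. If you then insert ('a', 'aardvark'), the intent is to overwrite the 'a' entry so that it now maps to 'aardvark'. So you iterate over the chain of entries and check for the key 'a' in the bucket. You find it, so you replace the value 'ant' with 'aardvark'. Now 'a' maps to 'aardvark'. Good.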
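The update path can be isolated to a single bucket. This sketch assumes a bucket is a plain array of `[key, value]` tuples, as in the question; `insertIntoBucket` is a hypothetical helper that mirrors the loop inside the question's `insert`.

```javascript
// Update-or-append on one bucket, mirroring the question's insert loop.
function insertIntoBucket(bucket, key, value) {
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) {
      bucket[i][1] = value; // key already present: replace its value
      return;
    }
  }
  bucket.push([key, value]); // key not present: add a new tuple
}

var bucket = [];
insertIntoBucket(bucket, 'a', 'ant');
insertIntoBucket(bucket, 'a', 'aardvark');
console.log(JSON.stringify(bucket)); // [["a","aardvark"]]
```

The bucket still holds a single tuple for 'a', so the chain never grows on repeated inserts of the same key.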

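Now suppose you didn't iterate over the chain of entries. What would happen if you blindly appended ('a', 'aardvark') to the end of the chain? When you later look up the key 'a' and walk through the entries in the bucket, you come to ('a', 'ant') first, so you return 'ant'. That is an incorrect result. You most recently inserted ('a', 'aardvark'), so you should return 'aardvark'.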
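A sketch of the blind-append problem, assuming the bucket is searched front to back the way the question's `retrieve` does (`findFromFront` is a hypothetical helper):

```javascript
// Blind append: both tuples go into the bucket, with no check for an existing key.
var bucket = [];
bucket.push(['a', 'ant']);
bucket.push(['a', 'aardvark']);

// Front-to-back search, like the question's retrieve.
function findFromFront(bucket, key) {
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) {
      return bucket[i][1];
    }
  }
  return null;
}

console.log(findFromFront(bucket, 'a')); // ant  (stale: 'aardvark' was inserted last)
```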

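Ah, but what if you always search the chain from the end backward? In other words, you treat it as a stack. To insert an entry, push it onto the end of the chain. To look up a key, start searching from the end. The first entry you find with the given key is the most recently inserted one, so it is the correct one, and you can return its value without searching further.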
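The stack variant could look like the following sketch (`pushEntry` and `findFromEnd` are hypothetical names). Lookups return the right value, but the stale tuple is never removed:

```javascript
// Stack-style insert: always push, even if the key is already in the bucket.
function pushEntry(bucket, key, value) {
  bucket.push([key, value]);
}

// Search from the end (the top of the stack), so the newest tuple wins.
function findFromEnd(bucket, key) {
  for (var i = bucket.length - 1; i >= 0; i--) {
    if (bucket[i][0] === key) {
      return bucket[i][1];
    }
  }
  return null;
}

var bucket = [];
pushEntry(bucket, 'a', 'ant');
pushEntry(bucket, 'a', 'aardvark');
console.log(findFromEnd(bucket, 'a')); // aardvark
console.log(bucket.length);            // 2 (the stale 'ant' tuple is still there)
```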

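This implementation is correct, but it also makes the chains longer than necessary. Consider what happens if you use the hash table to count letter frequencies in a text file. Initially you insert ('a', 0) into the table. When you find the first 'a' in the text, you read 0 from the table, add 1 to it, and insert ('a', 1) into the hash table. Now the chain holds two entries with the key 'a', and only the one nearer the end is valid. When you find the next 'a', a third entry is added to the chain, and so on. Thousands of insertions with the same key produce thousands of entries in the chain.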
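A rough way to see the growth is to simulate the letter-counting example on one bucket. This sketch assumes a text consisting of 1000 'a' characters and compares the stack strategy with the question's update-or-append strategy; all function names are hypothetical.

```javascript
// Newest-first lookup: scan the chain from the end.
function lookup(bucket, key) {
  for (var i = bucket.length - 1; i >= 0; i--) {
    if (bucket[i][0] === key) return bucket[i][1];
  }
  return null;
}

// Stack strategy: always push a new tuple.
function insertPush(bucket, key, value) {
  bucket.push([key, value]);
}

// Question's strategy: overwrite an existing key, otherwise append.
function insertUpdate(bucket, key, value) {
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) {
      bucket[i][1] = value;
      return;
    }
  }
  bucket.push([key, value]);
}

// Count 'a' as described above: insert ('a', 0), then for every 'a'
// read the current count, add 1, and insert the new count.
function countA(text, insert) {
  var bucket = [];
  insert(bucket, 'a', 0);
  for (var i = 0; i < text.length; i++) {
    if (text[i] === 'a') {
      insert(bucket, 'a', lookup(bucket, 'a') + 1);
    }
  }
  return bucket;
}

var text = 'a'.repeat(1000);
var pushed = countA(text, insertPush);
var updated = countA(text, insertUpdate);
console.log(lookup(pushed, 'a'), pushed.length);   // 1000 1001
console.log(lookup(updated, 'a'), updated.length); // 1000 1
```

Both strategies report the same count; only the chain length differs.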

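Not only does this use up memory, it also slows down lookups of other keys. For example, suppose your hash function assigns the same index to the keys 'a' and 'q'. That means the 'q' entries are in the same bucket as the 'a' entries. If you have a whole bunch of 'a' entries at the end of the chain, you may have to go past many of them before you reach the most recent entry with 'q'. For these reasons, it is best to do what you did.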
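The cost for the colliding key can be measured by counting how many tuples a newest-first scan inspects. This sketch assumes 'q' was inserted once, followed by 1001 'a' tuples (only the last one current) in the same bucket; with the question's `hashFn` and `max = 4`, 'a' and 'q' do land in the same bucket. `stepsToFind` is a hypothetical helper.

```javascript
// Count how many tuples a newest-first scan inspects before finding the key.
// Returns -1 if the key is not in the bucket.
function stepsToFind(bucket, key) {
  var steps = 0;
  for (var i = bucket.length - 1; i >= 0; i--) {
    steps++;
    if (bucket[i][0] === key) return steps;
  }
  return -1;
}

// Stack strategy: the 'a' tuples pile up after the single 'q' tuple.
var bucket = [['q', 1]];
for (var n = 0; n <= 1000; n++) {
  bucket.push(['a', n]);
}
console.log(stepsToFind(bucket, 'q')); // 1002

// Update strategy: the same data with one tuple per key.
var compact = [['q', 1], ['a', 1000]];
console.log(stepsToFind(compact, 'q')); // 2
```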

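One more thought: what if each entry were a tuple (key, values), where values is an array of values? Then, as you suggested, you could append the new value to the end of values whenever you insert a key that is already present. But if you do that, what does values mean? It contains the values that were inserted with that key, in order of insertion time. If you treat it as a stack and always return the last value in the list, you are wasting space. You might as well just overwrite a single value.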
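For reference, the (key, values) variant might be sketched like this, with hypothetical helpers `insertMulti` and `retrieveLatest`. If only the last value is ever read, every earlier element of `values` is dead weight:

```javascript
// Each entry is [key, values], where values is an array.
function insertMulti(bucket, key, value) {
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) {
      bucket[i][1].push(value); // existing key: append to its values
      return;
    }
  }
  bucket.push([key, [value]]); // new key: start a values array
}

// Returning only the newest value ignores everything stored before it.
function retrieveLatest(bucket, key) {
  for (var i = 0; i < bucket.length; i++) {
    if (bucket[i][0] === key) {
      var values = bucket[i][1];
      return values[values.length - 1];
    }
  }
  return null;
}

var bucket = [];
insertMulti(bucket, 'a', 'ant');
insertMulti(bucket, 'a', 'aardvark');
console.log(JSON.stringify(bucket));      // [["a",["ant","aardvark"]]]
console.log(retrieveLatest(bucket, 'a')); // aardvark
```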

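Is there a situation in which you can put a new value into a bucket without checking for an existing key? Yes, you can do that if you have a perfect hash function, which guarantees that there are no collisions: every key maps to a different bucket. Now you don't need a chain of entries. Each bucket holds at most one value, so you can implement the hash table as an array that contains, at each index, either undefined or the most recently inserted value for that index. This sounds great, except that coming up with a perfect hash function is not easy, especially if you don't want your hash table to contain more buckets than necessary. You would have to know in advance every possible key that might be used in order to design a hash function that maps the n possible keys to n distinct buckets.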
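As an illustration of the perfect-hash case, assume the set of keys is known in advance to be exactly the 26 lowercase letters. Then `charCode - 97` sends each key to its own slot, so the table is a plain array with no chains and no key comparisons. The key set and `perfectHash` are assumptions for this sketch; any key outside the set would break it, and because keys aren't stored, the table cannot even detect such a key.

```javascript
// Assumed key set: exactly the 26 lowercase letters 'a'..'z'.
// charCode - 97 maps them to 26 distinct slots, a perfect (and minimal)
// hash for this key set only.
var SIZE = 26;
function perfectHash(letter) {
  return letter.charCodeAt(0) - 97;
}

var table = new Array(SIZE); // every slot starts out undefined

function insert(letter, value) {
  // No collisions are possible, so there is no chain and no key check.
  table[perfectHash(letter)] = value;
}

function retrieve(letter) {
  return table[perfectHash(letter)]; // undefined if never inserted
}

insert('a', 'ant');
insert('a', 'aardvark'); // overwrites: one value per slot
insert('z', 'zebra');
console.log(retrieve('a')); // aardvark
console.log(retrieve('m')); // undefined
```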

Answer 2

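Collisions in a hash table are usually handled by having each slot (the index a key hashes to) actually hold an array, or whatever data structure best fits your needs. That way, when two values end up in the same slot, you just push the new one onto the array for that slot, and afterwards you only have to search the elements of that array. This usually isn't a problem, since it is still far better than searching the whole hash table.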

If there is only one element in the array, finding it still takes constant time.
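A minimal sketch of that idea, assuming a hypothetical toy hash (first character code modulo 4) and eight three-letter keys: each slot ends up holding two entries, so a lookup examines at most two entries instead of all eight. Like the description above, `put` simply pushes and does not check whether the key is already present.

```javascript
// Hypothetical toy hash: first character code modulo the bucket count.
var BUCKETS = 4;
function slotFor(key) {
  return key.charCodeAt(0) % BUCKETS;
}

var slots = [];

// Keys that hash to the same slot share that slot's array.
function put(key, value) {
  var slot = slotFor(key);
  (slots[slot] = slots[slot] || []).push([key, value]);
}

// Scan only the array for the key's slot and report how many entries were examined.
function get(key) {
  var arr = slots[slotFor(key)] || [];
  for (var i = 0; i < arr.length; i++) {
    if (arr[i][0] === key) return { value: arr[i][1], examined: i + 1 };
  }
  return { value: null, examined: arr.length };
}

['ant', 'bee', 'cat', 'dog', 'eel', 'fox', 'gnu', 'hen'].forEach(function (k) {
  put(k, k.toUpperCase());
});
console.log(get('eel')); // { value: 'EEL', examined: 2 }
```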

Understanding hash tables and collision detection
