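I am working on a C++ implementation of linear hashing.
In short, the structure is organized into so-called buckets (arrays), and each bucket can have its own overflow bucket (which can in turn have its own overflow, and so on).
In the code, an overflow bucket looks like this: bucket* overflowBucket{nullptr}; // pointer to the overflow bucket. The bucket itself is defined with struct bucket {}, so each bucket has its own fixed-size array of elements and an overflow pointer.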
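To make the layout described above concrete, here is a minimal sketch of such a bucket. The names bucket, Bucket, overflowBucket, State, key and N appear in the posted code; the Element struct, the int key type, N = 2, the recursive destructor and the chainLength helper are assumptions made only for illustration and may differ from the full source.

```cpp
#include <cstddef>

// Assumptions for illustration: int keys and N = 2 slots per bucket.
using key_type = int;
constexpr std::size_t N = 2;

enum class State { free, taken };

// Hypothetical slot type: one key plus an occupancy flag.
struct Element {
    key_type key{};
    State state{State::free};
};

struct bucket {
    Element Bucket[N];                // fixed-size array of slots
    bucket* overflowBucket{nullptr};  // next bucket in the overflow chain

    // Assumed ownership rule: deleting a bucket releases its whole chain.
    ~bucket() { delete overflowBucket; }
};

// Hypothetical helper: number of buckets in a chain (primary plus overflows).
std::size_t chainLength(const bucket* b) {
    std::size_t n = 0;
    for (; b != nullptr; b = b->overflowBucket) {
        ++n;
    }
    return n;
}
```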
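When an element is inserted into the table, the table should first check whether the element already exists and insert it only if it does not. However, if the bucket the element belongs in is full, a new overflow should be added, but before that the "next to split" bucket should be rehashed with the new hash function.
So far my program does most things correctly. However, when I insert a new element into a bucket whose overflow is already full, instead of creating a new overflow and triggering a rehash of the next-to-split bucket, it only creates the new overflow.
The relevant parts of the code are:
Main function for inserting an element: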
void insertElementInTable(const key_type& key) {
    //compute the index of the bucket that the key should be inserted in
    size_type start_idx{hashIndex(key)};
    bool check = table[start_idx]->Full();
    if(check == true) { //if the bucket is full, then...
        //...rehash the nts bucket
        rehashNextToSplit();
    }
    //then compute index again and insert the element in the bucket finally
    size_type final_idx = hashIndex(key);
    table[final_idx]->insertElementInBucket(key);
    //then check if round needs to change and if yes then double the size of the table for future splits
    if((nextToSplit-1) == roundNum) {
        roundNum += 1;
        nextToSplit = 0;
        replaceOldTable(roundNum);
    }
    //increase the number of elements in the table
    overallElementCount += 1;
}
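For comparison, in the textbook formulation of linear hashing with a single initial bucket, the split pointer wraps and a new round starts once it reaches 2^roundNum, that is, once every bucket that existed at the start of the round has been split. The sketch below shows only that bookkeeping. SplitState and advanceSplit are hypothetical names, and whether the posted condition (nextToSplit-1) == roundNum is meant to express the same rule cannot be told from the excerpt. Note also that if size_type is unsigned, nextToSplit-1 wraps around to a very large value when nextToSplit is 0.

```cpp
#include <cstddef>

// Hypothetical container for the two counters used by the posted code.
struct SplitState {
    std::size_t roundNum;
    std::size_t nextToSplit;
};

// Advance the split pointer after one bucket has been split.
// Returns true when this split completed the round, i.e. when the
// pointer reached 2^roundNum and was reset to 0 for the next round.
bool advanceSplit(SplitState& s) {
    ++s.nextToSplit;
    if (s.nextToSplit == (std::size_t{1} << s.roundNum)) {
        ++s.roundNum;
        s.nextToSplit = 0;
        return true;
    }
    return false;
}
```

With roundNum = 2 this reports the end of the round on the fourth split, after buckets 0 to 3 have all been split.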
The Full function (I think the problem may be here, but I don't see it):
bool Full() {
    size_type numBuckets{0}, numElements{0};
    bucket* helper = this;
    //counts all the elements in primary and all overflow buckets
    while(helper != nullptr) {
        numBuckets += 1;
        for(unsigned i=0; i<N; ++i) {
            if(helper->Bucket[i].state == State::taken) {
                numElements += 1;
            }
        }
        helper = helper->overflowBucket;
    }
    //set up real size and max size of bucket
    size_type real_sz = numElements*numBuckets;
    size_type max_sz = N*numBuckets;
    //if the real size matches the max size, bucket is full and return true
    return real_sz == max_sz;
}
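One thing that may be worth checking in Full(): numElements is already the total over the whole chain, so multiplying it by numBuckets makes the comparison equivalent to numElements == N. For a primary bucket with one completely filled overflow (2N elements, 2 buckets) that gives 4N == 2N, which is false, and this would match the symptom described above. The sketch below places the posted comparison next to one that compares the total directly with the chain capacity. It assumes int keys, N = 2 and a recursive destructor; Element, ChainCount, countChain, fullAsPosted, fullByTotal and makeFullChain are hypothetical names, not part of the posted code.

```cpp
#include <cstddef>

using key_type = int;         // assumption for illustration
constexpr std::size_t N = 2;  // assumption for illustration

enum class State { free, taken };
struct Element { key_type key{}; State state{State::free}; };
struct bucket {
    Element Bucket[N];
    bucket* overflowBucket{nullptr};
    ~bucket() { delete overflowBucket; }  // assumed: frees the whole chain
};

struct ChainCount { std::size_t buckets; std::size_t elements; };

// Count buckets and taken slots over the primary bucket and all overflows.
ChainCount countChain(const bucket* b) {
    ChainCount c{0, 0};
    for (; b != nullptr; b = b->overflowBucket) {
        ++c.buckets;
        for (std::size_t i = 0; i < N; ++i) {
            if (b->Bucket[i].state == State::taken) ++c.elements;
        }
    }
    return c;
}

// The comparison as posted: numElements*numBuckets == N*numBuckets.
bool fullAsPosted(const bucket* b) {
    ChainCount c = countChain(b);
    return c.elements * c.buckets == N * c.buckets;
}

// Alternative: compare the total element count with the chain capacity.
bool fullByTotal(const bucket* b) {
    ChainCount c = countChain(b);
    return c.elements == N * c.buckets;
}

// Build a chain of `buckets` buckets in which every slot is taken.
bucket* makeFullChain(std::size_t buckets) {
    bucket* head = nullptr;
    for (std::size_t b = 0; b < buckets; ++b) {
        bucket* node = new bucket;
        for (std::size_t i = 0; i < N; ++i) {
            node->Bucket[i].key = static_cast<key_type>(b * N + i);
            node->Bucket[i].state = State::taken;
        }
        node->overflowBucket = head;
        head = node;
    }
    return head;
}
```

For a single full bucket both checks agree. For a full primary bucket with a full overflow, fullAsPosted returns false while fullByTotal returns true. For a full primary bucket with an empty overflow, fullAsPosted returns true although free slots remain.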
The function that rehashes the next-to-split bucket:
void rehashNextToSplit() {
    //store the contents of nts bucket and its overflows in temporary vector
    //COPY ELEMENTS AND CLEAR BUCKET
    std::vector<value_type> vec;
    vec = table[nextToSplit]->contentCopyAndClear();
    //PROCEED WITH REHASHING
    //first increment nts
    nextToSplit += 1;
    //then insert again the values from the vector
    size_type sz = vec.size();
    for(unsigned i=0; i<sz; ++i) {
        size_type idx = hashIndex(vec.at(i));
        table[idx]->insertElementInBucket(vec.at(i));
    }
    //when done free the memory used by the help vector
    vec.clear();
    vec.shrink_to_fit();
}
The content copy and clear function called by the rehash method:
std::vector<value_type> contentCopyAndClear() {
    std::vector<value_type> vect;
    bucket* ptr = this;
    bool cond{true};
    while (cond == true) { //goes through all buckets and copies the values into the vector
        for(unsigned i=0; i<N; ++i) {
            if(ptr->Bucket[i].state == State::taken) {
                vect.push_back(ptr->Bucket[i].key); //copy element
                ptr->Bucket[i].state = State::free; //free the elements
            }
        }
        if(ptr->overflowBucket == nullptr) { cond = false; }
        ptr = ptr->overflowBucket;
    }
    //reset ptr to first overflow
    ptr = this->overflowBucket;
    delete ptr;
    return vect;
}
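A detail in contentCopyAndClear() that may matter: after delete ptr, this->overflowBucket still holds the old address, so a later insertElementInBucket on the same primary bucket would follow a dangling pointer. The sketch below drains a chain and then resets the overflow pointer. It assumes int keys, N = 2 and a bucket destructor that deletes its own overflow (the destructor is not shown in the excerpt); Element and drainChain are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

using key_type = int;         // assumption for illustration
constexpr std::size_t N = 2;  // assumption for illustration

enum class State { free, taken };
struct Element { key_type key{}; State state{State::free}; };
struct bucket {
    Element Bucket[N];
    bucket* overflowBucket{nullptr};
    ~bucket() { delete overflowBucket; }  // assumed: frees the whole chain
};

// Copy every stored key out of the chain, mark the slots free,
// release all overflow buckets and leave the primary bucket empty.
std::vector<key_type> drainChain(bucket& head) {
    std::vector<key_type> out;
    for (bucket* b = &head; b != nullptr; b = b->overflowBucket) {
        for (std::size_t i = 0; i < N; ++i) {
            if (b->Bucket[i].state == State::taken) {
                out.push_back(b->Bucket[i].key);
                b->Bucket[i].state = State::free;
            }
        }
    }
    delete head.overflowBucket;     // releases the overflow chain
    head.overflowBucket = nullptr;  // avoid leaving a dangling pointer behind
    return out;
}
```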
The hash function:
size_type hashIndex(const key_type& key) const {
    size_type idx = hasher{}(key) % (1<<roundNum);
    if(idx < nextToSplit) {
        size_type d{roundNum + 1};
        idx = hasher{}(key) % (1<<d);
    }
    return idx;
}
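The addressing rule in hashIndex can be tried in isolation with a free function that takes the hash value and the two counters as parameters. This is a sketch under assumptions: linearHashIndex is a hypothetical name, and the hash value is passed in directly instead of calling hasher. It also shifts std::size_t{1} rather than the int literal 1, so the shift is carried out in the unsigned index type.

```cpp
#include <cstddef>

// Linear-hashing address: h mod 2^roundNum, except that buckets below
// nextToSplit were already split in this round and use h mod 2^(roundNum+1).
std::size_t linearHashIndex(std::size_t h,
                            std::size_t roundNum,
                            std::size_t nextToSplit) {
    std::size_t idx = h % (std::size_t{1} << roundNum);
    if (idx < nextToSplit) {
        idx = h % (std::size_t{1} << (roundNum + 1));
    }
    return idx;
}
```

For example, with roundNum = 2 and nextToSplit = 1, a hash value of 4 first maps to 4 mod 4 = 0, which lies below nextToSplit, so it is re-addressed to 4 mod 8 = 4. A hash value of 5 maps to 1 and stays there.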
The insertElementInBucket function, which is also called by the insert function:
void insertElementInBucket(const key_type& key) {
    bucket* next = this;
    while(true) {
        for(unsigned i=0; i<N; ++i) {
            if(next->Bucket[i].state == State::free) {
                next->Bucket[i].key = key;
                next->Bucket[i].state = State::taken;
                return;
            } else {
                if(key_equal{}(next->Bucket[i].key, key)) {
                    return;
                }
            }
        }
        if(next->overflowBucket == nullptr) {
            next->overflowBucket = new bucket;
        }
        next = next->overflowBucket;
    }
}
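For the sake of readability and simplicity I won't paste the rest of the code here, but you can see it at the following link: https://pastebin.com/TtdN5tBU
The problem summarized once more:
After inserting the number 17 in the picture below, the number 4 in the first table should be remapped (using 4 mod 2^3) to bucket no. 2.
The numbers 3 and 55 in the picture below should appear only once (after rehashing, they should be removed from bucket no. 1).
I would be very grateful for any hints, since I have been trying to figure this out for 2 days and have not made any progress.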