Question

I created a map from uint64_t to uint64_t. This is the code I wrote to measure its memory usage:

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include "sparsehash/internal/sparseconfig.h"
#include "sparsehash/sparse_hash_map"

using namespace std;

int main(int argc, char *argv[]){

    std::string input,reference;

    // Concatenate every line of standard input into a single string.
    while (getline(cin,input)) {
        reference += input;
        input.clear();
    }

    cout<<"length of reference = "<<reference.length()<<endl;
    unordered_map<uint64_t, uint64_t> m;
    //google::sparse_hash_map<uint64_t, pair<short,long>> m;

    // One entry per character: both key and value are the character's position.
    for (auto it = reference.begin(); it != reference.end(); it++) {
        m[it-reference.begin()]= it-reference.begin();
    }

    return 0;
}

When I run it under /usr/bin/time, the program produces this output:

length of reference = 4641652
    Command being timed: "./a.out"
    User time (seconds): 2.97
    System time (seconds): 0.15
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.13
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 251816
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 68259
    Voluntary context switches: 1
    Involuntary context switches: 104
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

The unordered_map appears to take about 250 MB, which seems very high. Why is that? The same code using Google sparse hash needs only 89 MB, which is more reasonable.

I don't understand why a C++ unordered_map takes up so much space.

Answer 1

You have 4641652 entries, so the raw data alone is 4641652 * 2 * 8 bytes ≈ 74 MB.

There is one important fact about hash tables: fast hash tables have many hash buckets, while memory-efficient hash tables have few.

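It all comes down to hash collisions. If you have many hash buckets (and a good hash function), collisions are rare, so lookups are really fast. If, on the other hand, your table is small (few hash buckets), collisions happen regularly, and lookups become much slower.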

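Now, std::unordered_map is designed as a fast hash table, so it carries a lot of overhead. It keeps at least as many buckets as entries (the default max_load_factor() is 1.0), and every entry lives in its own heap-allocated node, which adds at least one pointer plus the allocator's own bookkeeping per entry. In this case the overhead is about 250 / 74 ≈ 3.4x, which is fairly normal.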

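sparsehash, on the other hand, is designed to have as little overhead as possible (roughly 2 bits per entry). The price, of course, is that it is much slower.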

Whenever you use a hash map, decide whether you need speed or memory efficiency.
