Question

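I am using an unordered_map as a sparse 3D array (128 x 128 x 128) and insert values into the grid only if the grid cell is still free.

Until now I always called find() to check whether the cell is free and, if it was, added an element with insert() or emplace(). I have now found that I can use the return value of insert and emplace to check whether the element was added or whether the map already contained an element with the same key. I expected this to improve performance, since I could drop the call to find entirely.

As it turns out, inserting without a preceding find does not improve performance. Performance actually gets worse, and I am not sure why.

I have reduced my application to this example, in which points are randomly generated and then inserted into the grid.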

#include <unordered_map>
#include <random>
#include <chrono>
#include <iostream>
#include <math.h>
#include <algorithm>
#include <string>

using std::cout;
using std::endl;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
using std::chrono::duration_cast;
using std::unordered_map;

int num_elements = 5'000'000;


void findThenInsert(){
    cout << endl << "find and emplace" << endl;

    auto start = high_resolution_clock::now();

    std::mt19937 gen(123);
    std::uniform_real_distribution<> dis(0, 128);

    unordered_map<int, int> grid;
    int count = 0;
    for(int i = 0; i < num_elements; i++){
        float x = dis(gen);
        float y = dis(gen);
        float z = (cos(x*0.1) * sin(x*0.1) + 1.0) * 64.0;

        int index = int(x) + int(y) * 128 + int(z) * 128 * 128;
        auto it = grid.find(index);
        if(it == grid.end()){
            grid.emplace(index, count);
            count++;
        }
    }

    cout << "elements: " << count << endl;
    cout << "load factor: " << grid.load_factor() << endl;

    auto end = high_resolution_clock::now();
    long long duration = duration_cast<milliseconds>(end - start).count();
    float seconds = duration / 1000.0f;
    cout << seconds << "s" << endl;
}


void insertThenCheckForSuccess(){
    cout << endl << "emplace and check success" << endl;

    auto start = high_resolution_clock::now();

    std::mt19937 gen(123);
    std::uniform_real_distribution<> dis(0, 128);

    unordered_map<int, int> grid;
    int count = 0;
    for(int i = 0; i < num_elements; i++){
        float x = dis(gen);
        float y = dis(gen);
        float z = (cos(x*0.1) * sin(x*0.1) + 1.0) * 64.0;

        int index = int(x) + int(y) * 128 + int(z) * 128 * 128;
        auto it = grid.emplace(index, count);
        if(it.second){
            count++;
        }
    }

    cout << "elements: " << count << endl;
    cout << "load factor: " << grid.load_factor() << endl;

    auto end = high_resolution_clock::now();
    long long duration = duration_cast<milliseconds>(end - start).count();
    float seconds = duration / 1000.0f;
    cout << seconds << "s" << endl;
}

int main(){

    findThenInsert();
    insertThenCheckForSuccess();

}

In both cases the map ends up with a size of 82901, so I assume the result is exactly the same.

find and emplace:   0.937s
emplace then check: 1.268s

Answer 1

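The problem is that the specification of emplace for associative containers in effect requires an allocation even in the failure case; the cost of this allocation and deallocation dominates the cost of a failed probe in the find-then-insert strategy.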

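This is because emplace is specified to emplace-construct value_type (i.e. pair<Key const, T>) from its forwarded arguments; only once it has constructed the pair can it hash the key to check whether the key is already present. (It cannot just take the first argument, because that might be std::piecewise_construct.) It also cannot construct the pair in automatic storage and then move it into a node, because emplace is not specified to require a copyable or even movable value_type, so it has to perform a potentially expensive node allocation on every call. (Note that the ordered associative containers have the same issue, but there the O(log n) cost of a probe is more significant compared to the cost of allocation.)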
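One way to observe this on a given standard library is to count allocations. The following is a minimal sketch, not a benchmark: CountingAllocator, g_allocations, CountedGrid and both function names are hypothetical helpers introduced only for this illustration, and the key 42 is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Global counter of allocate() calls (hypothetical helper for this sketch).
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts allocations.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using CountedGrid = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                       CountingAllocator<std::pair<const int, int>>>;

// Allocations caused by `attempts` failed find-then-emplace probes of an occupied cell.
std::size_t allocationsFindThenEmplace(int attempts) {
    CountedGrid grid;
    grid.emplace(42, 0);                    // occupy the cell first
    const std::size_t before = g_allocations;
    for (int i = 0; i < attempts; ++i) {
        if (grid.find(42) == grid.end()) {  // never true: the cell is occupied
            grid.emplace(42, i);
        }
    }
    return g_allocations - before;
}

// Allocations caused by `attempts` failed emplace calls on the same occupied cell.
std::size_t allocationsEmplaceThenTest(int attempts) {
    CountedGrid grid;
    grid.emplace(42, 0);
    const std::size_t before = g_allocations;
    for (int i = 0; i < attempts; ++i) {
        const bool inserted = grid.emplace(42, i).second;
        assert(!inserted);                  // the key is already present
        (void)inserted;                     // avoid an unused warning under NDEBUG
    }
    return g_allocations - before;
}
```

The find-based loop never allocates, because find only probes. What the emplace-based loop reports depends on the implementation: a library that builds the node before probing would report one allocation per attempt, while one that special-cases simple key arguments may report none.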

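Unless your inserts are expected to succeed in most cases, you are better off using find-then-emplace over emplace-then-test. You could also use insert, as long as you make sure that you are calling the value_type overload and not the template that forwards to emplace.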
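The difference between the two insert overloads comes down to the argument type. A small sketch, assuming a map of int to int as in the question; Grid, insertViaValueType and insertViaTemplate are illustrative names only:

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

using Grid = std::unordered_map<int, int>;

// The argument is a const lvalue of exactly value_type (std::pair<const int, int>),
// so overload resolution picks insert(const value_type&).
bool insertViaValueType(Grid& grid, int index, int count) {
    const Grid::value_type entry(index, count);
    const bool inserted = grid.insert(entry).second;
    assert(grid.count(index) == 1);  // the cell is occupied either way
    return inserted;
}

// std::make_pair yields std::pair<int, int>, which is not value_type, so the
// template overload insert(P&&) is chosen; it is specified to behave like emplace.
bool insertViaTemplate(Grid& grid, int index, int count) {
    const bool inserted = grid.insert(std::make_pair(index, count)).second;
    assert(grid.count(index) == 1);
    return inserted;
}
```

Both functions return the same results; they differ only in which overload is selected, and therefore in whether the library may have to build a node before it can check for a duplicate.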

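This is addressed in C++17, which adds try_emplace, with semantics similar to emplace but improved performance in the failure case. (The semantic difference is that the mapped type is not emplace-constructed in the failure case; this makes it possible to, for example, store a move-only type such as unique_ptr as the mapped type.)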
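A short sketch of how try_emplace could replace the emplace-then-test loop body, assuming a C++17 compiler; insertIfFree and storeIfFree are hypothetical helper names, not part of the original code:

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Inserts `count` at `index` only if that cell is still free (requires C++17).
bool insertIfFree(std::unordered_map<int, int>& grid, int index, int count) {
    return grid.try_emplace(index, count).second;
}

// With a move-only mapped type, try_emplace leaves `value` untouched when the
// key already exists, so the caller keeps ownership after a failed insert.
bool storeIfFree(std::unordered_map<int, std::unique_ptr<int>>& cells,
                 int index, std::unique_ptr<int>& value) {
    assert(value != nullptr);  // precondition for this sketch
    return cells.try_emplace(index, std::move(value)).second;
}
```

After a failed storeIfFree call the caller's unique_ptr is still valid. A plain emplace does not give that guarantee, since it may already have moved from the argument before it detects the duplicate.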

Answer 2

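I think the problem is that you are using emplace instead of insert. The emplace functions in associative containers usually allocate memory for the node even if the key already exists. So if you regularly emplace duplicates, that memory allocation is wasted. If you used insert instead, the memory allocation would only happen when the insert succeeds.

Scott Meyers says to prefer emplace functions over insertion functions only if "the container won't reject the value being added due to it being a duplicate".

I can't quite reproduce your results, but my testing shows that insert (not emplace) then test is even faster than find then emplace: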

auto it = grid.insert({index, count});

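The decision could also depend on how expensive it is to create your value type. find does not need to construct the value type; it only needs the key. But emplace and insert require both the key and the value, so in cases where the value is expensive to create it may be faster to use find and only create the value if you need to. In this case your value is just an int, so I would expect insert or emplace to always win over find-then-emplace.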
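To illustrate the expensive-value case, here is a minimal sketch in which the value is built only after find has shown the cell to be free. Cell, buildCell, g_cellsBuilt and insertIfFreeLazy are all hypothetical stand-ins, and the 1024-character payload is an arbitrary placeholder for a costly value:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a value that is expensive to build.
struct Cell {
    std::string payload;
};

// Counts how often the expensive construction actually runs.
static int g_cellsBuilt = 0;

Cell buildCell(int index) {
    ++g_cellsBuilt;
    return Cell{std::string(1024, 'x') + std::to_string(index)};
}

// find needs only the key, so the value is built only for free cells.
bool insertIfFreeLazy(std::unordered_map<int, Cell>& grid, int index) {
    if (grid.find(index) != grid.end()) {
        return false;  // occupied: no Cell is constructed
    }
    const bool inserted = grid.emplace(index, buildCell(index)).second;
    assert(inserted);  // cannot fail: find just reported the key as absent
    (void)inserted;    // avoid an unused warning under NDEBUG
    return true;
}
```

The trade-off is a second hash lookup on every successful insert, which is why this pattern tends to pay off only when duplicates are frequent or the value is costly to construct.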

Why is unordered_map "find + insert" faster than "insert + check for success"?
