Question

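I am trying to optimize part of some C++ code that takes a long time. For X amount of data, the following portion of the code takes about 19 seconds, and based on some benchmarks I have, I am trying to finish the whole process in less than 5 seconds for the same amount of data. I have written a function "add" and copied its code here. I will try to explain as much as I think is needed to understand the code. Please let me know if I have missed something.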

The following function add is called X times for X data entries.

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data

   // hashTableMap is an unordered_map<int, std::vector<PointObject>>

   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            std::vector<PointObject> pointVectorList = it->second;
            pointVectorList.push_back(vector);
            it->second = pointVectorList;
        }
   }
}

Answer 1

You are doing a lot of useless operations. If I understand correctly, a simplified form could be just:

void HashTable::add(const PointObject& vector) {
   hashTableMap[hash(vector)].push_back(vector);    
}

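This works because:

When a map is accessed with operator[], a default-initialized value is created if the key is not already present in the map.
The value (a std::vector) is returned by reference, so you can push_back the incoming point onto it directly. That std::vector is either the newly inserted one or, if the key was already in the map, the previously existing one.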

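Note also that, depending on the size of PointObject and other factors, passing vector by value instead of by const PointObject& might be more efficient. This is the kind of micro-optimization that needs profiling to be done sensibly.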
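The operator[] behavior described above can be sketched as a small self-contained class. The PointObject layout, the hashPoint function, and the bucketSize helper are hypothetical stand-ins, since the question does not show the real definitions.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the asker's PointObject.
struct PointObject {
    int x;
    int y;
};

// Hypothetical stand-in for the asker's hash(); any int-valued function works here.
int hashPoint(const PointObject& p) {
    return p.x * 31 + p.y;
}

class HashTable {
public:
    void add(const PointObject& point) {
        // operator[] creates an empty vector when the key is absent and
        // returns a reference to the stored vector in either case.
        hashTableMap[hashPoint(point)].push_back(point);
    }

    // Number of points stored under the key this point hashes to.
    std::size_t bucketSize(const PointObject& point) const {
        auto it = hashTableMap.find(hashPoint(point));
        return it == hashTableMap.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<int, std::vector<PointObject>> hashTableMap;
};
```

With this shape, each call to add does one map lookup and copies only the one point being appended.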

Answer 2

Instead of calling hashTableMap.count(combinedHash) and hashTableMap.find(combinedHash), it is better to just insert the new element and check what insert() returns:

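In versions (1) and (2), the function returns a pair object whose first element is an iterator pointing either to the newly inserted element in the container or to the element whose key is equivalent, and a bool value indicating whether the element was successfully inserted or not.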
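A minimal sketch of that insert-and-check pattern follows. The PointObject struct, the PointMap alias, and the addPoint free function are assumptions made for illustration; the asker's real code is a member function that computes the hash itself.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the asker's PointObject.
struct PointObject {
    int x;
    int y;
};

using PointMap = std::unordered_map<int, std::vector<PointObject>>;

// Returns true when a new key was created, false when the key already existed.
bool addPoint(PointMap& map, int combinedHash, const PointObject& point) {
    // Try to insert a one-element vector under the key.
    auto result = map.insert(
        std::make_pair(combinedHash, std::vector<PointObject>(1, point)));
    if (!result.second) {
        // Nothing was inserted because the key exists; result.first points
        // at the existing entry, so append to its vector in place.
        result.first->second.push_back(point);
    }
    return result.second;
}
```

This replaces the separate count() and find() calls with a single lookup.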

Also, do not pass objects by value when you do not have to. It is better to pass them by pointer or by reference. This:

std::vector<PointObject> pointVectorList = it->second;

is inefficient because it creates an unnecessary copy of the vector.

Answer 3

Instead of the if, try inserting an empty entry into the hash table:

auto ret = hashTableMap.insert(
   std::make_pair(combinedHash, std::vector<PointObject>()));

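This will either add a new blank entry or retrieve the one that already exists. In your case you do not need to check which case it is; you only need to take the returned iterator and add the new element: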

auto &pointVectorList = ret.first->second;   // ret.first points at the (key, vector) pair
pointVectorList.push_back(vector);

Answer 4

The .count() call is completely unnecessary; you can simplify your function to:

void HashTable::add(PointObject vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        std::vector<PointObject> pointVectorList = it->second;
        pointVectorList.push_back(vector);
        it->second = pointVectorList;
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

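You are also performing copy operations all over the place. Copying objects is time-consuming, so avoid doing it. Use references and pointers where possible: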

void HashTable::add(const PointObject& vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        it->second.push_back(vector);
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

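This code can probably be optimized further, but that would require knowing hash(), knowing how hashTableMap works (by the way, why is it not a std::map?), and some experimentation.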

If hashTableMap were a std::map<int, std::vector<PointObject>>, you could simplify your function to:

void HashTable::add(const PointObject& vector)
{
    hashTableMap[hash(vector)].push_back(vector);
}

If it were a std::map<int, std::vector<PointObject*>> (pointers), you could even avoid that last copy operation.

Answer 5

Your biggest problem is that in the else part you copy the entire vector (and every element in that vector) twice:

std::vector<PointObject> pointVectorList = it->second;  // first copy
pointVectorList.push_back(vector);
it->second = pointVectorList;                           // second copy

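This means that every time you add an element to an existing vector, you copy that entire vector.

If you used a reference to that vector instead, you would do much better: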

std::vector<PointObject> &pointVectorList = it->second;
pointVectorList.push_back(vector);
//it->second = pointVectorList; // don't need this anymore.
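To make the cost difference concrete, here is a rough sketch that counts copies under both patterns. The copy-counting PointObject and the helper names appendByCopy, appendByReference, and countCopies are all hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical PointObject that counts how often it is copied.
struct PointObject {
    static std::size_t copies;
    int x = 0;
    PointObject() = default;
    explicit PointObject(int value) : x(value) {}
    PointObject(const PointObject& other) : x(other.x) { ++copies; }
    PointObject& operator=(const PointObject& other) {
        x = other.x;
        ++copies;
        return *this;
    }
};
std::size_t PointObject::copies = 0;

// The question's pattern: copy the vector out, append, copy it back.
void appendByCopy(std::vector<PointObject>& stored, const PointObject& point) {
    std::vector<PointObject> local = stored;   // first copy
    local.push_back(point);
    stored = local;                            // second copy
}

// The suggested pattern: append through a reference to the stored vector.
void appendByReference(std::vector<PointObject>& stored, const PointObject& point) {
    std::vector<PointObject>& local = stored;
    local.push_back(point);
}

// Number of PointObject copies made while appending `count` points.
template <typename Append>
std::size_t countCopies(Append append, int count) {
    std::vector<PointObject> stored;
    PointObject::copies = 0;
    for (int i = 0; i < count; ++i) {
        append(stored, PointObject(i));
    }
    return PointObject::copies;
}
```

Comparing countCopies(appendByCopy, n) with countCopies(appendByReference, n) for growing n should show the first growing roughly quadratically and the second roughly linearly, although the exact numbers depend on the standard library's vector growth policy.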

On a side note, in the unordered_map you are hashing your value to get your key. You could just use an unordered_set with your hash function.

Answer 6

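Using std::unordered_map does not seem appropriate here: you use the int from hash as the key, which (presumably) is the hash of the PointObject rather than the PointObject itself. That is essentially double hashing. And if you need a PointObject in order to compute the map key, then it is not really a key at all! Perhaps std::unordered_multiset would be a better choice?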

First define the hash function for PointObject, something like:

namespace std
{
    template<>
    struct hash<PointObject> {
        size_t operator()(const PointObject& p) const {
            return ::hash(p);
        }
    };
}
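How the container might then be used is sketched below. This is an assumption-laden illustration, not the answerer's code: the PointObject fields, the operator== definition, the body of the global hash function, and the count helper are all hypothetical. Note that std::unordered_multiset groups elements that compare equal, which is not the same as grouping every point that shares a hash value.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for the asker's PointObject.
struct PointObject {
    int x;
    int y;
};

// unordered containers need equality on the element type (assumed definition).
bool operator==(const PointObject& a, const PointObject& b) {
    return a.x == b.x && a.y == b.y;
}

// Hypothetical stand-in for the asker's global hash() function.
std::size_t hash(const PointObject& p) {
    return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
}

namespace std
{
    template<>
    struct hash<PointObject> {
        size_t operator()(const PointObject& p) const {
            return ::hash(p);   // forward to the global function
        }
    };
}

class HashTable {
public:
    void add(const PointObject& point) {
        points.insert(point);   // the container hashes the point itself
    }

    // How many stored points compare equal to the given one.
    std::size_t count(const PointObject& point) const {
        return points.count(point);
    }

private:
    std::unordered_multiset<PointObject> points;
};
```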

Answer 7

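Assuming PointObject is big and making copies of it is expensive, std::move is your friend. You will want to make sure that PointObject is move-aware (either do not define a destructor or copy operations, or provide a move constructor and move assignment operator yourself).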

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data

   // hashTableMap is an unordered_map<int, std::vector<PointObject>>

   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(std::move(vector));
        hashTableMap.insert(std::make_pair(combinedHash, std::move(pointVectorList)));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            // Append through the iterator; copying the vector out and back is unnecessary.
            it->second.push_back(std::move(vector));
        }
   }
}
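Combined with the operator[] simplification from the other answers, the move-based version could shrink to something like the following sketch. The PointObject layout, hashPoint, and bucketSize are hypothetical; the struct declares no destructor or copy operations, so the compiler generates its move operations.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical PointObject with an expensive-to-copy member. No destructor or
// copy operations are declared, so the implicit move operations are available.
struct PointObject {
    int id;
    std::vector<double> coordinates;
};

// Hypothetical stand-in for the asker's hash().
int hashPoint(const PointObject& p) {
    return p.id % 16;
}

class HashTable {
public:
    // Taking the parameter by value lets callers choose to copy or to move in.
    void add(PointObject point) {
        const int combinedHash = hashPoint(point);   // read the point before moving from it
        hashTableMap[combinedHash].push_back(std::move(point));
    }

    // Number of points stored under the given key.
    std::size_t bucketSize(int combinedHash) const {
        auto it = hashTableMap.find(combinedHash);
        return it == hashTableMap.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<int, std::vector<PointObject>> hashTableMap;
};
```

A caller that no longer needs its object can write table.add(std::move(p)) and avoid copying the coordinates; a caller that passes an lvalue pays for exactly one copy.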

Optimizing C++ code (that uses UnorderedMap and Vector)

7 answers: