Question

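I have two vectors v1 and v2 of type std::vector<std::string>. Both vectors contain unique values, and they should compare equal if their values compare equal, regardless of the order in which the values appear in the vector.

I assume two sets of type std::unordered_set would have been the better choice, but I take it as it is, so two vectors.

Nevertheless, for the order-insensitive comparison I need, I thought I would just use operator== from std::unordered_set by copying both vectors into two std::unordered_set instances. Very much like this: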

bool oi_compare1(std::vector<std::string> const&v1,
                 std::vector<std::string> const&v2)
{
    std::unordered_set<std::string> tmp1(v1.begin(),v1.end());
    std::unordered_set<std::string> tmp2(v2.begin(),v2.end());
    return tmp1 == tmp2;
}

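While profiling, I noticed that this function consumes a lot of time, so I checked the documentation and saw the O(n*n) complexity. I am confused; I was expecting O(n*log(n)), as for example with the following naive solution I came up with: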

bool oi_compare2(std::vector<std::string> const&v1,
                 std::vector<std::string> const&v2)
{
    if(v1.size() != v2.size())
        return false;
    auto tmp = v2;
    size_t const size = tmp.size();
    for(size_t i = 0; i < size; ++i)
    {
        bool flag = false;
        for(size_t j = i; j < size; ++j)
            if(v1[i] == tmp[j]){
                flag = true;
                std::swap(tmp[i],tmp[j]);
                break;
            }
        if(!flag)
            return false;
    }
    return true;
}

Why the O(n*n) complexity of std::unordered_set, and is there a built-in function I can use for order-insensitive comparison?

EDIT ---- benchmark

#include <unordered_set>
#include <chrono>
#include <iostream>
#include <string>   // std::string, std::to_string
#include <utility>  // std::swap
#include <vector>

bool oi_compare1(std::vector<std::string> const&v1,
        std::vector<std::string> const&v2)
{
    std::unordered_set<std::string> tmp1(v1.begin(),v1.end());
    std::unordered_set<std::string> tmp2(v2.begin(),v2.end());
    return tmp1 == tmp2;
}
bool oi_compare2(std::vector<std::string> const&v1,
                std::vector<std::string> const&v2)
{
    if(v1.size() != v2.size())
        return false;
    auto tmp = v2;
    size_t const size = tmp.size();
    for(size_t i = 0; i < size; ++i)
    {
        bool flag = false;
        for(size_t j = i; j < size; ++j)
            if(v1[i] == tmp[j]){
                flag = true;
                std::swap(tmp[i],tmp[j]);
                break;
            }
        if(!flag)
            return false;
    }
    return true;
}

int main()
{
    std::vector<std::string> s1{"1","2","3"};
    std::vector<std::string> s2{"1","3","2"};
    std::cout << std::boolalpha;
    for(size_t i = 0; i < 15; ++i)
    {
        auto tmp1 = s1;
        for(auto &iter : tmp1)
            iter = std::to_string(i)+iter;
        s1.insert(s1.end(),tmp1.begin(),tmp1.end());
        s2.insert(s2.end(),tmp1.begin(),tmp1.end());
    }
    std::cout << "size1 " << s1.size() << std::endl;
    std::cout << "size2 " << s2.size() << std::endl;

    for(auto && c : {oi_compare1,oi_compare2})
    {
        auto start = std::chrono::steady_clock::now();
        bool flag = true;
        for(size_t i = 0; i < 10; ++i)
            flag = flag && c(s1,s2);
        std::cout << "ms=" << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start).count() << " flag=" << flag << std::endl;
    }
    return 0;
}

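gives

size1 98304
size2 98304
ms=844 flag=true
ms=31 flag=true

-> the naive approach is way faster.

For all the complexity O(N*N) experts here... Let me go through this naive approach. I have two loops in there. The first loop runs from i=0 to size, which is N. The inner loop runs from j=i!!!!!! to N. In plain words, this means I call the inner loop N times. But the complexity of the inner loop is log(n), due to the starting index j = i!!!!. If you still don't believe me, calculate the complexity from the benchmarks and you will see...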

EDIT2 --- live on wandbox: https://wandbox.org/permlink/v26oxnR2GVDb9M6y

Answer 1

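Since unordered_set is built on a hash map, the logic for comparing lhs == rhs will be:

Check the sizes of lhs and rhs; if they are not equal, return false
For each item in lhs, look it up in rhs and compare

For a hash map, a single lookup of an item in rhs has a worst-case time complexity of O(n). The worst-case time complexity of the whole comparison is therefore O(n^2). Typically, however, you get a time complexity of O(n).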
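The two steps above can be written out as a minimal sketch. The helper name `sets_equal_sketch` is hypothetical, and this is not the code any particular standard library uses; it only shows where the O(n) average case and the O(n^2) worst case come from.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical helper, NOT the standard library's implementation:
// it only mirrors the two steps described in the answer.
template <typename T>
bool sets_equal_sketch(const std::unordered_set<T>& lhs,
                       const std::unordered_set<T>& rhs)
{
    // Step 1: sets of different size can never be equal.
    if (lhs.size() != rhs.size())
        return false;

    // Step 2: look every element of lhs up in rhs.
    // Each find() is O(1) on average, but degrades to O(n) when
    // many elements land in the same bucket.
    for (const T& item : lhs)
        if (rhs.find(item) == rhs.end())
            return false;

    return true;
}
```

With n lookups of average cost O(1) the loop is O(n) overall; if every lookup has to walk one long bucket, it becomes O(n^2).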

Answer 2

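I'm sorry to tell you, but your benchmark of operator== is faulty.

oi_compare1 accepts 2 vectors and has to build 2 complete unordered_set instances, then call operator== and destroy the whole bunch again.

oi_compare2 also accepts 2 vectors, and immediately uses them for the size comparison. It copies only 1 instance (v2 to tmp), which is much more performant for a vector.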
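One way to check this point is to time the construction of the sets separately from the call to operator==. The following is a minimal sketch under that assumption; `SplitTiming` and `time_set_compare` are hypothetical names introduced here, and the destruction cost mentioned above is not measured.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical result type for this sketch.
struct SplitTiming {
    bool equal;            // result of tmp1 == tmp2
    long long build_us;    // microseconds spent constructing both sets
    long long compare_us;  // microseconds spent in operator== only
};

// Same work as oi_compare1, but with the two phases timed separately.
SplitTiming time_set_compare(std::vector<std::string> const& v1,
                             std::vector<std::string> const& v2)
{
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    auto const t0 = clock::now();
    std::unordered_set<std::string> tmp1(v1.begin(), v1.end());
    std::unordered_set<std::string> tmp2(v2.begin(), v2.end());
    auto const t1 = clock::now();
    bool const equal = (tmp1 == tmp2);
    auto const t2 = clock::now();

    return {equal,
            duration_cast<microseconds>(t1 - t0).count(),
            duration_cast<microseconds>(t2 - t1).count()};
}
```

Calling `time_set_compare(s1, s2)` on the benchmark's vectors and printing `build_us` next to `compare_us` would show how much of the measured time belongs to operator== itself; the numbers depend on the machine and the standard library, so none are claimed here.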

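operator==

Looking at the documentation for operator==, we can see the expected complexity:

Proportional to N calls to operator== on value_type, calls to the predicate returned by key_eq, and calls to the hasher returned by hash_function in the average case; proportional to N^2 in the worst case, where N is the size of the container.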

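Edit: There is a simple algorithm: you can iterate over one unordered_set and do a simple lookup in the other one. Without hash collisions, it will find each element in its own internal bucket and compare the two for equality, because the hash alone is not sufficient.

Assuming you have no hash collisions, the elements of the unordered_set are stored in a stable order. One could loop over the internal buckets and compare the elements 2 by 2 (the first of one set with the first of the other, the second with the second, ...). This nicely gives O(N). It does not work when the buckets in which the values are stored have different sizes, or when the assignment to buckets uses a different calculation to deal with collisions.

Assuming you are unlucky and every element results in the same hash (known as hash flooding), you end up with a list of elements in no particular order. To compare, you have to check for each element whether it exists in the other set, which results in O(N*N).

This last case is easy to reproduce if you rig your hash to always return the same number. Build one set in the reverse order of the other.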
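A minimal sketch of that reproduction follows. Everything in it is an assumption made for illustration: `Key`, `ConstantHash`, `GoodHash`, `count_comparisons` and the global counter are hypothetical names, and the exact counts depend on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Global counter of equality comparisons (illustrative device only).
static std::size_t g_comparisons = 0;

// Hypothetical key type whose operator== counts how often it is called.
struct Key {
    int value;
    bool operator==(Key const& other) const {
        ++g_comparisons;
        return value == other.value;
    }
};

// Rigged hasher: every key gets the same hash, so all keys collide.
struct ConstantHash {
    std::size_t operator()(Key const&) const { return 0; }
};

// Ordinary hasher, used as the baseline.
struct GoodHash {
    std::size_t operator()(Key const& k) const {
        return std::hash<int>{}(k.value);
    }
};

// Builds one set in ascending and the other in descending order, then
// returns the number of Key comparisons made by operator== alone.
template <typename Hash>
std::size_t count_comparisons(int n, bool& equal)
{
    std::unordered_set<Key, Hash> a, b;
    for (int i = 0; i < n; ++i) {
        a.insert(Key{i});
        b.insert(Key{n - 1 - i});
    }
    g_comparisons = 0;  // ignore comparisons made while inserting
    equal = (a == b);
    return g_comparisons;
}
```

Comparing `count_comparisons<ConstantHash>(n, eq)` with `count_comparisons<GoodHash>(n, eq)` for growing n should show the rigged hash needing far more comparisons, in line with the O(N*N) worst case. Note that insertion with the rigged hash is itself quadratic, so n should stay small.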

Why is std::unordered_set operator==() N^2 complexity?
