Question

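I have a collection of several vectors that contain pointers. The pointers are in no particular order. I want a way to check whether two vectors contain the same pointers.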

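I came up with the idea of interpreting the pointers as integers and adding them up. My reasoning was that if two vectors have the same sum, they must contain the same pointers. It works well in practice, and at first I saw no problems. However, in some cases the sums collide and the check returns a false positive (two vectors are reported as equal when they actually differ).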
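
To make the false-positive case concrete, here is a minimal sketch that uses plain integers as stand-ins for addresses. The helper names `sumOf` and `collisionExample` and the address values are made up for illustration; they are not part of the code in this question.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: order-independent sum of address values.
// Unsigned arithmetic wraps around on overflow, which is well defined.
std::uintptr_t sumOf(const std::vector<std::uintptr_t>& addresses) {
    std::uintptr_t sum = 0;
    for (std::uintptr_t a : addresses)
        sum += a;
    return sum;
}

// Two different sets of (made-up) addresses that produce the same sum.
inline bool collisionExample() {
    const std::vector<std::uintptr_t> a{0x1000, 0x2000};
    const std::vector<std::uintptr_t> b{0x1008, 0x1FF8};  // shifted by +8 and -8
    return sumOf(a) == sumOf(b);  // equal sums although the contents differ
}
```

Because objects of the same type tend to sit at regularly spaced addresses, offsets that cancel out like this are likely rather than exotic.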

My question: is there a way to get around these collisions?

Note: sorting the vectors is not an option.

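Edit: In my application I can have many such pointer vectors. From time to time a new vector is added to the collection (which may hold around 1000 vectors). When that happens, I must be able to check whether some other vector already covers the same elements. If so, the new one is discarded. To keep track of which pointer vectors are already in the collection, I currently use a std::set (my actual PtrHasher supports more operators than the one shown here). So the operations needed to check uniqueness are 1) summing all pointers in linear time and 2) checking the set in constant time.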
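
The two steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `CoverageIndex` and the method `tryInsert` are hypothetical, and it stores the sums in a `std::unordered_set`, because lookups in a `std::set` are logarithmic in the number of stored elements, while a hash set gives constant time on average.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of the bookkeeping: one sum per accepted vector.
class CoverageIndex {
public:
    // Returns true if the vector was accepted, false if a vector with the
    // same sum is already present (which may be a false positive).
    template <typename T>
    bool tryInsert(const std::vector<T*>& pointers) {
        std::uintptr_t sum = 0;
        for (T* p : pointers)                          // 1) linear summation
            sum += reinterpret_cast<std::uintptr_t>(p);
        return m_sums.insert(sum).second;              // 2) average O(1) lookup and insert
    }

private:
    std::unordered_set<std::uintptr_t> m_sums;
};
```

A caller would keep one `CoverageIndex` alive and discard every new vector for which `tryInsert` returns false.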

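As I wrote in my comment, my application can tolerate "some" false positives (that is, discarding a vector even though it is not covered yet). So the summation works for me. The reason I am asking is to find out whether there is another method (or a better operation) that reduces false positives further while offering the same performance.

An earlier implementation used a std::set for the "covered" check as well, and it performed much worse.

Here is my code: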

#include <iostream>
#include <vector>
#include <cstdint> // std::uintptr_t

using namespace std;

template<typename T>
class PtrHasher
{
public:
    PtrHasher(const vector<T>& v) : hash(0) {
        for(const auto i : v)
            add(i);
    }
    void add(T pointer) {
        hash += reinterpret_cast<uintptr_t>(pointer);
    }
    bool operator ==(const PtrHasher<T>& other) const {
        return hash == other.hash;
    }
private:
    uintptr_t hash;
};


int main() {

    vector<int> values{0,1,2,3,4};
    vector<int*> ptr1{ &values.at(0), &values.at(2), &values.at(4) }; // points to 0,2,4
    vector<int*> ptr2{ &values.at(4), &values.at(0), &values.at(2) }; // points to 4,0,2 i.e. same positions
    vector<int*> ptr3{ &values.at(4), &values.at(3), &values.at(2) }; // points to 4,3,2 i.e. not quite the same position

    PtrHasher<int*> hasher1(ptr1);
    PtrHasher<int*> hasher2(ptr2);
    PtrHasher<int*> hasher3(ptr3);

    cout<< (hasher1==hasher2) <<endl;
    cout<< (hasher1==hasher3) <<endl;
    cout<< (hasher2==hasher3) <<endl;

    return 0;
}

Answer 1

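The sums can be equal even if two vectors contain different pointers, for example when vector A contains {p1, p2} and vector B contains {p1 + 8, p2 - 8}. If there is no other property you can rely on, converting one vector into a map and comparing against it may be a solution.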

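#include <map>
#include <vector>

// Returns true if every pointer in ptr2 also occurs in ptr1.
// Note: this is a one-way check; extra pointers in ptr1 are not detected.
bool compare(const std::vector<int*>& ptr1, const std::vector<int*>& ptr2)
{
    // Record every pointer of ptr1 in an ordered map (O(log N) per insertion).
    std::map<int*, bool> mapForPtr1;
    for (int* element : ptr1)
    {
        mapForPtr1[element] = true;
    }

    // Look up each pointer of ptr2; find() avoids inserting missing keys.
    for (int* element : ptr2)
    {
        if (mapForPtr1.find(element) == mapForPtr1.end())
            return false;
    }

    return true;
}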

The complexity is slightly higher, going from O(N) to O(N log N), but it should still be somewhat faster than a general sort.
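
As a variation on the map approach, a hash-based container brings the exact comparison back to linear time on average. The sketch below is an assumption-laden illustration, not part of the original answer: the function name `samePointerSet` is hypothetical, and it compares in both directions while ignoring duplicates within a vector.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: true if both vectors hold the same set of distinct
// pointers, regardless of order. Duplicates within a vector are ignored.
bool samePointerSet(const std::vector<int*>& lhs, const std::vector<int*>& rhs) {
    const std::unordered_set<int*> left(lhs.begin(), lhs.end());
    const std::unordered_set<int*> right(rhs.begin(), rhs.end());
    return left == right;  // average linear time in the number of elements
}
```

Unlike the sum, this check has no false positives, at the cost of building two temporary sets per comparison.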

Answer 2

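This is what I ended up with. Instead of simply adding the pointers, I use a random number engine to generate a number from each pointer and add those numbers. Because the seed is always reset to the value of the pointer, the same pointer always produces the same random number, while neighboring pointers with almost identical addresses produce very different numbers. Note that this is still not 100% safe, but it is good enough for my purposes.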

#include <cstdint>      // std::uintptr_t
#include <random>       // std::mt19937_64
#include <type_traits>  // std::is_pointer

/// Class for hashing ranges of pointers, such that it can be compared to a different hasher for containing (all) the same pointers, independent of their order.
template<typename T>
class PointerCollectionHash
{
    /// Reject non-pointer types at compile time.
    static_assert(std::is_pointer<T>::value, "ERROR: must be pointer type.");

public:
    /// Construct a hasher.
    PointerCollectionHash()
        : m_sum(0),
          m_generator(0)
    {
    }

    /// Hashes each element within a range and adds it. last is the past-the-end item.
    template<typename Iter>
    void add(Iter first, Iter last)
    {
        for(; first != last; ++first)
            add(*first);
    }

    /// Hashes a pointer and adds it.
    void add(T pointer)
    {
      m_sum += hash(pointer);
    }

    /// Compares two hashers. Returns true if their sums match, which indicates (but does not guarantee) that they hashed the same pointers, independent of the order of hashing.
    bool operator ==(const PointerCollectionHash<T>& other) const
    {
        return m_sum == other.m_sum;
    }

private:
    /// Hashes a pointer.
    std::uintptr_t hash(T pointer)
    {
      m_generator.seed( reinterpret_cast<std::uintptr_t>(pointer) );
      return m_generator();
    }

    /// Keeps the sum of the added pointers.
    std::uintptr_t m_sum;
    /// Uses a Mersenne Twister to obtain an "as unique as possible" hash for a given input. The seed of the engine is set to the input and one number is generated.
    std::mt19937_64 m_generator;
};
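
Reseeding a Mersenne Twister for every pointer re-initializes its whole internal state each time. A lighter-weight variant of the same idea (scramble each pointer, then add) could use a fixed integer mixing function instead. The sketch below swaps in the SplitMix64 finalizer for the random engine; the names `mix64` and `hashPointers` are hypothetical, and this is an illustration rather than a drop-in replacement for the class above.

```cpp
#include <cstdint>
#include <vector>

// SplitMix64 finalizer: spreads nearby inputs across the 64-bit range.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical helper: order-independent hash of a pointer collection.
template <typename T>
std::uint64_t hashPointers(const std::vector<T*>& pointers) {
    std::uint64_t sum = 0;
    for (T* p : pointers)
        sum += mix64(reinterpret_cast<std::uintptr_t>(p));  // unsigned wrap-around is well defined
    return sum;
}
```

As with the engine-based version, collisions remain possible, so equal hashes only suggest that two collections hold the same pointers.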

Is this a good idea: a class that sums the pointers in a vector so that equality comparison is fast
