Question

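I need to implement a huge hash table that supports concurrent insertion and lookup from multiple threads. The key is an int, and the mapped value is a vector of pointers to objects of type T.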

class T { 
        //class definitions here
};

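For now I am using tbb::concurrent_unordered_map. According to the documentation, it appears to allow concurrent insertion and traversal. However, running with multiple pthreads causes a segmentation fault, even though sequential tests pass. Note that there are definitely no erase or pop operations, i.e. the hash table is only allowed to grow.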

std::vector<T*> get(int key) {
        //Note that the hash table hashTable is shared by multiple threads
        tbb::concurrent_unordered_map<int, std::vector<T*>>::iterator it = hashTable.find(key);
        if (it != hashTable.end())
                return it->second;
        else {
                std::vector<T*> newvector;
                return newvector;
        }
}

void insert(int key, T *t) {
        tbb::concurrent_unordered_map<int, std::vector<T*>>::iterator it = hashTable.find(key);
        if (it != hashTable.end())
                it->second.push_back(t);
        else {
                std::vector<T*> newTs;
                newTs.push_back(t);
                hashTable.insert(it, makepair(key, newTs));
        }
}

To debug what is going on, I first changed std::vector to tbb::concurrent_vector and then modified get() and insert() as follows.

std::vector<T*> get_test(int key) {
        std::vector<T*> test;
        //Note that the hash table hashTable is shared by multiple threads
        tbb::concurrent_unordered_map<int, tbb::concurrent_vector<T*>>::iterator it = hashTable.find(key);
        if (it != hashTable.end())
                test.insert(test.end(), it->second.begin(), it->second.end());
        for (T* _t : test)
                if (!_t)
                        printf("Bug happens here!\n"); //Segfault is originated here because a NULL is returned in the vector  
        return test;
}

void insert_test(int key, T *t) {
        //Here t is guaranteed to be not NULL
        if(!t)
                std::terminate();
        tbb::concurrent_unordered_map<int, tbb::concurrent_vector<T*>>::iterator it = hashTable.find(key);
        if (it != hashTable.end())
                it->second.push_back(t);
        else {
                std::vector<T*> newTs;
                newTs.push_back(t);
                hashTable.insert(it, makepair(key, newTs));
        }
}

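Now we can see why the parallel program crashes: get_test() returns some NULL pointers. Recall that insert_test() never inserts NULL into the vector.

Here are my questions.

(1) I read somewhere that concurrent insertion into tbb::concurrent_unordered_map may cause some insertion attempts to fail, after which the temporary objects are destroyed. Is this why NULL is observed in get_test()?

(2) Does TBB really allow insertion and traversal at the same time? That is, while some threads are inserting, other threads may call get() concurrently.

(3) Is there a better implementation, i.e. a better concurrent hash table that supports concurrent insertion and reading?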

As @for_stack suggested, I have made sure that the mapped values are concurrent_vectors and that the whole program compiles and runs. A further test is as follows:

tbb::concurrent_vector<T*> get_test(int key) {
            tbb::concurrent_vector<T*> test;
            //Note that the hash table hashTable is shared by multiple threads
            tbb::concurrent_unordered_map<int, tbb::concurrent_vector<T*>>::iterator it = hashTable.find(key);
            if (it != hashTable.end())
                    test = it->second;
            int i = 0;
            for (T* _t : test)
                    if (!_t)
                            i++;
            if (i > 0)
                    printf("%d of %lu Ts are NULL\n", i, test.size()); //Segfault is originated here because a NULL is returned in the vector  
            return test;
}

void insert_test(int key, T *t) {
        //Here t is guaranteed to be not NULL
        if(!t)
                std::terminate();
        tbb::concurrent_unordered_map<int, tbb::concurrent_vector<T*>>::iterator it = hashTable.find(key);
        if (it != hashTable.end())
                it->second.push_back(t);
        else {
                tbb::concurrent_vector<T*> newTs;
                newTs.push_back(t);
                hashTable.insert(it, make_pair(key, newTs));
        }
}

The output is:

1 of 2 Ts are NULL

This means that not all of the objects (T) returned by get() are NULL.

Again, running sequentially (or even with a single thread) works fine.

Answer 1

~~TBB CAN support concurrent insertion and traversal for concurrent_xxx containers.~~ However, your original code has race conditions:

std::vector<T*> get(int key) {
    // other code
    return it->second;  // race condition 1
    // other code
}

The get function returns a copy of the vector<T*> (a read), while other threads may be calling insert to modify the same vector<T*> (a write).

void insert(int key, T *t) {
    // other code
    it->second.push_back(t);   // race condition 2
    // other code
}

The insert function modifies the vector<T*> without any lock protection. If multiple threads call insert at the same time (multiple writes), then OOPS!

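concurrent_unordered_map only guarantees thread safety for operations on the container itself; it makes no guarantee for operations on the mapped_value. You have to take care of that yourself.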
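One way to "take care of it yourself" is to pair each mapped vector with its own mutex. The following is a minimal sketch using only the standard library, not TBB; `GuardedBucket` and `count_nulls_under_contention` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct T { int id; };

// Hypothetical mapped type: a vector plus the mutex that guards it.
class GuardedBucket {
public:
    void push(T* t) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(t);
    }

    // Copies the vector while holding the lock, so the copy never
    // overlaps with a push_back in progress.
    std::vector<T*> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<T*> items_;
};

// Hypothetical driver: `writers` threads each push `per_thread` pointers
// while the calling thread takes 100 snapshots. Returns how many NULL
// pointers were seen across those snapshots.
std::size_t count_nulls_under_contention(GuardedBucket& bucket, T& item,
                                         int writers, int per_thread) {
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w)
        threads.emplace_back([&bucket, &item, per_thread] {
            for (int i = 0; i < per_thread; ++i) bucket.push(&item);
        });

    std::size_t nulls = 0;
    for (int i = 0; i < 100; ++i)
        for (T* p : bucket.snapshot())
            if (!p) ++nulls;

    for (auto& th : threads) th.join();
    return nulls;
}
```

Because std::mutex is neither copyable nor movable, a bucket like this would have to be held through a pointer (for example a std::shared_ptr) to serve as the mapped value of a map.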

As you have tried, you can replace vector<T*> with concurrent_vector<T*>. However, the new code you posted does not compile; you have to modify the implementation of insert_test:

void insert_test(int key, T *t) {
    //Here t is guaranteed to be not NULL
    if(!t)
            std::terminate();
    tbb::concurrent_unordered_map<int, tbb::concurrent_vector<T*>>::iterator it = hashTable.find(key);
    if (it != hashTable.end())
            it->second.push_back(t);
    else {
            // std::vector<T*> newTs;   # this is wrong!
            tbb::concurrent_vector<T*> newTs;
            newTs.push_back(t);
            hashTable.insert(it, make_pair(key, newTs));  // it should be make_pair not makepair
    }
}
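
One thing to watch in the find-then-insert pattern above: if another thread inserts the same key after find() has missed, insert() leaves the existing entry alone and t never reaches the vector. The sketch below reproduces that interleaving deterministically on a single thread, using std::unordered_map as a stand-in for the TBB map; `insert_find_first`, `insert_then_append`, and the `between` hook are hypothetical names. In real concurrent code the mapped vector would still need to be a thread-safe type.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

struct T { int id; };
using Table = std::unordered_map<int, std::vector<T*>>;

// Find-then-insert, as in the code above. `between` is a hypothetical hook
// that stands in for whatever another thread does after find() has missed.
template <typename Hook>
void insert_find_first(Table& table, int key, T* t, Hook between) {
    auto it = table.find(key);
    if (it != table.end()) {
        it->second.push_back(t);
        return;
    }
    between();  // "another thread" inserts the same key here
    // insert() does not overwrite an existing entry, so t is silently dropped.
    table.insert(std::make_pair(key, std::vector<T*>{t}));
}

// Insert-then-append: insert an empty vector if the key is missing, then
// push through the iterator that insert() returns in either case.
void insert_then_append(Table& table, int key, T* t) {
    auto result = table.insert(std::make_pair(key, std::vector<T*>{}));
    result.first->second.push_back(t);
}
```

The second form has a single code path, so it does not matter which caller actually created the entry.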

Answer 2

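"TBB CAN support concurrent insertion and traversal for concurrent_xxx containers." - not exactly. Traversal is a tricky thing when there is no memory reclamation support, as in TBB, and the container (concurrent_hash_map) supports concurrent erase. However, concurrent_unordered_map does not support a thread-safe erase(), and therefore thread-safe traversal is supported.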

Answer 3

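@Anton my friend, the concurrent_unordered containers do support concurrent traversal and insertion; they are implemented as skip lists. In the non-multi case, the result of the pointer swing is tested and, if it fails, the search starts again from the point of insertion.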

Now, C++ may have changed in the past few weeks since I worked at Intel, but I think there is a serious error in the original code:

if (it != hashTable.end())
        return it->second;          // return a copy???
else {
        std::vector<T*> newvector;  // this is stack-allocated
        return newvector;           // return a copy??
}

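What you return is a vector, not a reference or a pointer to a vector, so you get a copy of the current contents as the return value, and inserting into the copy will not change any vector in the collection. Perhaps fix that, make sure there are no asynchronous references to the vectors, and then look for the remaining bugs.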
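
To make the copy-versus-reference point concrete, here is a small single-threaded sketch with std::unordered_map standing in for the TBB map; `get_copy` and `get_ptr` are hypothetical helpers, not part of the original code.

```cpp
#include <unordered_map>
#include <vector>

struct T { int id; };
using Table = std::unordered_map<int, std::vector<T*>>;

// Returns a copy of the stored vector (an empty one if the key is missing).
// Changes made to the returned vector never reach the table.
std::vector<T*> get_copy(const Table& table, int key) {
    auto it = table.find(key);
    if (it != table.end())
        return it->second;
    return std::vector<T*>{};
}

// Returns a pointer to the stored vector, or nullptr if the key is missing.
// Changes made through the pointer are visible in the table.
std::vector<T*>* get_ptr(Table& table, int key) {
    auto it = table.find(key);
    if (it != table.end())
        return &it->second;
    return nullptr;
}
```

Handing out a pointer or reference avoids the copy, but every holder then shares the same vector, so concurrent access to it has to be synchronized.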

TBB concurrent_unordered_map causes segfault
