Question

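I want to run a tbb::parallel_for over a large data set and generate a unique set. The body of the parallel_for contains some additional logic that decides whether each sub-element of the original data set should be included in this set. The resulting set is typically much smaller than the original data set, so I would rather not build a vector with duplicates and remove them afterwards, as that would increase memory usage.

My first implementation uses tbb::concurrent_unordered_set, and profiling shows a significant performance bottleneck in the set.insert() method.

My attempt to improve this was to use thread-local storage to build one set per thread and then combine the sets at the end, in order to get rid of the atomics.

Despite reading a lot of documentation, I am still unsure whether tbb::combinable or tbb::enumerable_thread_specific is the better fit.

This must be a fairly common use case. Can someone provide an example implementation or point me to an example online that I can look at?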

Answer 1

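I think you are heading in the right direction. Concurrent hash tables are efficient for large numbers of elements (thousands). That said, you can still try reserving enough capacity before running the algorithm and experimenting with the load factor of concurrent_unordered_set (set it to 1), and you can also try concurrent_hash_map (it is faster when you use insert(value) without an accessor; it also needs some capacity reserved up front).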

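tbb::combinable and tbb::enumerable_thread_specific both use the same backend implementation. The only difference is the interface. The documentation has an example for the latter, which I rearranged slightly:

```cpp
#include <cstdio>
#include <utility>
#include <tbb/blocked_range.h>
#include <tbb/enumerable_thread_specific.h>
#include <tbb/parallel_for.h>

typedef tbb::enumerable_thread_specific< std::pair<int,int> > CounterType;
CounterType MyCounters(std::make_pair(0, 0));  // each thread's copy starts at (0, 0)

int main() {
    tbb::parallel_for(tbb::blocked_range<int>(0, 100000000),
        [](const tbb::blocked_range<int>& r) {
            // Each thread updates only its own pair, so no synchronization is needed.
            CounterType::reference my_counter = MyCounters.local();
            ++my_counter.first;
            my_counter.second += static_cast<int>(r.size());
        });

    // Fold the per-thread pairs into one result.
    std::pair<int,int> sum = MyCounters.combine(
        [](std::pair<int,int> x, std::pair<int,int> y) {
            return std::make_pair(x.first + y.first, x.second + y.second);
        });
    printf("Total calls to operator() = %d, total iterations = %d\n",
           sum.first, sum.second);
}
```

Finally, try yet another approach: use tbb::parallel_reduce. With it you do not need additional means such as combinable, and the reduction is done mostly in parallel (only log P sequential steps, whereas combining thread-specific values requires visiting all P elements sequentially).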

TBB thread-local set using combinable or enumerable_thread_specific?
