Question

I'm curious whether anyone here knows how efficient atomics are, specifically std::atomic<int>. My problem is as follows:

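I have a data set, say data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, that is passed to an algorithm as algo(begin(data), end(data)). algo partitions the data into chunks and executes each chunk asynchronously, so algo would perform its operation on 4 different chunks:

{1, 2, 3}
{4, 5, 6}
{7, 8, 9}
{10, 11, 12}

Within each partition, I need to return, at the end of that partition, the number of elements that satisfy a predicate op.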
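To make the setup concrete, here is a minimal sketch of one way such partitioning could look, assuming std::async is used to run each chunk. The helper name count_per_partition and its parts parameter are hypothetical, not part of the original algo. Each task keeps its own local count and returns it, so no variable is shared between chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical helper (not the original algo): splits [first, last) into up to
// `parts` contiguous chunks, counts matches in each chunk on its own task,
// and returns one count per chunk.
template <typename It, typename Pred>
std::vector<int> count_per_partition(It first, It last, std::size_t parts, Pred op) {
    assert(parts > 0);
    using diff_t = typename std::iterator_traits<It>::difference_type;
    const diff_t n = std::distance(first, last);
    const diff_t p = static_cast<diff_t>(parts);
    const diff_t chunk = (n + p - 1) / p;  // ceiling division

    std::vector<std::future<int>> tasks;
    for (diff_t begin = 0; begin < n; begin += chunk) {
        It chunk_first = std::next(first, begin);
        It chunk_last = std::next(chunk_first, std::min(chunk, n - begin));
        tasks.push_back(std::async(std::launch::async, [chunk_first, chunk_last, op] {
            // The count lives inside this task only, so there is no data race.
            return static_cast<int>(std::count_if(chunk_first, chunk_last, op));
        }));
    }

    std::vector<int> counts;
    for (auto& t : tasks) counts.push_back(t.get());  // waits for each chunk
    return counts;
}
```

With the 12-element data set above and 4 parts, this yields one count per chunk in chunk order.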

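The problem is that I'll run into a data race just by incrementing a single variable while the 4 chunks execute asynchronously. I'm considering two possible solutions: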

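Using std::atomic
- The problem here is that I know very little about C++ atomics, and from what I've heard they can be inefficient. Is that true? What kind of results should I expect to see from using atomics to keep track of the count?
Using a shared array, where the size is the partition count
- I know my shared arrays pretty well, so this idea doesn't seem too bad, but I'm not sure how it would hold up when a very small chunk size is given, which would make the shared array that tracks the count at the end of each partition quite large. It would be useful, though, because the algorithm wouldn't have to wait on anything to finish incrementing; it would just place its own count in the shared array.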
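For the first option, a minimal sketch of a shared std::atomic<int> counter might look like the following. The helper count_with_atomic and its use of std::thread are assumptions made for illustration. An atomic increment is generally more expensive than a plain one, particularly when several threads hit the same variable, so this sketch counts privately inside each chunk and touches the atomic only once per partition.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical helper: counts matches across up to `parts` threads,
// accumulating into a single shared atomic counter.
template <typename It, typename Pred>
int count_with_atomic(It first, It last, std::size_t parts, Pred op) {
    assert(parts > 0);
    using diff_t = typename std::iterator_traits<It>::difference_type;
    const diff_t n = std::distance(first, last);
    const diff_t p = static_cast<diff_t>(parts);
    const diff_t chunk = (n + p - 1) / p;  // ceiling division

    std::atomic<int> total{0};
    std::vector<std::thread> workers;
    for (diff_t begin = 0; begin < n; begin += chunk) {
        It chunk_first = std::next(first, begin);
        It chunk_last = std::next(chunk_first, std::min(chunk, n - begin));
        workers.emplace_back([chunk_first, chunk_last, op, &total] {
            // Count locally, then publish once per partition.
            const int local = static_cast<int>(std::count_if(chunk_first, chunk_last, op));
            // Relaxed ordering is enough here: join() below synchronizes
            // with each thread before the total is read.
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```

Incrementing the atomic once per matching element would also be correct, but it would put far more traffic on the shared variable.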

So here are my ideas for how I could implement it:

//partition lambda function
{
    //'it' corresponds to the position in its respective partition
    if( op(*it) )
        count++;

    //return the count at the end of this partition
    return count;    
}

//partition lambda function, count is now atomic
{
    //'it' corresponds to the position in its respective partition
    if( op(*it) )
        count++;

    //return the count at the end of this partition
    return count.load();    
}

//partition lambda function, count is in shared array that will be accessed later
//instead of returned
{
    int count = 0;

    //'it' corresponds to the position in its respective partition
    if( op(*it) )
        count++;

    //total count at end of each partition. ignore fact that partition_id = 0 wouldn't work
    shared_arr[partition_id] = shared_arr[partition_id - 1] + count;
}

Any thoughts on shared_array vs atomic?
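For comparison, here is a minimal sketch of the shared-array variant. It deliberately differs from the snippet above in one respect: each partition writes only to its own slot, and the running totals are computed after all threads have joined, because reading shared_arr[partition_id - 1] while a neighbouring partition may still be writing it would itself be a data race. The helper name running_counts is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: one slot per partition, no atomics.
// Returns the running total of matches at the end of each partition.
template <typename It, typename Pred>
std::vector<int> running_counts(It first, It last, std::size_t parts, Pred op) {
    assert(parts > 0);
    using diff_t = typename std::iterator_traits<It>::difference_type;
    const diff_t n = std::distance(first, last);
    const diff_t p = static_cast<diff_t>(parts);
    const diff_t chunk = (n + p - 1) / p;  // ceiling division
    // Number of chunks actually used (zero for an empty range).
    const std::size_t used = chunk > 0 ? static_cast<std::size_t>((n + chunk - 1) / chunk) : 0;

    std::vector<int> shared_arr(used, 0);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < used; ++id) {
        const diff_t begin = static_cast<diff_t>(id) * chunk;
        It chunk_first = std::next(first, begin);
        It chunk_last = std::next(chunk_first, std::min(chunk, n - begin));
        int* slot = &shared_arr[id];  // each thread owns exactly one slot
        workers.emplace_back([chunk_first, chunk_last, op, slot] {
            *slot = static_cast<int>(std::count_if(chunk_first, chunk_last, op));
        });
    }
    for (auto& w : workers) w.join();

    // Turn per-partition counts into running totals once all writes are done.
    std::partial_sum(shared_arr.begin(), shared_arr.end(), shared_arr.begin());
    return shared_arr;
}
```

Because every thread writes a distinct element, no synchronization is needed beyond the join. Adjacent int slots can still share a cache line, though, so with many tiny partitions the writes may contend at the hardware level even without a data race.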

High performance computing: use a shared_array vs atomics?

0 answers: