Question

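I have an algorithm in which I repeatedly run reduce_by_key and obtain a vector of reduced values, one per keyed segment. I keep track of the best output value for each segment across successive iterations. It looks like this: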

thrust::device_vector<int> keys(N), values(N), outKeys(N), outValues(N), outValuesBest(N);

// Set up the keys
// Initialize outValuesBest

while (1) {
    // Get some values (keys stay the same across iterations)

    // Do the reduce
    auto outEnd = thrust::reduce_by_key(thrust::device,
        keys.begin(), keys.end(), values.begin(),
        outKeys.begin(), outValues.begin(),
        [] __device__(int ka, int kb) { return ka == kb; },
        [] __device__(int a, int b) { return min(a, b); });

    size_t nSegments = outEnd.first - outKeys.begin();

    auto outValuesp = outValues.begin();
    auto outValuesBestp = outValuesBest.begin();

    // Update the per-segment vector of best results
    thrust::for_each_n(thrust::device,
        thrust::counting_iterator<size_t>(0), nSegments, [=] __device__(size_t i) {
        if (outValuesp[i] < outValuesBestp[i]) {
            outValuesBestp[i] = outValuesp[i];
        }
    });
}

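As you can see, my current implementation uses a separate algorithm to compare each element of the reduced output against the best vector and to update the corresponding element for every segment that improved. I would like to get rid of that second operation and of the second copy of the array (and its associated bandwidth). This looks like a job for kernel fusion using fancy iterators in Thrust. Does anyone know a way to fuse these two kernels?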

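I am envisioning a new kind of fancy iterator to handle this. Call it a conditional_discard_iterator. The idea is that an output element is actually written to the underlying iterator only when the predicate returns true, and is discarded when it returns false. I would use it like this: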

thrust::device_vector<int> keys(N), values(N), outKeys(N), outValuesBest(N);

// Set up the keys...
// Initialize outValuesBest...

while (1) {
    // Get some values...

    auto OutIt = make_conditional_discard_iterator(outValuesBest.begin(),
      [] __device__(int newValue, int fromOutValuesBest)
        { return newValue < fromOutValuesBest; });

    auto outEnd = thrust::reduce_by_key(thrust::device,
        keys.begin(), keys.end(), values.begin(),
        outKeys.begin(), OutIt,
        [] __device__(int ka, int kb) { return ka == kb; },
        [] __device__(int a, int b) { return min(a, b); });

    size_t nSegments = outEnd.first - outKeys.begin();
}

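The predicate's inputs could probably be more general, but otherwise this seems like a very elegant solution. I am just hoping it already exists. If it is not in Thrust, does something like it exist in Boost or the STL?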
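To make the intended semantics concrete, here is a minimal host-only sketch in plain C++ of what I have in mind. Everything in it is hypothetical: conditional_discard_proxy, conditional_discard_iterator, make_conditional_discard_iterator and the keep_minimum helper are names I made up, not part of Thrust, Boost or the STL. It only shows the proxy-reference trick (assignment through the dereferenced iterator consults the predicate first). I have not checked whether reduce_by_key would accept such an iterator on the device, and a device version would at least need __host__ __device__ annotations.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical proxy returned by operator*: assigning to it writes through
// to the underlying iterator only if pred(newValue, currentValue) is true.
template <class Iter, class Pred>
class conditional_discard_proxy {
public:
    conditional_discard_proxy(Iter it, Pred pred) : it_(it), pred_(pred) {}

    template <class T>
    const conditional_discard_proxy& operator=(const T& newValue) const {
        if (pred_(newValue, *it_)) {
            *it_ = newValue;  // predicate accepted: store the new value
        }                     // otherwise the value is silently discarded
        return *this;
    }

private:
    Iter it_;
    Pred pred_;
};

// Hypothetical output iterator wrapping an existing iterator and a predicate.
template <class Iter, class Pred>
class conditional_discard_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = conditional_discard_proxy<Iter, Pred>;

    conditional_discard_iterator(Iter it, Pred pred) : it_(it), pred_(pred) {}

    reference operator*() const { return reference(it_, pred_); }
    conditional_discard_iterator& operator++() { ++it_; return *this; }
    conditional_discard_iterator operator++(int) {
        conditional_discard_iterator tmp = *this;
        ++it_;
        return tmp;
    }

private:
    Iter it_;
    Pred pred_;
};

template <class Iter, class Pred>
conditional_discard_iterator<Iter, Pred>
make_conditional_discard_iterator(Iter it, Pred pred) {
    return conditional_discard_iterator<Iter, Pred>(it, pred);
}

// Illustration only: fold one iteration's per-segment results into the
// running per-segment best, keeping the smaller value in each slot.
inline void keep_minimum(std::vector<int>& best,
                         const std::vector<int>& candidates) {
    assert(candidates.size() <= best.size());
    auto out = make_conditional_discard_iterator(
        best.begin(),
        [](int newValue, int current) { return newValue < current; });
    for (int v : candidates) {
        *out = v;  // stored only if v is smaller than what is already there
        ++out;
    }
}
```

With this, the separate for_each_n pass would become unnecessary if the reduction wrote its per-segment results through such an iterator, since each write would already be compared against the stored best value.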

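Since this is apparently not obvious to everyone: this question is not a duplicate of another question merely because it also involves fancy output iterators.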

How can I do a conditional discard to fuse these two Thrust operations?

0 answers: