Question

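I found a way to break my long-running algorithm into parallel chunks. What I don't quite understand is whether the way I'm trying to implement it in OpenCL is feasible.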

One hurdle for me is that I don't know how many "results" to expect from each task execution.

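So my plan is to create a buffer with enough room for, say, 10 results, plus a second buffer that stores a single value indicating whether the results buffer has been filled.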
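To make the two-buffer idea concrete, here is a minimal sketch in plain C++ with no OpenCL calls. The names `ResultBlock`, `push_result`, `reset` and `kCapacity` are made up for illustration, and the struct only stands in for the two device buffers; it assumes a single writer, which matches a task launched with `enqueueTask`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout: a CPU-side stand-in, not OpenCL code.
constexpr std::size_t kCapacity = 10;  // room for "say, 10 results"

struct ResultBlock {
    std::array<int, kCapacity> results{};  // stands in for the "results" buffer
    std::size_t count = 0;                 // number of valid entries so far
    bool is_filled = false;                // stands in for the "is-filled" buffer
};

// Append one result. Returns false and stores nothing if the block is already full.
bool push_result(ResultBlock& block, int value) {
    assert(block.count <= kCapacity);
    if (block.count == kCapacity) {
        return false;
    }
    block.results[block.count++] = value;
    block.is_filled = (block.count == kCapacity);
    return true;
}

// What the host would do after copying the results out.
void reset(ResultBlock& block) {
    block.count = 0;
    block.is_filled = false;
}
```

Note that the sketch tracks a count as well as the flag, since the host needs to know how many entries are valid when it reads a partially filled buffer at the end.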

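Another problem is that I may launch many tasks, and I don't want to precompute all the inputs into one big buffer, because that could be quite a lot of data. I'd rather compute the input just before starting each task.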

For example, here is some pseudocode for the approach I'm attempting:

* Create a vector to store all results
* Create the "results" buffer.
* Create the "is-filled" buffer to store whether results buffer was filled.

while (some condition) {
    // Before task.
    * Create the "input" buffer with data (input data comes from an expensive function).
    * Update kernel arguments.

    // Run task.
    queue.enqueueTask(kernel);

    // After task.
    * Read the "is-filled" buffer to determine whether "results" buffer is full.
    if ("results" buffer is full) {
       * Read the "results" buffer into the vector.
       * Read the "input" buffer (now changed to indicate next inputs to 'resume' task)
       * Reset "results" & "is-filled" buffers
    }

}

Read remaining "results" buffer into the vector.
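
The control flow of the pseudocode above can be simulated on the CPU as follows. This is only a sketch under stated assumptions: `run_task` is a hypothetical stand-in for the kernel (it "finds" multiples of 3 in a range), `TaskState` stands in for the "input" buffer that the task updates so it can be resumed, and none of this is real OpenCL code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names throughout: a CPU-side simulation, not OpenCL code.
constexpr std::size_t kCapacity = 10;

struct TaskState {   // stands in for the "input" buffer
    int next;        // where the task should resume
    int end;         // one past the last value to examine
};

struct ResultBlock { // stands in for the "results" and "is-filled" buffers
    int results[kCapacity] = {};
    std::size_t count = 0;
    bool is_filled = false;
};

// Stand-in for the kernel: records multiples of 3 and returns early
// once the results buffer is full, leaving input.next at the resume point.
// Precondition: out.count < kCapacity on entry.
void run_task(TaskState& input, ResultBlock& out) {
    while (input.next < input.end) {
        const int candidate = input.next++;
        if (candidate % 3 != 0) {
            continue;
        }
        out.results[out.count++] = candidate;
        if (out.count == kCapacity) {
            out.is_filled = true;
            return;
        }
    }
}

// "Read the results buffer into the vector" followed by the reset step.
void drain(ResultBlock& block, std::vector<int>& all) {
    all.insert(all.end(), block.results, block.results + block.count);
    block.count = 0;
    block.is_filled = false;
}

std::vector<int> collect_all(int chunks, int chunk_size) {
    std::vector<int> all;
    ResultBlock block;
    for (int c = 0; c < chunks; ++c) {
        // Input is computed just before the task instead of up front.
        TaskState input{c * chunk_size, (c + 1) * chunk_size};
        while (input.next < input.end) {
            run_task(input, block);      // "Run task"
            if (block.is_filled) {       // "After task"
                drain(block, all);
            }
        }
    }
    drain(block, all);                   // read the remaining results
    return all;
}
```

For example, `collect_all(4, 25)` walks the range 0 to 99 in four chunks and drains the 10-slot buffer each time it fills. The inner `while` loop is the synchronous "After task" check, so in this form every task waits for the host before the next one starts.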

This feels awkward. If there is another way to handle the buffer filling up, I'd like to know.

What worries me most is that the "After task" part blocks execution and prevents any parallelism from happening.

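So the obstacles to parallelism in my problem are the variable number of results and having to change the arguments before each task. The workflow I'm attempting may well be the bigger problem :)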

How would you do this?

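I found this thread, in which the asker talks about periodically clearing the buffer while the workers wait, but I couldn't find any details about that technique: OpenCL read variable size result buffer from the GPU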

Is it possible to periodically clear the buffer while the GPU work is running in parallel?

0 answers: