Question

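I am using a prefix scan algorithm to compute an integral image, but the image that should be transformed does not change. I think I am doing something wrong when I launch the kernel. When I call the kernel function, my dimBlock is 16 and my dimGrid is: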

dim3 dimGrid((int)ceil(height / dimBlock.x), (int)ceil(width / dimBlock.y))

Answer 1

It looks like you have an integer/floating-point type mix-up in your grid and block size calculations.

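You are trying to divide with rounding up, but because you divide the pair of integers first, you get an integer result that is already rounded down, and the ceil() call does nothing:

height = 1023;
dimBlock.x = 16;

auto x = height / dimBlock.x;  // integer division, so x == 63
auto y = ceil(x);              // y is of type double, but y == 63.0
auto z = (int) y;              // z is of type int and z == 63

...and the same thing happens if you do all of this at once in a single expression: the first element of your dimGrid is 63 rather than the 64 you wanted.

Instead, use the following:

template <typename T>
unsigned int div_rounding_up(const T& dividend, const T& divisor)
{
    return (dividend + divisor - 1) / divisor;
}

(This is not a perfect implementation: I am keeping the template simple, so both arguments must have the same type.)

Now you would build dimGrid from div_rounding_up<unsigned int>(height, dimBlock.x) and div_rounding_up<unsigned int>(width, dimBlock.y) instead of the (int)ceil(...) expressions.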
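
To make the difference concrete, here is a minimal host-side sketch that can be compiled without the CUDA toolkit. It assumes unsigned sizes throughout; `Dim3Sketch`, `blocks_truncating`, and `make_grid` are hypothetical names introduced only for this illustration (in real CUDA code you would use `dim3` directly).

```cpp
#include <cassert>
#include <cmath>

// Same rounding-up division as in the answer.
template <typename T>
unsigned int div_rounding_up(const T& dividend, const T& divisor)
{
    return (dividend + divisor - 1) / divisor;
}

// Hypothetical stand-in for CUDA's dim3, so the sketch builds as plain C++.
struct Dim3Sketch { unsigned int x, y, z; };

// The approach from the question: the integer division truncates first,
// so ceil() only ever sees a whole number and changes nothing.
unsigned int blocks_truncating(unsigned int extent, unsigned int block)
{
    return (unsigned int) std::ceil(extent / block);
}

// Grid computed with rounding-up division, so a partial last block is kept.
// Follows the question's mapping: x from height, y from width.
Dim3Sketch make_grid(unsigned int height, unsigned int width, const Dim3Sketch& block)
{
    assert(block.x > 0 && block.y > 0);  // avoid dividing by zero
    return { div_rounding_up(height, block.x), div_rounding_up(width, block.y), 1u };
}
```

With a height of 1023 and a block size of 16, `blocks_truncating` yields 63, which leaves the last 15 rows without a block, while `make_grid` yields 64 in that dimension. Because the grid now over-covers the image, the kernel would also need a bounds check on its row and column indices.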
