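I have found that an MTLCommandBuffer with computationally intensive shader functions tends to stop computing before all threadgroups have finished. When I use an MTLComputePipelineState and an MTLComputeCommandEncoder to blur an image with a very large blur radius, the resulting image is only partially processed, and you can actually see that about half of the threadgroups have finished. I have not narrowed it down to the exact blur radius, but 16 pixels works fine, while 32 is already too much and not even half of the groups get computed.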
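So is there a limit on how long a shader function call may take to finish, or any similar restriction? I have just worked through most of the documentation on using the Metal framework, and I cannot remember stumbling over any such statement.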
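Because in my case the problem turned out not to be a simple timeout but some internal error, I am adding some code.
The most expensive part is a block-matching algorithm that finds matching blocks in two images (i.e. consecutive frames of a movie):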
//Exhaustive Search Block-matching algorithm
kernel void naiveMotion(
    texture2d<float,access::read> inputImage1 [[ texture(0) ]],
    texture2d<float,access::read> inputImage2 [[ texture(1) ]],
    texture2d<float,access::write> outputImage [[ texture(2) ]],
    uint2 gid [[ thread_position_in_grid ]]
)
{
    //area to search for matches
    float searchSize = 10.0;
    int searchRadius = searchSize/2;
    //window size to search in
    int kernelSize = 6;
    int kernelRadius = kernelSize/2;
    //this will store the motion direction
    float2 vector = float2(0.0,0.0);
    float2 maxVector = float2(searchSize,searchSize/2);
    float maxVectorLength = length(maxVector);
    //maximum error caused by noise
    float error = kernelSize*kernelSize*(10.0/255.0);
    for (int y = -searchRadius; y < searchRadius; ++y)
    {
        for (int x = 0; x < searchSize; ++x)
        {
            float diff = 0;
            for (int b = - kernelRadius; b < kernelRadius; ++b)
            {
                for (int a = - kernelRadius; a < kernelRadius; ++a)
                {
                    uint2 textureIndex(gid.x + x + a, gid.y + y + b);
                    float4 targetColor = inputImage2.read(textureIndex).rgba;
                    float4 referenceColor = inputImage1.read(gid).rgba;
                    float targetGray = 0.299*targetColor.r + 0.587*targetColor.g + 0.114*targetColor.b;
                    float referenceGray = 0.299*referenceColor.r + 0.587*referenceColor.g + 0.114*referenceColor.b;
                    diff = diff + abs(targetGray - referenceGray);
                }
            }
            if ( error > diff )
            {
                error = diff;
                //vertical motion is rather irrelevant but negative values can't be stored so just take the absolute value
                vector = float2(x, abs(y));
            }
        }
    }
    float intensity = length(vector)/maxVectorLength;
    outputImage.write(float4(normalize(vector), intensity, 1),gid);
}
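I use this shader on a 960x540 px image. With searchSize set to 9 and kernelSize set to 8, the shader runs over the whole image. Changing searchSize to 10 makes the shader stop early with error code 1.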