Metal compute function limits

Asked: 2015-04-11 12:55:11

Tags: gpgpu metal

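I have found that an MTLBuffer with computationally intensive shader functions tends to stop computing before all threadgroups have finished. When I use an MTLComputePipelineState and an MTLComputeCommandEncoder to blur an image with a very large blur radius, the resulting image is only partly processed, and you can actually see which threadgroups finished. I have not narrowed it down to an exact blur radius, but 16 pixels works fine, while 32 is already too much and not even half of the groups get computed.

So is there a limit on how long a shader function call may take to complete, or any similar restriction? I have just worked through most of the documentation on how to use the Metal framework, and I can't remember stumbling over any such statement.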

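Edit

Since in my case the problem is not a simple timeout but some internal error, I am adding some code.

The most expensive part is a block-matching algorithm that finds matching blocks in two images (i.e. consecutive frames of a movie):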
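
To make the distinction between a timeout and an internal error concrete, here is a minimal C++ sketch that maps a numeric status to a name. It assumes the number comes from Metal's MTLCommandBufferError enum, and the values listed are the ones I recall from that enum, so check them against the SDK headers. The helper name `describeCommandBufferError` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: names a numeric command buffer error code.
// ASSUMPTION: the code belongs to Metal's MTLCommandBufferError enum and
// the values below match the SDK headers (verify before relying on them).
inline std::string describeCommandBufferError(int code) {
    assert(code >= 0 && "error codes in this enum are non-negative");
    switch (code) {
        case 0:  return "none";       // no error reported
        case 1:  return "internal";   // internal error in the driver/GPU stack
        case 2:  return "timeout";    // execution exceeded the allowed time
        case 3:  return "pageFault";  // access to an unmapped address
        default: return "other";      // codes not covered by this sketch
    }
}
```

Under that assumption, a code of 1 reads as "internal" rather than "timeout", which is the distinction drawn above.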

//Exhaustive Search Block-matching algorithm
kernel void naiveMotion(
    texture2d<float,access::read>   inputImage1   [[ texture(0) ]],
    texture2d<float,access::read>   inputImage2   [[ texture(1) ]],
    texture2d<float,access::write>  outputImage  [[ texture(2) ]],
    uint2 gid                                     [[ thread_position_in_grid ]]
)
{
    //area to search for matches
    float searchSize = 10.0;
    int searchRadius = searchSize/2;

    //window size to search in
    int kernelSize = 6;
    int kernelRadius = kernelSize/2;

    //this will store the motion direction
    float2 vector = float2(0.0,0.0);
    float2 maxVector = float2(searchSize,searchSize/2);
    float maxVectorLength = length(maxVector);

    //maximum error caused by noise
    float error = kernelSize*kernelSize*(10.0/255.0);


    for (int y = -searchRadius; y < searchRadius; ++y)
    {
        for (int x = 0; x < searchSize; ++x)
        {
            float diff = 0;

            for (int b = - kernelRadius; b < kernelRadius; ++b)
            {
                for (int a = - kernelRadius; a < kernelRadius; ++a)
                {
                    uint2 textureIndex(gid.x + x + a, gid.y + y + b);
                    float4 targetColor = inputImage2.read(textureIndex).rgba;
                    float4 referenceColor = inputImage1.read(gid).rgba;
                    float targetGray = 0.299*targetColor.r + 0.587*targetColor.g + 0.114*targetColor.b;
                    float referenceGray = 0.299*referenceColor.r + 0.587*referenceColor.g + 0.114*referenceColor.b;
                    diff = diff + abs(targetGray - referenceGray);
                }
            }

            if ( error > diff )
            {
                error = diff;
                //vertical motion is rather irrelevant but negative values can't be stored so just take the absolute value
                vector = float2(x, abs(y));
            }
        }
    }

    float intensity = length(vector)/maxVectorLength;
    outputImage.write(float4(normalize(vector), intensity, 1),gid);
}
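
One thing worth checking in the listing above is the index arithmetic in `uint2 textureIndex(gid.x + x + a, gid.y + y + b)`. The offsets `y`, `a` and `b` can be negative, while `gid` is unsigned, so near the top and left image borders the sum wraps around to a very large coordinate. The C++ sketch below reproduces that arithmetic on the CPU and shows a clamped alternative. The helper names `wrappedCoord` and `clampedCoord` are hypothetical, and whether out-of-range reads are related to the failure described here is not established.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `gid + offset` from the kernel: the signed offset is converted
// to unsigned, so a negative result wraps around modulo 2^32.
inline std::uint32_t wrappedCoord(std::uint32_t gid, int offset) {
    return gid + static_cast<std::uint32_t>(offset);
}

// Alternative sketch: do the sum in a wider signed type, then clamp the
// result to the valid range [0, size - 1].
inline std::uint32_t clampedCoord(std::uint32_t gid, int offset, std::uint32_t size) {
    assert(size > 0);
    const std::int64_t c = static_cast<std::int64_t>(gid) + offset;
    if (c < 0) return 0;
    if (c >= static_cast<std::int64_t>(size)) return size - 1;
    return static_cast<std::uint32_t>(c);
}
```

For a thread at row 0 with `y = -5` and `b = -3` (the smallest offsets in the listing), `wrappedCoord(0, -8)` gives 4294967288, far outside a 540-row image, whereas `clampedCoord(0, -8, 540)` gives 0.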

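I am using this shader on a 960x540 px image. With a searchSize of 9 and a kernelSize of 8, the shader runs over the whole image. Changing searchSize to 10 makes the shader stop early with error code 1.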
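
To put numbers on the two configurations, the following C++ sketch counts the texture reads the kernel's loop nest performs, using the same loop bounds and the same integer truncation of `searchSize/2`. The function names `readsPerThread` and `readsPerImage` are hypothetical. The sketch only measures the workload and says nothing about why the larger configuration fails.

```cpp
#include <cassert>
#include <cstdint>

// Counts texture reads for one thread by replaying the kernel's loop nest.
// Each innermost iteration reads once from each of the two input images.
inline std::int64_t readsPerThread(float searchSize, int kernelSize) {
    assert(searchSize > 0 && kernelSize > 0);
    const int searchRadius = static_cast<int>(searchSize / 2);  // truncates, as in the kernel
    const int kernelRadius = kernelSize / 2;
    std::int64_t reads = 0;
    for (int y = -searchRadius; y < searchRadius; ++y)
        for (int x = 0; x < searchSize; ++x)
            for (int b = -kernelRadius; b < kernelRadius; ++b)
                for (int a = -kernelRadius; a < kernelRadius; ++a)
                    reads += 2;
    return reads;
}

// Total reads when one thread runs per pixel of a width x height image.
inline std::int64_t readsPerImage(int width, int height, float searchSize, int kernelSize) {
    return static_cast<std::int64_t>(width) * height * readsPerThread(searchSize, kernelSize);
}
```

With kernelSize 8, searchSize 9 gives 9216 reads per thread and searchSize 10 gives 12800, so the step from 9 to 10 adds roughly 39% more work, not 11%: the vertical loop grows from 8 to 10 iterations because 9/2 truncates to 4. Over a 960x540 image that is about 4.8 billion versus 6.6 billion reads.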

0 answers:

No answers yet