Question

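I'm working on a macOS project that uses Swift and Metal for image processing on the GPU. Last week I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...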

After a lot of digging, I found that the problem is related to which GPU Metal uses for the computation (AMD Radeon Pro 455 or Intel(R) HD Graphics 530).

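Initializing the MTLDevice with MTLCopyAllDevices() returns an array of devices representing the Radeon and the Intel GPUs (whereas MTLCreateSystemDefaultDevice() returns the default device, which is the Radeon). In any case, the code works as expected with the Intel GPU, but not with the Radeon GPU.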
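To tell the two devices apart without relying on their position in the array, one option is to print each device's name together with its isLowPower flag. This is only a minimal sketch: powerClass is a hypothetical helper name, and the labels assume an Intel Mac with two GPUs, where isLowPower is true for the integrated one. The Metal calls are guarded so the file still compiles where Metal is unavailable.

```swift
#if os(macOS)
import Metal
#endif

// Hypothetical helper: turn the isLowPower flag into a readable label.
// Assumption: on a dual-GPU Intel Mac, low-power means the integrated GPU.
func powerClass(isLowPower: Bool) -> String {
    return isLowPower ? "low-power (integrated)" : "high-power (discrete)"
}

#if os(macOS)
// Print every Metal device with its index and power class.
for (index, device) in MTLCopyAllDevices().enumerated() {
    print("[\(index)] \(device.name): \(powerClass(isLowPower: device.isLowPower))")
}
#endif
```

Picking the device by this flag rather than by a fixed index avoids depending on the order in which MTLCopyAllDevices() happens to list the GPUs.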

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colour to an output texture:

    kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                            texture2d<uint, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
    {
        uint4 out = inTexture.read(gid);
        outTexture.write(out, gid);
    }

To dispatch this kernel, I use this piece of code:

    let devices = MTLCopyAllDevices()
    for device in devices {
        print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530"
    }

    let device = devices[0] 
    let library = device.newDefaultLibrary()
    let commandQueue = device.makeCommandQueue()

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough")

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!)

    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer.makeComputeCommandEncoder()

    commandEncoder.setComputePipelineState(cps)

    // Texture setup
    let width = 16
    let height = 16
    let byteCount = height*width*4
    let bytesPerRow = width*4
    let region = MTLRegionMake2D(0, 0, width, height)
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)

    // inTexture
    var inData = [UInt8](repeating: 255, count: Int(byteCount))
    let inTexture = device.makeTexture(descriptor: textureDescriptor)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow)

    // outTexture
    var outData = [UInt8](repeating: 128, count: Int(byteCount))
    let outTexture = device.makeTexture(descriptor: textureDescriptor)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow)

    commandEncoder.setTexture(inTexture, at: 0)
    commandEncoder.setTexture(outTexture, at: 1)
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))

    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Get the data back from the GPU
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0)

    // Validation
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() {
        if outElement != inData[i] {
            print("Dest: \(outElement) != Src: \(inData[i]) at \(i)")
        }
    }

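When running this code with let device = devices[0] (Radeon GPU), outTexture is never written to (my supposition), and as a result outData stays unchanged. On the other hand, when running this code with let device = devices[1] (Intel GPU), everything works as expected and outData is updated with the values in inData.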
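One way to make the validation step say more than "bytes differ" is to classify the read-back buffer. The sketch below is an illustration only: ReadbackResult and classifyReadback are hypothetical names, and the values 255 and 128 are the fills used for inData and outData in the question. Note that it only describes what the CPU sees; an "untouched" result cannot by itself prove that the GPU never wrote to the texture.

```swift
// Possible outcomes when comparing a read-back buffer with the expected fill.
enum ReadbackResult: Equatable {
    case allWritten                 // every byte equals the expected value
    case untouched                  // every byte still holds the sentinel fill
    case partial(mismatches: Int)   // some bytes differ from the expected value
}

// Hypothetical helper: classify a read-back buffer.
// `expected` is the value the kernel should have written (255 in the question),
// `sentinel` is the value the buffer was pre-filled with (128 in the question).
func classifyReadback(_ data: [UInt8], expected: UInt8, sentinel: UInt8) -> ReadbackResult {
    let mismatches = data.filter { $0 != expected }.count
    if mismatches == 0 { return .allWritten }
    if data.allSatisfy({ $0 == sentinel }) { return .untouched }
    return .partial(mismatches: mismatches)
}

// Usage with the sizes from the question: a buffer that still holds only 128s.
let stale = [UInt8](repeating: 128, count: 16 * 16 * 4)
print(classifyReadback(stale, expected: 255, sentinel: 128)) // untouched
```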

Answer 1

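I think that whenever the GPU writes to a MTLStorageModeManaged resource such as a texture and you then want to read that resource from the CPU (e.g. using getBytes()), you need to synchronize it using a blit encoder. Try putting the following above the commandBuffer.commit() line: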

    // Make the GPU's copy of outTexture visible to the CPU before reading it back
    let blitEncoder = commandBuffer.makeBlitCommandEncoder()
    blitEncoder.synchronize(resource: outTexture)
    blitEncoder.endEncoding()

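Without this, you may get away with it on the integrated GPU, because that GPU uses system memory for the resource and there is nothing to synchronize.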
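The difference between the two GPUs can be pictured with a toy model. The sketch below is not the Metal API and not a description of how the drivers are implemented: ManagedResourceModel and all of its members are hypothetical names, and it simply assumes that a managed resource has a CPU copy and a GPU copy that coincide when the GPU shares system memory.

```swift
// Toy model only: this is NOT the Metal API. It mimics a managed resource
// that keeps one copy for the CPU and one for the GPU.
struct ManagedResourceModel {
    private(set) var cpuCopy: [UInt8]
    private(set) var gpuCopy: [UInt8]
    let sharesSystemMemory: Bool   // true stands in for an integrated GPU

    init(bytes: [UInt8], sharesSystemMemory: Bool) {
        cpuCopy = bytes
        gpuCopy = bytes
        self.sharesSystemMemory = sharesSystemMemory
    }

    // Stand-in for a kernel writing to the resource.
    mutating func gpuWrite(_ bytes: [UInt8]) {
        gpuCopy = bytes
        if sharesSystemMemory { cpuCopy = bytes }  // one backing store, nothing to sync
    }

    // Stand-in for the blit encoder's synchronize step.
    mutating func synchronize() {
        cpuCopy = gpuCopy
    }

    // Stand-in for getBytes(): the CPU only ever sees its own copy.
    func cpuRead() -> [UInt8] {
        return cpuCopy
    }
}

var discrete = ManagedResourceModel(bytes: [128, 128], sharesSystemMemory: false)
discrete.gpuWrite([255, 255])
print(discrete.cpuRead())   // [128, 128]: stale until synchronized
discrete.synchronize()
print(discrete.cpuRead())   // [255, 255]
```

In this model the discrete case reproduces the symptom from the question (the CPU still reads 128s after the kernel ran), while a model created with sharesSystemMemory set to true returns the new bytes without any synchronize call.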
