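I'm taking my first steps into GPU programming. I figured I would start with something simple: use premade kernels (hence MPS) and just try issuing commands to the GPU.
My attempt was to simply sum all the values from 1 to 1000. I put each value in its own 1x1 matrix and used the MPS matrix sum.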
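As a sanity check, the expected value can be computed on the CPU. This is a minimal sketch using the same Float32 type as the GPU code; the names `values` and `cpuSum` are illustrative and are not used anywhere in the code below.

```swift
// CPU reference for the sum of 1...1000 in Float32.
// `values` and `cpuSum` are illustrative names, not part of the original code.
let values = (1...1000).map { Float32($0) }
let cpuSum = values.reduce(0, +)
print(cpuSum)  // 500500.0
```

Every partial sum here is an integer well below 2^24, so Float32 represents the result exactly, and a correct GPU run should print [500500.0].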
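On my MacBook Pro, this works as I'd expect.
On my iMac, the result is [0.0]. I figured this must have something to do with memory, since I'm using an iGPU on the MacBook Pro and a dGPU on the iMac, but as far as I can tell storageModeShared shouldn't cause this. I even tried adding .synchronize() to the result matrix before reading from it, even though I'm pretty sure that shouldn't be necessary with storageModeShared.
The code isn't very elegant. It was only meant to help me quickly understand how issuing commands with MPS works, and I've been trying to fix the problem for a while without paying attention to structure. It should still be fairly easy to read; if not, let me know and I'll refactor it.
Apart from print(output), the print statements are only there for debugging.
I hate to paste so much code, but I'm afraid I can't really isolate the problem any further.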
import Cocoa
import Quartz
import PlaygroundSupport
import MetalPerformanceShaders
let device = MTLCopyAllDevices()[0]
print(MTLCopyAllDevices())
let shaderKernel = MPSMatrixSum.init(device: device, count: 1000, rows: 1, columns: 1, transpose: false)
var matrixList: [MPSMatrix] = []
var GPUStorageBuffers: [MTLBuffer] = []
// Wrap each value 1...1000 in its own 1x1 Float32 matrix backed by a shared buffer.
for i in 1...1000 {
    let a = Float32(i)
    var b: [Float32] = []
    let descriptor = MPSMatrixDescriptor.init(rows: 1, columns: 1, rowBytes: 4, dataType: .float32)
    b.append(a)
    let buffer = device.makeBuffer(bytes: b, length: 4, options: .storageModeShared)
    GPUStorageBuffers.append(buffer!)
    let GPUStoredMatrices = MPSMatrix.init(buffer: buffer!, descriptor: descriptor)
    matrixList.append(GPUStoredMatrices)
}
let matrices: [MPSMatrix] = matrixList
print(matrices.count)
print("\n")
print(matrices[4].debugDescription)
print("\n")
// Debug: read matrices[4] back on the CPU to confirm the input buffers hold the expected values.
let pointer2 = matrices[4].data.contents()
let typedPointer2 = pointer2.bindMemory(to: Float32.self, capacity: 1)
let buffpoint2 = UnsafeBufferPointer(start: typedPointer2, count: 1)
let printer = Array(buffpoint2)
print(printer)
let CMDQue = device.makeCommandQueue()
let CMDBuffer = CMDQue!.makeCommandBuffer()
var resultMatrix = MPSMatrix.init(device: device, descriptor: MPSMatrixDescriptor.init(rows: 1, columns: 1, rowBytes: 4, dataType: .float32))
shaderKernel.encode(to: CMDBuffer!, sourceMatrices: matrices, resultMatrix: resultMatrix, scale: nil, offsetVector: nil, biasVector: nil, start: 0)
print(CMDBuffer.debugDescription)
CMDBuffer!.commit()
print(CMDBuffer.debugDescription)
print(CMDQue.debugDescription)
let GPUStartTime = CACurrentMediaTime()
CMDBuffer!.waitUntilCompleted()
// Attempted fix: synchronize the result matrix before reading it back on the CPU.
resultMatrix.synchronize(on: CMDBuffer!)
let pointer = resultMatrix.data.contents()
let typedPointer = pointer.bindMemory(to: Float32.self, capacity: 1)
let buffpoint = UnsafeBufferPointer(start: typedPointer, count: 1)
let output = Array(buffpoint)
print(output)
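// Elapsed time since the command buffer was committed (end minus start, so the value is positive).
let finish = CACurrentMediaTime() - GPUStartTime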
print("\n")
print(finish)