Question

I wrote the simple code below to check whether the GPU can do some computation.

 id<MTLDevice> device = MTLCreateSystemDefaultDevice();
 NSLog(@"Device: %@", [device name]);

 id<MTLCommandQueue> commandQueue = [device newCommandQueue];

 NSError * ns_error = nil;
 id<MTLLibrary>defaultLibrary = [device newLibraryWithFile:@"/Users/i/tmp/tmp6/s.metallib" error:&ns_error];

 // Buffer for storing encoded commands that are sent to GPU
 id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

 // Encoder for GPU commands
 id <MTLComputeCommandEncoder> computeCommandEncoder = [commandBuffer computeCommandEncoder];

 //set input and output data
 float tmpbuf[1000];
 float outbuf[1000];
 for( int i = 0; i < 1000; i++ )
 {
     tmpbuf[i] = i;
     outbuf[i] = 0;
 }

 int tmp_length = 100*sizeof(float);
 id<MTLBuffer> inVectorBuffer = [device newBufferWithBytes: tmpbuf length: tmp_length options: MTLResourceOptionCPUCacheModeDefault ];
 [computeCommandEncoder setBuffer: inVectorBuffer offset: 0 atIndex: 0 ];
 id<MTLBuffer> outVectorBuffer = [device newBufferWithBytes: outbuf length: tmp_length options: MTLResourceOptionCPUCacheModeDefault ];
 [computeCommandEncoder setBuffer: outVectorBuffer offset: 0 atIndex: 1 ];


 //get function
 id<MTLFunction> newfunc = [ defaultLibrary newFunctionWithName:@"sigmoid" ];

 //get pipeline state
 id<MTLComputePipelineState> cpipeline = [device newComputePipelineStateWithFunction: newfunc error:&ns_error ];

 [computeCommandEncoder setComputePipelineState:cpipeline ];

 //
 MTLSize ts= {10, 10, 1};
 MTLSize numThreadgroups = {2, 5, 1};
 [computeCommandEncoder dispatchThreadgroups:numThreadgroups threadsPerThreadgroup:ts];
 [ computeCommandEncoder endEncoding ];
 [ commandBuffer commit];

 //get data computed by GPU
 NSData* outdata = [NSData dataWithBytesNoCopy:[outVectorBuffer contents ] length: tmp_length freeWhenDone:false ];
 float final_out[1000];
 [outdata getBytes:final_out length:tmp_length];

 //I expect each value of final_out to be 10.0
 for( int i = 0; i < 1000; i++ )
 {
     printf("%.2f : %.2f\n", tmpbuf[i], final_out[i]);
 }

The shader file, named s.shader, is shown below. It assigns the value 10.0 to every output element.

using namespace metal;
kernel void sigmoid(const device float *inVector [[ buffer(0) ]],
                device float *outVector [[ buffer(1) ]],
                uint id [[ thread_position_in_grid ]]) {
    // This calculates sigmoid for _one_ position (=id) in a vector per call on the GPU
    outVector[id] = 10.0;
}

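In the code above, I read the data computed by the GPU through the variable final_out. As I understand it, every value of final_out should be 10.0, as set in s.shader. However, all values of final_out are 0. Is there something wrong with how I get the data back from the GPU? Thanks.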

Answer 1

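Committing the command buffer only tells the driver to start executing it. If you want to read the results of a GPU operation back on the CPU, you need to either block the current thread with -waitUntilCompleted, or use the -addCompletedHandler: method to add a block that is called when the command buffer completes.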
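
As a rough sketch of both options, assuming a macOS command-line tool: the program below compiles a small kernel at run time with newLibraryWithSource:options:error: so that it needs no .metallib file (the question loads a precompiled library instead). The kernel name `fill`, the element count of 100 and the one-dimensional dispatch are illustrative choices, not taken from the question. Note that the completion handler has to be added before -commit.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Illustrative kernel, compiled at run time (an assumption made for this sketch).
static NSString *const kSource =
    @"#include <metal_stdlib>\n"
     "using namespace metal;\n"
     "kernel void fill(device float *outVector [[ buffer(0) ]],\n"
     "                 uint id [[ thread_position_in_grid ]]) {\n"
     "    outVector[id] = 10.0;\n"
     "}\n";

int main(void) {
    @autoreleasepool {
        id<MTLDevice> device = MTLCreateSystemDefaultDevice();
        if (!device) { NSLog(@"No Metal device available"); return 1; }

        NSError *error = nil;
        id<MTLLibrary> library = [device newLibraryWithSource:kSource options:nil error:&error];
        if (!library) { NSLog(@"Shader compile failed: %@", error); return 1; }

        id<MTLFunction> function = [library newFunctionWithName:@"fill"];
        id<MTLComputePipelineState> pipeline =
            [device newComputePipelineStateWithFunction:function error:&error];
        if (!pipeline) { NSLog(@"Pipeline creation failed: %@", error); return 1; }

        // Shared storage: CPU and GPU see the same memory, so no explicit copy is needed.
        const NSUInteger count = 100;
        id<MTLBuffer> outBuffer = [device newBufferWithLength:count * sizeof(float)
                                                      options:MTLResourceStorageModeShared];

        id<MTLCommandQueue> queue = [device newCommandQueue];
        id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
        id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
        [encoder setComputePipelineState:pipeline];
        [encoder setBuffer:outBuffer offset:0 atIndex:0];
        // 10 threadgroups of 10 threads each: exactly one thread per element.
        [encoder dispatchThreadgroups:MTLSizeMake(10, 1, 1)
                threadsPerThreadgroup:MTLSizeMake(10, 1, 1)];
        [encoder endEncoding];

        // Option 2: asynchronous notification. Must be registered before commit.
        [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
            NSLog(@"Command buffer finished with status %lu", (unsigned long)cb.status);
        }];

        [commandBuffer commit];

        // Option 1: block this thread until the GPU has finished executing.
        [commandBuffer waitUntilCompleted];

        // Only now is it safe to read the results on the CPU.
        const float *result = (const float *)[outBuffer contents];
        for (NSUInteger i = 0; i < count; i++) {
            printf("%lu : %.2f\n", (unsigned long)i, result[i]);
        }
    }
    return 0;
}
```

Something like `clang -fobjc-arc -framework Foundation -framework Metal -framework CoreGraphics main.m` should build it. Applied to the code in the question, the smallest change would be calling [commandBuffer waitUntilCompleted] right after [commandBuffer commit], before touching [outVectorBuffer contents].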

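One further note: it looks like you are using buffers with a storage mode of Shared. If you ever use buffers with a storage mode of Managed, you will also need to create a blit command encoder and call synchronizeResource: with the appropriate buffer, then wait for it to complete as described above, in order to copy the results back from the GPU.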
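
For the Managed case, a minimal sketch might look like the following. The helper name `RunWithManagedBuffer` and its parameters are hypothetical. It assumes the caller has already created the queue, a pipeline whose kernel writes to buffer index 0, and a buffer allocated with MTLResourceStorageModeManaged (a macOS-only storage mode).

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical helper: runs `pipeline` over `count` elements of a Managed buffer
// and returns a CPU-readable pointer to the results. The pointer stays valid
// only as long as `managedBuffer` is alive.
static const float *RunWithManagedBuffer(id<MTLCommandQueue> queue,
                                         id<MTLComputePipelineState> pipeline,
                                         id<MTLBuffer> managedBuffer,
                                         NSUInteger count) {
    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];

    // Encode the compute pass. One thread per threadgroup keeps the sketch
    // simple; it is not an efficient launch configuration.
    id<MTLComputeCommandEncoder> compute = [commandBuffer computeCommandEncoder];
    [compute setComputePipelineState:pipeline];
    [compute setBuffer:managedBuffer offset:0 atIndex:0];
    [compute dispatchThreadgroups:MTLSizeMake(count, 1, 1)
            threadsPerThreadgroup:MTLSizeMake(1, 1, 1)];
    [compute endEncoding];

    // Encode a blit pass that copies the GPU's modifications back to the
    // CPU-visible copy of the managed buffer.
    id<MTLBlitCommandEncoder> blit = [commandBuffer blitCommandEncoder];
    [blit synchronizeResource:managedBuffer];
    [blit endEncoding];

    [commandBuffer commit];
    // Wait for both the compute pass and the synchronization to finish.
    [commandBuffer waitUntilCompleted];

    return (const float *)[managedBuffer contents];
}
```

With a Shared buffer, as in the question, the blit pass is unnecessary and waiting for completion is enough.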
