I am trying to use Core ML to do background segmentation on live video. I used DeepLabV3 as provided by Apple. The model works fine, even though it already takes 100 ms to process a 513x513 image. I then want to display the output, which is a 513x513 array of int32. Converting it to an image as done in CoreMLHelpers takes 300 ms, so I am looking for a much faster way to display the result. I was thinking that maybe it would be faster to somehow dump it into an OpenGL or Metal texture.

What is the best way to handle an MLMultiArray for live input?
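For reference, a CPU-side conversion in the spirit of the CoreMLHelpers approach looks roughly like the sketch below; createMaskImage is a hypothetical helper, not the actual CoreMLHelpers code, and it assumes the model output is a 513x513 MLMultiArray of Int32 class labels. Walking every element on the CPU and building a CGImage is what costs the ~300 ms:

import CoreML
import CoreGraphics

// Hypothetical helper: turns an Int32 class map into a grayscale mask image.
func createMaskImage(from array: MLMultiArray, width: Int, height: Int) -> CGImage? {
    let labels = array.dataPointer.bindMemory(to: Int32.self, capacity: width * height)
    var pixels = [UInt8](repeating: 0, count: width * height)
    for i in 0..<(width * height) {
        // Class 15 is "person" in the PASCAL VOC label map used by DeepLabV3.
        pixels[i] = labels[i] == 15 ? 255 : 0
    }
    var image: CGImage?
    pixels.withUnsafeMutableBytes { buffer in
        let context = CGContext(data: buffer.baseAddress, width: width, height: height,
                                bitsPerComponent: 8, bytesPerRow: width,
                                space: CGColorSpaceCreateDeviceGray(),
                                bitmapInfo: CGImageAlphaInfo.none.rawValue)
        image = context?.makeImage()
    }
    return image
}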
Answer 0 (score: 0)
You can change the model so that it outputs an image of type CVPixelBuffer instead of an MLMultiArray. Then you can use CVMetalTextureCacheCreateTextureFromImage to turn the pixel buffer into an MTLTexture. (I think this works, but I don't remember if I ever tried it. Not all pixel buffer objects can be turned into textures, and I'm not sure whether Core ML outputs CVPixelBuffer objects with the "Metal compatibility flag" turned on.)
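If the model does hand back a Metal-compatible pixel buffer, the texture-cache route looks roughly like this (a sketch I have not verified against Core ML output; pixelBuffer stands for the model's CVPixelBuffer):

import CoreVideo
import Metal

let device = MTLCreateSystemDefaultDevice()!

// Create the texture cache once and reuse it across frames.
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

// Wrap the pixel buffer in a Metal texture without copying. This fails
// unless the buffer was allocated with kCVPixelBufferMetalCompatibilityKey.
var cvTexture: CVMetalTexture?
CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                          textureCache!,
                                          pixelBuffer,
                                          nil,
                                          .bgra8Unorm,
                                          CVPixelBufferGetWidth(pixelBuffer),
                                          CVPixelBufferGetHeight(pixelBuffer),
                                          0,
                                          &cvTexture)
let texture: MTLTexture? = cvTexture.flatMap { CVMetalTextureGetTexture($0) }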
Alternatively, you can write a compute kernel that takes the MLMultiArray and converts it into a texture, which then gets drawn into a Metal view. This has the advantage that you can apply all kinds of effects to the segmentation map in the same compute kernel.
Answer 1 (score: 0)
My answer is based on processing the MLMultiArray in Metal.

Create an MTLBuffer:
let device = MTLCreateSystemDefaultDevice()!
// Large enough for a segmentationWidth x segmentationHeight map of Int32 labels.
let segmentationMaskBuffer = device.makeBuffer(length: segmentationHeight * segmentationWidth * MemoryLayout<Int32>.stride)!
Copy the MLMultiArray into the MTLBuffer:
// Assumes the MLMultiArray's Int32 data is laid out contiguously in memory.
memcpy(segmentationMaskBuffer.contents(), mlOutput.semanticPredictions.dataPointer, segmentationMaskBuffer.length)
Set up the Metal-related objects:
let commandQueue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!
let function = library.makeFunction(name: "binaryMask")!
let computePipeline = try! device.makeComputePipelineState(function: function)
Create a struct for the segmentation size, and an instance to pass to the kernel:

let segmentationWidth = 513
let segmentationHeight = 513

// Field order and types must match the MixParams struct in the shader.
struct MixParams {
    var width: Int32 = Int32(segmentationWidth)
    var height: Int32 = Int32(segmentationHeight)
}

var params = MixParams()
Create the output texture:
// width and height are the dimensions you want to render at, e.g. the video frame size.
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
let outputTexture = device.makeTexture(descriptor: textureDescriptor)!
Pass the output texture and the MTLBuffer to the kernel function:
let buffer = commandQueue.makeCommandBuffer()!
let maskCommandEncoder = buffer.makeComputeCommandEncoder()!
maskCommandEncoder.setComputePipelineState(computePipeline)
maskCommandEncoder.setTexture(outputTexture, index: 1)
maskCommandEncoder.setBuffer(segmentationMaskBuffer, offset: 0, index: 0)
maskCommandEncoder.setBytes(&params, length: MemoryLayout<MixParams>.size, index: 1)

// Dispatch enough threadgroups to cover every pixel of the output texture.
let w = computePipeline.threadExecutionWidth
let h = computePipeline.maxTotalThreadsPerThreadgroup / w
let threadGroupSize = MTLSizeMake(w, h, 1)
let threadGroups = MTLSizeMake(
    (outputTexture.width + threadGroupSize.width - 1) / threadGroupSize.width,
    (outputTexture.height + threadGroupSize.height - 1) / threadGroupSize.height, 1)
maskCommandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupSize)
maskCommandEncoder.endEncoding()

// Commit and wait so the texture is ready before it is read below.
buffer.commit()
buffer.waitUntilCompleted()
Write the kernel function in the Shaders.metal file:
#include <metal_stdlib>
using namespace metal;

struct MixParams {
    int segmentationWidth;
    int segmentationHeight;
};

// Look up the class label at a normalized position in the segmentation mask.
static inline int get_class(float2 pos, int width, int height, device int* mask) {
    const int x = int(pos.x * width);
    const int y = int(pos.y * height);
    return mask[y * width + x];
}

// Class 15 is "person" in the PASCAL VOC label map used by DeepLabV3.
static float get_person_probability(float2 pos, int width, int height, device int* mask) {
    return get_class(pos, width, height, mask) == 15;
}
kernel void binaryMask(
    texture2d<float, access::write> outputTexture [[texture(1)]],
    device int* segmentationMask [[buffer(0)]],
    constant MixParams& params [[buffer(1)]],
    uint2 gid [[thread_position_in_grid]])
{
    float width = outputTexture.get_width();
    float height = outputTexture.get_height();
    if (gid.x >= width || gid.y >= height) return;

    // Normalized position of this pixel, so the mask can be sampled even
    // when the output texture and the segmentation map differ in size.
    const float2 pos = float2(float(gid.x) / width,
                              float(gid.y) / height);
    const float is_person = get_person_probability(pos, params.segmentationWidth,
                                                   params.segmentationHeight,
                                                   segmentationMask);

    // White where a person was detected, transparent black elsewhere.
    float4 outPixel;
    if (is_person < 0.5f) {
        outPixel = float4(0.0, 0.0, 0.0, 0.0);
    } else {
        outPixel = float4(1.0, 1.0, 1.0, 1.0);
    }
    outputTexture.write(outPixel, gid);
}
Finally, once the command buffer has completed, get a CIImage from the output texture:

let kciOptions: [CIImageOption: Any] = [CIImageOption.colorSpace: CGColorSpaceCreateDeviceRGB()]
let maskImage = CIImage(mtlTexture: outputTexture, options: kciOptions)!.oriented(.downMirrored)
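To actually get the mask on screen you still have to render the CIImage. One way to keep everything on the GPU is to draw it into an MTKView; this is a sketch assuming the view's framebufferOnly is set to false and reusing the device and commandQueue from above:

import CoreImage
import MetalKit

// Reuse a single CIContext backed by the same Metal device;
// creating one per frame is expensive.
let ciContext = CIContext(mtlDevice: device)

func draw(in view: MTKView) {
    guard let drawable = view.currentDrawable,
          let commandBuffer = commandQueue.makeCommandBuffer() else { return }
    // Render the mask straight into the drawable's texture so it never leaves the GPU.
    ciContext.render(maskImage,
                     to: drawable.texture,
                     commandBuffer: commandBuffer,
                     bounds: maskImage.extent,
                     colorSpace: CGColorSpaceCreateDeviceRGB())
    commandBuffer.present(drawable)
    commandBuffer.commit()
}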