I am trying to use Core ML to do background segmentation on live video. I used DeepLabV3 as provided by Apple. The model works fine, even though it already takes 100 ms to process a 513x513 image. I then want to display the output, which is a 513x513 array of int32. Converting it to an image as done in CoreMLHelpers takes 300 ms, so I am looking for a much faster way to display the result. I was thinking that maybe it would be faster to somehow dump it into an OpenGL or Metal texture.

What is the best way to handle an MLMultiArray for live input?
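For reference, a CPU-side conversion in the spirit of the CoreMLHelpers approach looks roughly like the sketch below; createMaskImage is a hypothetical helper, not the actual CoreMLHelpers code, and it assumes the model output is a 513x513 MLMultiArray of Int32 class labels. Walking every element on the CPU and building a CGImage is what costs the ~300 ms:

import CoreML
import CoreGraphics

// Hypothetical helper: turns an Int32 class map into a grayscale mask image.
func createMaskImage(from array: MLMultiArray, width: Int, height: Int) -> CGImage? {
    let labels = array.dataPointer.bindMemory(to: Int32.self, capacity: width * height)
    var pixels = [UInt8](repeating: 0, count: width * height)
    for i in 0..<(width * height) {
        // Class 15 is "person" in the PASCAL VOC label map used by DeepLabV3.
        pixels[i] = labels[i] == 15 ? 255 : 0
    }
    var image: CGImage?
    pixels.withUnsafeMutableBytes { buffer in
        let context = CGContext(data: buffer.baseAddress, width: width, height: height,
                                bitsPerComponent: 8, bytesPerRow: width,
                                space: CGColorSpaceCreateDeviceGray(),
                                bitmapInfo: CGImageAlphaInfo.none.rawValue)
        image = context?.makeImage()
    }
    return image
}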
Answer 0 (score: 0)
You can change the model so that it outputs an image of type CVPixelBuffer instead of an MLMultiArray. Then you can use CVMetalTextureCacheCreateTextureFromImage to turn the pixel buffer into an MTLTexture. (I think this works, but I don't remember if I ever tried it. Not all pixel buffer objects can be turned into textures, and I'm not sure whether Core ML outputs CVPixelBuffer objects with the "Metal compatibility flag" turned on.)
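If the model does hand back a Metal-compatible pixel buffer, the texture-cache route looks roughly like this (a sketch I have not verified against Core ML output; pixelBuffer stands for the model's CVPixelBuffer):

import CoreVideo
import Metal

let device = MTLCreateSystemDefaultDevice()!

// Create the texture cache once and reuse it across frames.
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

// Wrap the pixel buffer in a Metal texture without copying. This fails
// unless the buffer was allocated with kCVPixelBufferMetalCompatibilityKey.
var cvTexture: CVMetalTexture?
CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                          textureCache!,
                                          pixelBuffer,
                                          nil,
                                          .bgra8Unorm,
                                          CVPixelBufferGetWidth(pixelBuffer),
                                          CVPixelBufferGetHeight(pixelBuffer),
                                          0,
                                          &cvTexture)
let texture: MTLTexture? = cvTexture.flatMap { CVMetalTextureGetTexture($0) }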
Alternatively, you can write a compute kernel that takes the MLMultiArray and converts it into a texture, which then gets drawn into a Metal view. This has the advantage that you can apply all kinds of effects to the segmentation map in the same compute kernel.
Answer 1 (score: 0)
My answer is based on processing the MLMultiArray in Metal.

Create an MTLBuffer:
let device = MTLCreateSystemDefaultDevice()!
// Large enough for a segmentationWidth x segmentationHeight map of Int32 labels.
let segmentationMaskBuffer = device.makeBuffer(length: segmentationHeight * segmentationWidth * MemoryLayout<Int32>.stride)!
Copy the MLMultiArray into the MTLBuffer:
// Assumes the MLMultiArray's Int32 data is laid out contiguously in memory.
memcpy(segmentationMaskBuffer.contents(), mlOutput.semanticPredictions.dataPointer, segmentationMaskBuffer.length)
Set up the Metal-related objects:
let commandQueue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!
let function = library.makeFunction(name: "binaryMask")!
let computePipeline = try! device.makeComputePipelineState(function: function)
Create a struct for the segmentation size, and an instance to pass to the kernel:

let segmentationWidth = 513
let segmentationHeight = 513

// Field order and types must match the MixParams struct in the shader.
struct MixParams {
    var width: Int32 = Int32(segmentationWidth)
    var height: Int32 = Int32(segmentationHeight)
}

var params = MixParams()
Create the output texture:
// width and height are the dimensions you want to render at, e.g. the video frame size.
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
let outputTexture = device.makeTexture(descriptor: textureDescriptor)!
Pass the output texture and the MTLBuffer to the kernel function:
let buffer = commandQueue.makeCommandBuffer()!
let maskCommandEncoder = buffer.makeComputeCommandEncoder()!
maskCommandEncoder.setComputePipelineState(computePipeline)
maskCommandEncoder.setTexture(outputTexture, index: 1)
maskCommandEncoder.setBuffer(segmentationMaskBuffer, offset: 0, index: 0)
maskCommandEncoder.setBytes(&params, length: MemoryLayout<MixParams>.size, index: 1)

// Dispatch enough threadgroups to cover every pixel of the output texture.
let w = computePipeline.threadExecutionWidth
let h = computePipeline.maxTotalThreadsPerThreadgroup / w
let threadGroupSize = MTLSizeMake(w, h, 1)
let threadGroups = MTLSizeMake(
    (outputTexture.width + threadGroupSize.width - 1) / threadGroupSize.width,
    (outputTexture.height + threadGroupSize.height - 1) / threadGroupSize.height, 1)
maskCommandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupSize)
maskCommandEncoder.endEncoding()

// Commit and wait so the texture is ready before it is read below.
buffer.commit()
buffer.waitUntilCompleted()
Write the kernel function in the Shaders.metal file:
#include <metal_stdlib>
using namespace metal;

struct MixParams {
    int segmentationWidth;
    int segmentationHeight;
};

// Look up the class label at a normalized position in the segmentation mask.
static inline int get_class(float2 pos, int width, int height, device int* mask) {
    const int x = int(pos.x * width);
    const int y = int(pos.y * height);
    return mask[y * width + x];
}

// Class 15 is "person" in the PASCAL VOC label map used by DeepLabV3.
static float get_person_probability(float2 pos, int width, int height, device int* mask) {
    return get_class(pos, width, height, mask) == 15;
}
kernel void binaryMask(
    texture2d<float, access::write> outputTexture [[texture(1)]],
    device int* segmentationMask [[buffer(0)]],
    constant MixParams& params [[buffer(1)]],
    uint2 gid [[thread_position_in_grid]])
{
    float width = outputTexture.get_width();
    float height = outputTexture.get_height();
    if (gid.x >= width || gid.y >= height) return;

    // Normalized position of this pixel, so the mask can be sampled even
    // when the output texture and the segmentation map differ in size.
    const float2 pos = float2(float(gid.x) / width,
                              float(gid.y) / height);
    const float is_person = get_person_probability(pos, params.segmentationWidth,
                                                   params.segmentationHeight,
                                                   segmentationMask);

    // White where a person was detected, transparent black elsewhere.
    float4 outPixel;
    if (is_person < 0.5f) {
        outPixel = float4(0.0, 0.0, 0.0, 0.0);
    } else {
        outPixel = float4(1.0, 1.0, 1.0, 1.0);
    }
    outputTexture.write(outPixel, gid);
}
Finally, once the command buffer has completed, get a CIImage from the output texture:

let kciOptions: [CIImageOption: Any] = [CIImageOption.colorSpace: CGColorSpaceCreateDeviceRGB()]
let maskImage = CIImage(mtlTexture: outputTexture, options: kciOptions)!.oriented(.downMirrored)
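To actually get the mask on screen you still have to render the CIImage. One way to keep everything on the GPU is to draw it into an MTKView; this is a sketch assuming the view's framebufferOnly is set to false and reusing the device and commandQueue from above:

import CoreImage
import MetalKit

// Reuse a single CIContext backed by the same Metal device;
// creating one per frame is expensive.
let ciContext = CIContext(mtlDevice: device)

func draw(in view: MTKView) {
    guard let drawable = view.currentDrawable,
          let commandBuffer = commandQueue.makeCommandBuffer() else { return }
    // Render the mask straight into the drawable's texture so it never leaves the GPU.
    ciContext.render(maskImage,
                     to: drawable.texture,
                     commandBuffer: commandBuffer,
                     bounds: maskImage.extent,
                     colorSpace: CGColorSpaceCreateDeviceRGB())
    commandBuffer.present(drawable)
    commandBuffer.commit()
}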