I'm trying to use Vision and CoreML to perform style transfer on a tracked object as close to real time as possible. I'm using AVKit to capture video, and an AVCaptureVideoDataOutputSampleBufferDelegate to get each frame.
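The frame delivery is set up roughly like this (a simplified sketch; captureSession and the queue label are placeholders, and the full session configuration follows Apple's sample):

// Simplified sketch of the capture side; delegate callbacks arrive on a dedicated serial queue.
let videoDataOutput = AVCaptureVideoDataOutput()
let videoDataOutputQueue = DispatchQueue(label: "VideoDataOutputQueue") // placeholder label
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
if captureSession.canAddOutput(videoDataOutput) {
    captureSession.addOutput(videoDataOutput)
}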
Overall, my pipeline is:
1) Detect faces
2) Update the preview layer to draw the bounding boxes at the right on-screen positions
3) Crop the original image to the detected face
4) Run the face image through the CoreML model and get a new image as output (a simplified sketch of this step follows the list)
5) Fill the preview layer with the new images (wherever they are)
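Step 4 looks roughly like the following (a simplified sketch rather than my exact code; StyleTransferModel is a placeholder for my compiled Core ML model, and the real version handles errors and reuses the VNCoreMLModel instead of recreating it per frame):

// Rough sketch of step 4; StyleTransferModel is a placeholder name.
func performCoreMLInference(on pixelBuffer: CVPixelBuffer) -> CGImage? {
    guard let model = try? VNCoreMLModel(for: StyleTransferModel().model) else { return nil }
    var output: CGImage?
    let request = VNCoreMLRequest(model: model) { request, _ in
        // A style-transfer model wrapped by Vision returns its image as a VNPixelBufferObservation.
        guard let observation = (request.results as? [VNPixelBufferObservation])?.first else { return }
        let ciImage = CIImage(cvPixelBuffer: observation.pixelBuffer)
        output = CIContext().createCGImage(ciImage, from: ciImage.extent)
    }
    request.imageCropAndScaleOption = .scaleFill
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([request]) // perform is synchronous, so output is set before returning
    return output
}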
My intent was to put the bounding boxes on the main thread as soon as they are computed, and then fill them in once inference finishes. However, I'm finding that with CoreML inference added to the pipeline (on the AVCaptureOutputQueue or on the CoreMLQueue), the bounding boxes don't update their position until inference completes. Maybe I'm missing something about how the queues are handled in the closures. The (hopefully) relevant parts of the code are below.
I'm modifying the code from https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time.
public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
                          from connection: AVCaptureConnection) {
    // omitting stuff that gets pixelBuffers etc formatted for use with Vision
    // and sets up tracking requests

    // Perform landmark detection on tracked faces
    for trackingRequest in newTrackingRequests {
        let faceLandmarksRequest = VNDetectFaceLandmarksRequest(completionHandler: { (request, error) in
            guard let landmarksRequest = request as? VNDetectFaceLandmarksRequest,
                  let results = landmarksRequest.results as? [VNFaceObservation],
                  !results.isEmpty else {
                return
            }
            // Perform all UI updates (drawing) on the main queue,
            // not the background queue on which this handler is being called.
            DispatchQueue.main.async {
                self.drawFaceObservations(results) // <<- places bounding box on the preview layer
            }
            CoreMLQueue.async { // queue for CoreML use
                // get region of picture to crop for CoreML
                let boundingBox = results[0].boundingBox
                // crop the input frame to the detected object
                let image: CVPixelBuffer = self.cropFrame(pixelBuffer: pixelBuffer, region: boundingBox)
                // infer on region
                let styleImage: CGImage = self.performCoreMLInference(on: image)
                // on the main thread, place styleImage into the bounding box (CAShapeLayer)
                DispatchQueue.main.async {
                    self.boundingBoxOverlayLayer?.contents = styleImage
                }
            }
        })
        do {
            try requestHandler.perform(faceLandmarksRequest)
        } catch let error as NSError {
            NSLog("Failed Request: %@", error)
        }
    }
}
Aside from the queue/synchronization issue, I also think one source of the slowdown may be cropping the pixel buffer down to the region of interest. I'm out of ideas here; any help would be appreciated.
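For reference, the crop is roughly along these lines (a simplified sketch rather than my exact cropFrame; in particular, a shared CIContext and a CVPixelBufferPool should be reused rather than allocating per frame):

// Simplified sketch of the crop; boundingBox is the normalized rect from VNFaceObservation
// (Vision uses a lower-left origin, matching Core Image).
func cropFrame(pixelBuffer: CVPixelBuffer, region boundingBox: CGRect) -> CVPixelBuffer {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    // Convert the normalized bounding box into pixel coordinates.
    let cropRect = VNImageRectForNormalizedRect(boundingBox, width, height)
    // Crop and shift the image so its origin is at (0, 0) before rendering.
    let cropped = CIImage(cvPixelBuffer: pixelBuffer)
        .cropped(to: cropRect)
        .transformed(by: CGAffineTransform(translationX: -cropRect.origin.x, y: -cropRect.origin.y))
    // Render into a fresh pixel buffer; per-frame allocation of the buffer and CIContext
    // is likely part of the slowdown.
    var output: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault, Int(cropRect.width), Int(cropRect.height),
                        kCVPixelFormatType_32BGRA, nil, &output)
    if let output = output {
        CIContext().render(cropped, to: output)
    }
    return output ?? pixelBuffer
}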
Answer 0 (score: 0)
I am using the pipelines from https://github.com/maxvol/RxAVFoundation and https://github.com/maxvol/RxVision to address the synchronization issues.
A basic example:
let textRequest: RxVNDetectTextRectanglesRequest<CVPixelBuffer> = VNDetectTextRectanglesRequest.rx.request(reportCharacterBoxes: true)
var session = AVCaptureSession.rx.session()
var requests = [RxVNRequest<CVPixelBuffer>]()

self.requests = [self.textRequest]

// Deliver Vision results on the main scheduler so the handler can update the UI directly.
self
    .textRequest
    .observable
    .observeOn(Scheduler.main)
    .subscribe { [unowned self] (event) in
        switch event {
        case .next(let completion):
            self.detectTextHandler(value: completion.value, request: completion.request, error: completion.error)
        default:
            break
        }
    }
    .disposed(by: disposeBag)

// Attach the preview layer and feed every captured frame to the Vision requests.
self.session
    .flatMapLatest { [unowned self] (session) -> Observable<CaptureOutput> in
        let imageLayer = session.previewLayer
        imageLayer.frame = self.imageView.bounds
        self.imageView.layer.addSublayer(imageLayer)
        return session.captureOutput
    }
    .subscribe { [unowned self] (event) in
        switch event {
        case .next(let captureOutput):
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(captureOutput.sampleBuffer) else {
                return
            }
            var requestOptions: [VNImageOption: Any] = [:]
            if let camData = CMGetAttachment(captureOutput.sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) {
                requestOptions = [.cameraIntrinsics: camData]
            }
            let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: requestOptions)
            do {
                // The pixel buffer travels with the request, so the handler above receives it as completion.value.
                try imageRequestHandler.rx.perform(self.requests, with: pixelBuffer)
            } catch {
                os_log("error: %@", "\(error)")
            }
        case .error(let error):
            os_log("error: %@", "\(error)")
        case .completed:
            // never happens
            break
        }
    }
    .disposed(by: disposeBag)
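For the face + style-transfer case from the question, the subscribed handler can hand off to CoreML in the same way, since completion.value carries the pixel buffer that produced the observations. A sketch reusing the question's own helpers (cropFrame, performCoreMLInference, drawFaceObservations, CoreMLQueue, boundingBoxOverlayLayer), and assuming an RxVNRequest is created for VNDetectFaceLandmarksRequest the same way as for text rectangles above:

// Sketch only - relies on the question's helpers; observed on the main scheduler,
// so drawing can happen immediately while inference runs on the CoreML queue.
func detectFaceHandler(value pixelBuffer: CVPixelBuffer, request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNFaceObservation],
          let face = results.first else { return }
    drawFaceObservations(results) // bounding box goes up right away
    CoreMLQueue.async { [weak self] in
        guard let self = self else { return }
        let cropped = self.cropFrame(pixelBuffer: pixelBuffer, region: face.boundingBox)
        let styled = self.performCoreMLInference(on: cropped)
        // Only the final layer update returns to the main thread.
        DispatchQueue.main.async {
            self.boundingBoxOverlayLayer?.contents = styled
        }
    }
}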