Question

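I am trying to capture frames from the camera together with the associated motion data. I use timestamps to synchronize them. The video and motion data are written to files and processed afterwards. During that processing I can calculate the motion-to-frame offset for each video.

It turns out that motion data and video data with the same timestamp are offset from each other by a varying amount, between 0.2 and 0.3 seconds. The offset is constant within one video but differs from video to video. If it were the same offset every time I could subtract a calibrated value, but it is not.

Is there a good way to synchronize the timestamps? Maybe I am not recording them correctly? Is there a better way to bring them into the same frame of reference?

CoreMotion returns timestamps relative to system uptime, so I add an offset to get Unix time: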

uptimeOffset = [[NSDate date] timeIntervalSince1970] - 
                   [NSProcessInfo processInfo].systemUptime;

CMDeviceMotionHandler blk =
    ^(CMDeviceMotion * _Nullable motion, NSError * _Nullable error){
        if(!error){
            motionTimestamp = motion.timestamp + uptimeOffset;
            ...
        }
    };

[motionManager startDeviceMotionUpdatesUsingReferenceFrame:CMAttitudeReferenceFrameXTrueNorthZVertical
                                                   toQueue:[NSOperationQueue currentQueue]
                                               withHandler:blk];

To get high-precision frame timestamps I am using AVCaptureVideoDataOutputSampleBufferDelegate. These are offset to Unix time as well:

-(void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CMTime frameTime = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer);

    if(firstFrame)
    {
        firstFrameTime = CMTimeMake(frameTime.value, frameTime.timescale);
        startOfRecording = [[NSDate date] timeIntervalSince1970];
    }

    CMTime presentationTime = CMTimeSubtract(frameTime, firstFrameTime);
    float seconds = CMTimeGetSeconds(presentationTime);

    frameTimestamp = seconds + startOfRecording;
    ...
}

Answer 1

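Correlating these timestamps is actually quite simple: although it is not clearly documented, both the camera frame timestamps and the motion data timestamps are based on the mach_absolute_time() timebase.

This is a monotonic timer that is reset at boot and, importantly, also stops counting while the device is asleep. So there is no easy way to convert it to standard "wall clock" time.

Fortunately you don't need to, because the timestamps are directly comparable. motion.timestamp is in seconds, and you can log mach_absolute_time() in the callback to see that it is the same timebase. My quick test shows the motion timestamp is typically about 2 ms before mach_absolute_time() in the handler, which seems about right for how long it might take for the data to be reported to the app.

Note that mach_absolute_time() is in tick units that need converting to nanoseconds; on iOS 10 and later you can use clock_gettime_nsec_np(CLOCK_UPTIME_RAW) instead, which returns the equivalent value already in nanoseconds.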

    [_motionManager
     startDeviceMotionUpdatesUsingReferenceFrame:CMAttitudeReferenceFrameXArbitraryZVertical
     toQueue:[NSOperationQueue currentQueue]
     withHandler:^(CMDeviceMotion * _Nullable motion, NSError * _Nullable error) {
        // motion.timestamp is in seconds; convert to nanoseconds
        uint64_t motionTimestampNs = (uint64_t)(motion.timestamp * 1e9);
        
        // Get conversion factors from ticks to nanoseconds
        struct mach_timebase_info timebase;
        mach_timebase_info(&timebase);
        
        // mach_absolute_time in nanoseconds
        uint64_t ticks = mach_absolute_time();
        uint64_t machTimeNs = (ticks * timebase.numer) / timebase.denom;
        
        int64_t difference = machTimeNs - motionTimestampNs;
        
        NSLog(@"Motion timestamp: %llu, machTime: %llu, difference %lli", motionTimestampNs, machTimeNs, difference);
    }];
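
A side note on the tick conversion above: ticks * numer can in principle overflow 64 bits for very large tick counts. Below is a minimal sketch of an overflow-safer conversion in plain C. ticks_to_ns is a hypothetical helper (not a system API), and numer/denom are assumed to come from mach_timebase_info():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts mach ticks to nanoseconds.
   Splitting into quotient and remainder keeps the intermediates small,
   so this only overflows if the result itself does not fit in 64 bits. */
uint64_t ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    uint64_t whole = (ticks / denom) * numer;           /* full multiples of denom */
    uint64_t frac  = ((ticks % denom) * numer) / denom; /* leftover ticks */
    return whole + frac;
}
```

For example, with numer/denom = 125/3 (a 24 MHz tick rate, used here only as an illustrative value), 24,000,000 ticks convert to 1,000,000,000 ns.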

For the camera the timebase is the same:

// In practice gives the same value as the CMSampleBufferGetOutputPresentationTimeStamp
// but this is the media's "source" timestamp which feels more correct
CMTime frameTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
uint64_t frameTimestampNs = (uint64_t)(CMTimeGetSeconds(frameTime) * 1e9);

The delay between the timestamp and the handler being called is a bit larger here, typically on the order of 10 milliseconds.
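
Since both streams share a timebase, a frame timestamp can be looked up directly in the recorded motion samples. Here is a rough sketch in plain C, assuming the motion timestamps are stored in a strictly increasing array of seconds. motion_at is a hypothetical helper, and linear interpolation is only reasonable for scalar channels such as a rotation rate or acceleration component, not for attitude quaternions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linearly interpolates one motion channel at time t.
   ts[] holds strictly increasing timestamps in seconds (same timebase as t).
   Outside the sampled range the nearest end value is returned. */
double motion_at(const double *ts, const double *values, size_t n, double t)
{
    assert(n > 0);
    if (t <= ts[0])     return values[0];
    if (t >= ts[n - 1]) return values[n - 1];

    /* Binary search for the pair of samples that brackets t. */
    size_t lo = 0, hi = n - 1;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ts[mid] <= t) lo = mid; else hi = mid;
    }
    double w = (t - ts[lo]) / (ts[hi] - ts[lo]);
    return values[lo] + w * (values[hi] - values[lo]);
}
```

With motion sampled at 100 Hz, a frame time that falls halfway between two samples gets the average of the two values.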

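We now need to consider what the timestamp on a camera frame actually means. There are two issues here: finite exposure time and rolling shutter.

Rolling shutter means that the scanlines of the image are not all captured at the same time: the top row is captured first and the bottom row last. This rolling readout of the data is spread over the entire frame time, so in a 30 FPS camera mode the exposure start/end of the final scanline is almost exactly 1/30 second after the corresponding start/end of the first scanline.

My tests indicate that the presentation timestamp in AVFoundation frames is the start of the frame readout, i.e. the end of the exposure of the first scanline. So the end of the exposure of the final scanline is frameDuration seconds after this, and the start of the exposure of the first scanline is exposureTime seconds before it. The timestamp at the center of the frame exposure (the midpoint of the exposure of the middle scanline) can therefore be calculated as: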

const double frameDuration = 1.0/30; // rolling shutter effect, depends on camera mode
const double exposure = CMTimeGetSeconds(avCaptureDevice.exposureDuration); // exposureDuration is a CMTime
CMTime frameTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
double midFrameTime = CMTimeGetSeconds(frameTime) - exposure * 0.5 + frameDuration * 0.5;

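In indoor settings the exposure usually ends up being the full frame time anyway, so midFrameTime above comes out the same as frameTime. The difference is noticeable (under extremely fast motion) with the short exposures you typically get in brightly lit outdoor scenes.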
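
To put numbers on the two cases, here is the same formula as a plain C function. mid_frame_time is a hypothetical helper, and it assumes, as my tests suggested, that the presentation timestamp marks the start of readout:

```c
#include <assert.h>

/* Hypothetical helper: timestamp (seconds) of the middle of the frame exposure,
   assuming pts marks the start of readout, i.e. the end of the first
   scanline's exposure. */
double mid_frame_time(double pts, double exposure, double frame_duration)
{
    assert(exposure >= 0.0 && frame_duration > 0.0);
    return pts - 0.5 * exposure + 0.5 * frame_duration;
}
```

At 30 FPS with a 1/30 s exposure the correction is zero. With a 1/1000 s exposure it is 1/60 s - 0.5 ms, so the middle of the exposure lies roughly 16 ms after the presentation timestamp.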

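Why the original approach had different offsets

I think the main cause of your offset is that you assume the timestamp of the first frame is the time at which the handler runs, i.e. it does not account for any delay between the data being captured and being delivered to your app. Especially if you are using the main queue for these handlers, I can imagine the callback for that first frame being delayed by the 0.2-0.3 s you mention.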
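
To make that concrete, here is a small model of the question's scheme in plain C. All function names are hypothetical and the numbers in the note below are made up for illustration. It only shows that whatever delay the first callback experiences ends up in every frame timestamp of that recording:

```c
#include <assert.h>

/* Models frameTimestamp from the question:
   (pts - firstPts) + wall-clock time at which the first callback ran. */
double question_frame_timestamp(double pts, double first_pts,
                                double first_callback_wall_time)
{
    assert(pts >= first_pts);
    return (pts - first_pts) + first_callback_wall_time;
}

/* What the timestamp would be with an exact uptime-to-wall-clock mapping,
   i.e. the same mapping the question applies to motion.timestamp. */
double ideal_frame_timestamp(double pts, double uptime_offset)
{
    return pts + uptime_offset;
}
```

If the first callback runs 0.25 s after its frame was captured, first_callback_wall_time equals first_pts + uptime_offset + 0.25, and every later frame comes out 0.25 s late relative to the motion data. Because that delivery delay varies between recordings, so does the offset.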

Answer 2

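The best solution I was able to find for this problem was to run a feature tracker over the recorded video, pick one of the strong features, and plot the speed of its movement along, say, the X axis; then correlate this plot with the accelerometer Y data.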

When two similar plots are offset from each other along the abscissa, a technique called cross-correlation can be used to find the offset.
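
A brute-force sketch of that idea in plain C, assuming both signals have already been resampled to the same rate and length. best_lag is a hypothetical helper, and in practice you would probably subtract each signal's mean (or normalize) before correlating:

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: returns the lag (in samples) within [-max_lag, max_lag]
   that maximizes sum_i a[i] * b[i + lag]. A positive result means that
   b is delayed relative to a by that many samples. */
int best_lag(const double *a, const double *b, size_t n, int max_lag)
{
    assert(max_lag >= 0);
    int best = 0;
    double best_score = -DBL_MAX;
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        double score = 0.0;
        for (size_t i = 0; i < n; ++i) {
            long j = (long)i + lag;
            if (j < 0 || j >= (long)n) continue; /* skip non-overlapping samples */
            score += a[i] * b[j];
        }
        if (score > best_score) {
            best_score = score;
            best = lag;
        }
    }
    return best;
}
```

Multiplying the returned lag by the sample period gives the time offset between the two signals.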

This approach has an obvious drawback: it requires some video processing, so it is slow.

iOS: Synchronizing frames from the camera and motion data
