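Correctly calculating PTS and DTS to sync audio and video with ffmpeg in C++

Time: 2015-08-12 18:46:35

Tags: c++ audio video ffmpeg

I am trying to mux H264 encoded data and G711 PCM data into a mov multimedia container. I create an AVPacket from the encoded data, and initially the PTS and DTS values of the video/audio frames are equal to AV_NOPTS_VALUE. So I calculated the DTS from the current time. My code: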

bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
    .....................................
    .....................................
    .....................................
    AVPacket pkt = {0};
    av_init_packet(&pkt);
    int64_t dts = av_gettime();
    dts = av_rescale_q(dts, (AVRational){1, 1000000}, m_pVideoStream->time_base);
    int duration = 90000 / VIDEO_FRAME_RATE;
    if(m_prevVideoDts > 0LL) {
        duration = dts - m_prevVideoDts;
    }
    m_prevVideoDts = dts;

    pkt.pts = AV_NOPTS_VALUE;
    pkt.dts = m_currVideoDts;
    m_currVideoDts += duration;
    pkt.duration = duration;
    if(bIFrame) {
        pkt.flags |= AV_PKT_FLAG_KEY;
    }
    pkt.stream_index = m_pVideoStream->index;
    pkt.data = (uint8_t*) pData;
    pkt.size = iDataSize;

    int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);

    if(ret < 0) {
        LogErr("Writing video frame failed.");
        return false;
    }

    Log("Writing video frame done.");

    av_free_packet(&pkt);
    return true;
}

bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
    .................................
    .................................
    .................................
    AVPacket pkt = {0};
    av_init_packet(&pkt);

    int64_t dts = av_gettime();
    dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
    int duration = AUDIO_STREAM_DURATION; // 20
    if(m_prevAudioDts > 0LL) {
        duration = dts - m_prevAudioDts;
    }
    m_prevAudioDts = dts;
    pkt.pts = AV_NOPTS_VALUE;
    pkt.dts = m_currAudioDts;
    m_currAudioDts += duration;
    pkt.duration = duration;

    pkt.stream_index = m_pAudioStream->index;
    pkt.flags |= AV_PKT_FLAG_KEY;
    pkt.data = (uint8_t*) pEncodedData;
    pkt.size = iDataSize;

    int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
    if(ret < 0) {
        LogErr("Writing audio frame failed: %d", ret);
        return false;
    }

    Log("Writing audio frame done.");

    av_free_packet(&pkt);
    return true;
}

I add the streams like this:

AVStream* AudioVideoRecorder::AddMediaStream(enum AVCodecID codecID) {
    ................................
    .................................   
    pStream = avformat_new_stream(m_pFormatCtx, codec);
    if (!pStream) {
        LogErr("Could not allocate stream.");
        return NULL;
    }
    pStream->id = m_pFormatCtx->nb_streams - 1;
    pCodecCtx = pStream->codec;
    pCodecCtx->codec_id = codecID;

    switch(codec->type) {
    case AVMEDIA_TYPE_VIDEO:
        pCodecCtx->bit_rate = VIDEO_BIT_RATE;
        pCodecCtx->width = PICTURE_WIDTH;
        pCodecCtx->height = PICTURE_HEIGHT;
        pStream->time_base = (AVRational){1, 90000};
        pStream->avg_frame_rate = (AVRational){90000, 1};
        pStream->r_frame_rate = (AVRational){90000, 1}; // though the frame rate is variable and around 15 fps
        pCodecCtx->pix_fmt = STREAM_PIX_FMT;
        m_pVideoStream = pStream;
        break;

    case AVMEDIA_TYPE_AUDIO:
        pCodecCtx->sample_fmt = AV_SAMPLE_FMT_S16;
        pCodecCtx->bit_rate = AUDIO_BIT_RATE;
        pCodecCtx->sample_rate = AUDIO_SAMPLE_RATE;
        pCodecCtx->channels = 1;
        m_pAudioStream = pStream;
        break;

    default:
        break;
    }

    /* Some formats want stream headers to be separate. */
    if (m_pOutputFmt->flags & AVFMT_GLOBALHEADER)
        m_pFormatCtx->flags |= CODEC_FLAG_GLOBAL_HEADER;

    return pStream;
}

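This calculation has several problems:

  1. The video lags further and further behind the audio as time goes on.

  2. Suppose an audio frame is received (WriteAudio(..)) a little late, say by 3 seconds. The late frame should then start playing with a 3-second delay, but it does not. The delayed frame is played back to back with the previous frame.

  3. Sometimes I record for only about 40 seconds, but the file duration is around 2 minutes. Audio/video plays only for a short time, about 40 seconds, the rest of the file contains nothing, and playback jumps to the end immediately after 40 seconds (tested in VLC).

  4. EDIT:

    Following Ronald S. Bultje's suggestion, what I have understood is that: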

    m_pAudioStream->time_base = (AVRational){1, 9000}; // actually no need to set as 9000 is already default value for audio as you said
    m_pVideoStream->time_base = (AVRational){1, 9000};
    

    should be set, so that both the audio and the video stream now use the same time base units.

    For video:

    ...................
    ...................
    
    int64_t dts = av_gettime(); // get current time in microseconds
    dts *= 9000; 
    dts /= 1000000; // 1 second = 10^6 microseconds
    pkt.pts = AV_NOPTS_VALUE; // is it okay?
    pkt.dts = dts;
    // and no need to set pkt.duration, right?
    

    For audio: (exactly the same as video, right?)

    ...................
    ...................
    
    int64_t dts = av_gettime(); // get current time in microseconds
    dts *= 9000; 
    dts /= 1000000; // 1 second = 10^6 microseconds
    pkt.pts = AV_NOPTS_VALUE; // is it okay?
    pkt.dts = dts;
    // and no need to set pkt.duration, right?
    

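    I think they are now effectively sharing the same currDts, right? Please correct me if I am wrong anywhere or missing anything.

    Also, if I want to use (AVRational){1, frameRate} as the video stream time base and (AVRational){1, sampleRate} as the audio stream time base, what should the correct code look like?

    EDIT 2.0: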

    m_pAudioStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};
    m_pVideoStream->time_base = (AVRational){1, VIDEO_FRAME_RATE};
    

    And

    bool AudioVideoRecorder::WriteAudio(const unsigned char *pEncodedData, size_t iDataSize) {
        ...........................
        ......................
        AVPacket pkt = {0};
        av_init_packet(&pkt);
    
        int64_t dts = av_gettime() / 1000; // convert into millisecond
        dts = dts * VIDEO_FRAME_RATE;
        if(m_dtsOffset < 0) {
            m_dtsOffset = dts;
        }
    
        pkt.pts = AV_NOPTS_VALUE;
        pkt.dts = (dts - m_dtsOffset);
    
        pkt.stream_index = m_pAudioStream->index;
        pkt.flags |= AV_PKT_FLAG_KEY;
        pkt.data = (uint8_t*) pEncodedData;
        pkt.size = iDataSize;
    
        int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
        if(ret < 0) {
            LogErr("Writing audio frame failed: %d", ret);
            return false;
        }
    
        Log("Writing audio frame done.");
    
        av_free_packet(&pkt);
        return true;
    }
    
    bool AudioVideoRecorder::WriteVideo(const unsigned char *pData, size_t iDataSize, bool const bIFrame) {
        ........................................
        .................................
        AVPacket pkt = {0};
        av_init_packet(&pkt);
        int64_t dts = av_gettime() / 1000;
        dts = dts * VIDEO_FRAME_RATE;
        if(m_dtsOffset < 0) {
            m_dtsOffset = dts;
        }
        pkt.pts = AV_NOPTS_VALUE;
        pkt.dts = (dts - m_dtsOffset);
    
        if(bIFrame) {
            pkt.flags |= AV_PKT_FLAG_KEY;
        }
        pkt.stream_index = m_pVideoStream->index;
        pkt.data = (uint8_t*) pData;
        pkt.size = iDataSize;
    
        int ret = av_interleaved_write_frame(m_pFormatCtx, &pkt);
    
        if(ret < 0) {
            LogErr("Writing video frame failed.");
            return false;
        }
    
        Log("Writing video frame done.");
    
        av_free_packet(&pkt);
        return true;
    }
    

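    Is the last change OK? Video and audio seem to be synchronized. The only problem is that the audio plays without any delay, regardless of packets arriving late. For example:

    Packets arriving: 1 2 3 4 ... (then the next frame arrives after 3 seconds) .. 5

    Audio played: 1 2 3 4 (no delay) 5

    EDIT 3.0:

    Zeroed audio sample data: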

    AVFrame* pSilentData;
    pSilentData = av_frame_alloc();
    memset(&pSilentData->data[0], 0, iDataSize);
    
    pkt.data = (uint8_t*) pSilentData;
    pkt.size = iDataSize;
    
    av_freep(&pSilentData->data[0]);
    av_frame_free(&pSilentData);
    

    Is this OK? After writing this into the file container, there is a "dot dot" noise when the media is played. What is the problem?

    EDIT 4.0:

    OK, for µ-law audio the zero value is represented as 0xff. So:

    memset(&pSilentData->data[0], 0xff, iDataSize);
    

    solved my problem.

1 Answer:

Answer 0: (score: 2)

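Timestamps (such as dts) should be in AVStream.time_base units. You are requesting a video time base of 1/90000 and the default audio time base (1/9000), but you are using a time base of 1/100000 to write the dts values. I am also not sure that the requested time bases are guaranteed to be kept during header writing; your muxer may change the values and expect you to deal with the new ones.

So code like this: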

int64_t dts = av_gettime();
dts = av_rescale_q(dts, (AVRational){1, 1000000}, (AVRational){1, 90000});
int duration = AUDIO_STREAM_DURATION; // 20
if(m_prevAudioDts > 0LL) {
    duration = dts - m_prevAudioDts;
}

won't work. Change it to use the audio stream's time base, and don't set the duration unless you know what you're doing. (The same goes for video.)
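To make the unit conversion concrete, here is a minimal sketch of rescaling a microsecond timestamp into a stream's time base. It is deliberately written without FFmpeg so it compiles on its own: `Rational` and `RescaleTimestamp` are hypothetical stand-ins for `AVRational` and `av_rescale_q`, and the rounding shown (to nearest, non-negative input only) is a simplification, not FFmpeg's exact behavior. In real code you would pass the time base read back from the stream after the header has been written.

```cpp
#include <cstdint>

// Hypothetical stand-in for FFmpeg's AVRational, so the sketch builds without libavutil.
struct Rational {
    int64_t num;
    int64_t den;
};

// Converts ts from time base `from` to time base `to`, rounding to nearest.
// Assumes ts >= 0 and positive time bases. The quotient and remainder are
// scaled separately so that a large wall-clock value in microseconds does
// not overflow int64_t when multiplied by 90000.
int64_t RescaleTimestamp(int64_t ts, Rational from, Rational to) {
    const int64_t b = from.num * to.den;  // multiplier
    const int64_t c = from.den * to.num;  // divisor
    const int64_t whole = (ts / c) * b;
    const int64_t frac = ((ts % c) * b + c / 2) / c;
    return whole + frac;
}
```

For example, one second expressed in microseconds becomes 90000 ticks in a 1/90000 time base, and a 20 ms audio frame becomes 160 ticks in a 1/8000 time base.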

m_prevAudioDts = dts;
pkt.pts = AV_NOPTS_VALUE;
pkt.dts = m_currAudioDts;
m_currAudioDts += duration;
pkt.duration = duration;

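This looks scary, especially combined with the similar video code. The problem here is that the first packet of both streams will have a timestamp of zero, regardless of the inter-packet delay between the streams. You need one parent currDts shared between all streams, otherwise your streams will be perpetually out of sync.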
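One way to picture the shared parent clock is a small helper that both WriteAudio and WriteVideo consult. The sketch below is an illustration under assumptions, not the asker's or FFmpeg's code: `SharedStreamClock` and its methods are hypothetical names, and `nowUs` is assumed to come from a microsecond clock such as av_gettime(). The first packet of either stream fixes time zero, so a stream that starts later gets a non-zero first timestamp.

```cpp
#include <cstdint>

// Hypothetical helper: a single instance is shared by the audio and video writers.
class SharedStreamClock {
public:
    // Microseconds elapsed since the first packet of ANY stream.
    int64_t ElapsedUs(int64_t nowUs) {
        if (m_originUs < 0) {
            m_originUs = nowUs;  // the first packet overall defines time zero
        }
        return nowUs - m_originUs;
    }

    // Elapsed time in ticks of a 1/ticksPerSecond time base (truncating).
    // Whole seconds and the sub-second remainder are scaled separately
    // to keep the intermediate products small.
    int64_t ElapsedTicks(int64_t nowUs, int64_t ticksPerSecond) {
        const int64_t us = ElapsedUs(nowUs);
        return (us / 1000000) * ticksPerSecond +
               (us % 1000000) * ticksPerSecond / 1000000;
    }

private:
    int64_t m_originUs = -1;  // -1 means no packet has been seen yet
};
```

If video starts first and the first audio packet arrives 250 ms later, the audio packet is stamped 2000 ticks in a 1/8000 time base instead of 0, which preserves the offset between the streams.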

[Edit]

So, regarding your edit: if you have audio gaps, I think you need to insert silence (zeroed audio sample data) for the duration of the gap.
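As a rough sketch of what filling a gap could look like for G.711 µ-law, the helper below builds a buffer of silence for a given gap length. It assumes mono audio with one byte per sample, and uses 0xFF as the silent value, as the asker found in Edit 4.0. `MakeMuLawSilence` and `kMuLawSilence` are hypothetical names; how the buffer is split into packets and timestamped is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// In µ-law, a zero-amplitude sample is encoded as 0xFF rather than 0x00.
constexpr uint8_t kMuLawSilence = 0xFF;

// Returns gapUs microseconds of µ-law silence.
// Assumes mono, one byte per sample; sampleRate is typically 8000 for G.711.
std::vector<uint8_t> MakeMuLawSilence(int64_t gapUs, int sampleRate) {
    if (gapUs <= 0 || sampleRate <= 0) {
        return {};
    }
    // Scale whole seconds and the remainder separately (truncating).
    const int64_t samples = (gapUs / 1000000) * sampleRate +
                            ((gapUs % 1000000) * sampleRate) / 1000000;
    return std::vector<uint8_t>(static_cast<size_t>(samples), kMuLawSilence);
}
```

A 3-second gap at 8000 Hz yields 24000 bytes, all set to 0xFF. Filling the same buffer with 0x00 would encode a large non-zero amplitude in µ-law, which is consistent with the noise described in Edit 3.0.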