Question

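I am muxing video and audio streams. The video stream comes from generated image data. The audio stream comes from an AAC file. Some audio files are longer than the total video time I set, so my strategy is to stop muxing the audio stream once its time becomes larger than the total video time (which I control by the number of encoded video frames).

I won't put the whole setup code here, but it is similar to the muxing.c example from the latest FFMPEG repo. The only difference is that I use an audio stream from a file, as I said, not from synthetically generated encoded frames. I am pretty sure the issue is wrong synchronization in my muxer loop. Here is what I do: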

bool AudioSetup(const char* audioInFileName)
{
    AVOutputFormat* outputF = mOutputFormatContext->oformat;
    auto audioCodecId = outputF->audio_codec;

    if (audioCodecId == AV_CODEC_ID_NONE) {
        return false;
    }

    audio_codec = avcodec_find_encoder(audioCodecId);

    avformat_open_input(&mInputAudioFormatContext, audioInFileName, 0, 0);
    avformat_find_stream_info(mInputAudioFormatContext, 0);

    av_dump_format(mInputAudioFormatContext, 0, audioInFileName, 0);

    for (size_t i = 0; i < mInputAudioFormatContext->nb_streams; i++) {
        if (mInputAudioFormatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            inAudioStream = mInputAudioFormatContext->streams[i];

            AVCodecParameters *in_codecpar = inAudioStream->codecpar;
            mAudioOutStream.st = avformat_new_stream(mOutputFormatContext, NULL);
            mAudioOutStream.st->id = mOutputFormatContext->nb_streams - 1;
            AVCodecContext* c = avcodec_alloc_context3(audio_codec);
            mAudioOutStream.enc = c;
            c->sample_fmt = audio_codec->sample_fmts[0];
            avcodec_parameters_to_context(c, inAudioStream->codecpar);
            // copy params from the input to the output audio stream:
            avcodec_parameters_copy(mAudioOutStream.st->codecpar, inAudioStream->codecpar);

            mAudioOutStream.st->time_base.num = 1;
            mAudioOutStream.st->time_base.den = c->sample_rate;

            c->time_base = mAudioOutStream.st->time_base;

            if (mOutputFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
                c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
            }
            break;
        }
    }
    return true;
}

bool Encode()
{
    int cc = av_compare_ts(mVideoOutStream.next_pts, mVideoOutStream.enc->time_base,
                           mAudioOutStream.next_pts, mAudioOutStream.enc->time_base);

    if (mAudioOutStream.st == NULL || cc <= 0) {
        uint8_t* data = GetYUVFrame(); // returns a ready video YUV frame to work with
        int ret = 0;
        AVPacket pkt = { 0 };
        av_init_packet(&pkt);
        pkt.size = packet->dataSize;
        pkt.data = data;
        const int64_t duration = av_rescale_q(1, mVideoOutStream.enc->time_base, mVideoOutStream.st->time_base);

        pkt.duration = duration;
        pkt.pts = mVideoOutStream.next_pts;
        pkt.dts = mVideoOutStream.next_pts;
        mVideoOutStream.next_pts += duration;

        pkt.stream_index = mVideoOutStream.st->index;
        ret = av_interleaved_write_frame(mOutputFormatContext, &pkt);
    } else if (audio_time < video_time) {
        // 5 - duration of the video in seconds
        AVRational r = { 60, 1 };

        auto cmp = av_compare_ts(mAudioOutStream.next_pts, mAudioOutStream.enc->time_base, 5, r);
        if (cmp >= 0) {
            mAudioOutStream.next_pts = (int64_t)std::numeric_limits<int64_t>::max();
            return true; // don't mux audio anymore
        }

        AVPacket a_pkt = { 0 };
        av_init_packet(&a_pkt);

        int ret = 0;
        ret = av_read_frame(mInputAudioFormatContext, &a_pkt);
        // if the audio file is shorter than the video, stop muxing audio at the end of the file
        if (ret == AVERROR_EOF) {
            mAudioOutStream.next_pts = (int64_t)std::numeric_limits<int64_t>::max();
            return true;
        }
        a_pkt.stream_index = mAudioOutStream.st->index;

        av_packet_rescale_ts(&a_pkt, inAudioStream->time_base, mAudioOutStream.st->time_base);
        mAudioOutStream.next_pts += a_pkt.pts;

        ret = av_interleaved_write_frame(mOutputFormatContext, &a_pkt);
    }
    return true;
}

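Now, the video part is flawless. But if the audio track is longer than the video duration, my total video length grows by around 5% to 20%, and it is clear that the audio is contributing to that, because the video frames finish exactly where they are supposed to.

The closest 'hack' I came up with is this part: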

AVRational r = {  60 ,1 };
auto cmp= av_compare_ts(mAudioOutStream.next_pts, mAudioOutStream.enc->time_base, 5, r);
if (cmp >= 0) {
    mAudioOutStream.next_pts = (int64_t)std::numeric_limits<int64_t>::max();
    return true;
}

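Here I was trying to compare the next_pts of the audio stream with the total time set for the video file, which is 5 seconds. By setting r = {60,1} I am converting those seconds into the time_base of the audio stream. At least that is what I believe I am doing. With this hack I get a very small deviation from the correct movie length when using standard AAC files, that is, a sample rate of 44100, stereo. But if I test with more problematic samples, such as AAC with a sample rate of 16000, mono, then the video file gets almost a whole second longer. I would appreciate it if someone could point out what I am doing wrong here.

Important note: I don't set a duration on any of the contexts. I control the termination of the muxing session, which is based on the video frame count. The audio input stream has a duration, of course, but it doesn't help me, as the video duration is what defines the movie length.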
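
One way to express the termination rule described above is to derive the audio cutoff from the video frame count instead of a hard-coded number of seconds. The sketch below is plain C++ with no FFmpeg dependency; `Rational`, `rescale`, and `audio_cutoff_pts` are hypothetical names, and `rescale` only approximates what FFmpeg's `av_rescale_q` does, for small non-negative values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FFmpeg's AVRational (num/den seconds per tick).
struct Rational {
    int num;
    int den;
};

// Convert `ts` from time base `from` to time base `to`, rounding to nearest.
// Assumes non-negative inputs small enough that the products fit in int64_t.
int64_t rescale(int64_t ts, Rational from, Rational to) {
    assert(ts >= 0 && from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    const int64_t numer = ts * from.num * to.den;
    const int64_t denom = static_cast<int64_t>(from.den) * to.num;
    return (numer + denom / 2) / denom;
}

// The video length is `video_frames` ticks of 1/fps; express it in audio ticks.
int64_t audio_cutoff_pts(int64_t video_frames, int fps, Rational audio_tb) {
    return rescale(video_frames, Rational{1, fps}, audio_tb);
}
```

Assuming a 5-second clip at 30 fps (150 frames, a frame rate chosen only for illustration), this gives a cutoff of 220500 ticks for 44100 Hz audio and 80000 ticks for 16000 Hz audio, so the same check applies to both sample rates.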

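UPDATE:

This is the second bounty attempt.

UPDATE 2:

Actually, my audio timestamp of {den, num} was wrong, while {1, 1} is indeed the way to go, as explained by the answer. What was preventing it from working was a bug in this line (my bad):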

     mAudioOutStream.next_pts += a_pkt.pts;

It must be:

     mAudioOutStream.next_pts = a_pkt.pts;

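That bug made the pts grow far too quickly, so the end of the stream (in terms of pts) was reached very early, and the audio stream was terminated much earlier than it should have been.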
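
To see why the `+=` version ends the audio early, it can help to simulate both updates on evenly spaced packets. The sketch below is plain C++ with made-up numbers (packets 1024 ticks apart in a 1/48000 time base, a 5-second cutoff of 240000 ticks); `packets_until_cutoff` is a hypothetical helper, not FFmpeg code.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many packets get written before next_pts reaches `cutoff`.
// Packet i carries pts = i * step. `accumulate` selects the buggy
// `next_pts += pts` update instead of the correct `next_pts = pts`.
int packets_until_cutoff(int64_t step, int64_t cutoff, bool accumulate) {
    assert(step > 0 && cutoff > 0);
    int64_t next_pts = 0;
    int written = 0;
    for (int64_t i = 0; next_pts < cutoff; ++i) {
        const int64_t pts = i * step;
        if (accumulate) {
            next_pts += pts;  // buggy: running sum of all pts seen so far
        } else {
            next_pts = pts;   // fixed: timestamp of the current packet
        }
        ++written;
    }
    return written;
}
```

With these numbers the fixed update writes 236 packets (about 5 seconds of audio) before stopping, while the accumulating update stops after only 23 packets (about half a second), because the running sum reaches the cutoff long before the real timestamps do.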

Answer 1

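The problem is that you tell it to compare the given audio time with 5 ticks at 60 seconds per tick. I am actually surprised that it works in some cases, but I guess it really depends on the specific time_base of the given audio stream.

Let's assume the audio has a time_base of 1/25 and the stream is at 6 seconds, which is more than you want, so you want av_compare_ts to return 0 or 1. Given these conditions, you will have the following values:

mAudioOutStream.next_pts = 150
mAudioOutStream.enc->time_base = 1/25

Thus you call av_compare_ts with the following parameters:

ts_a = 150
tb_a = 1/25
ts_b = 5
tb_b = 60/1

Now let's look at the implementation of av_compare_ts:

int av_compare_ts(int64_t ts_a, AVRational tb_a, int64_t ts_b, AVRational tb_b)
{
    int64_t a = tb_a.num * (int64_t)tb_b.den;
    int64_t b = tb_b.num * (int64_t)tb_a.den;
    if ((FFABS(ts_a)|a|FFABS(ts_b)|b) <= INT_MAX)
        return (ts_a*a > ts_b*b) - (ts_a*a < ts_b*b);
    if (av_rescale_rnd(ts_a, a, b, AV_ROUND_DOWN) < ts_b)
        return -1;
    if (av_rescale_rnd(ts_b, b, a, AV_ROUND_DOWN) < ts_a)
        return 1;
    return 0;
}

Given the above values, you get:

a = 1 * 1 = 1
b = 60 * 25 = 1500

Then av_rescale_rnd is called with these parameters:

a = 150
b = 1
c = 1500
rnd = AV_ROUND_DOWN

Given our parameters, we can actually strip down the entire function av_rescale_rnd to the following line. (I will not copy the whole function body of av_rescale_rnd, as it is rather long, but you can look it up in the FFmpeg source.)

return (a * b) / c;

This will return (150 * 1) / 1500, which is 0.

Thus av_rescale_rnd(ts_a, a, b, AV_ROUND_DOWN) < ts_b resolves to true, because 0 is smaller than ts_b (5), so av_compare_ts returns -1, which is exactly what you do not want.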
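
As a quick sanity check on what each time base means, a timestamp can be converted to seconds by multiplying the tick count by num/den. The sketch below is plain C++; `Rational` and `ticks_to_seconds` are hypothetical names used only for illustration, not part of FFmpeg.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FFmpeg's AVRational: seconds per tick = num/den.
struct Rational {
    int num;
    int den;
};

// Duration in seconds represented by `ts` ticks of time base `tb`.
double ticks_to_seconds(int64_t ts, Rational tb) {
    assert(tb.den != 0);
    return static_cast<double>(ts) * tb.num / tb.den;
}
```

Here 150 ticks of 1/25 come out as 6 seconds, while 5 ticks of 60/1 come out as 300 seconds, so with r = { 60, 1 } the audio position is compared against a limit of five minutes rather than five seconds.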

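If you change your r to 1/1 it should work, because now your 5 will actually be treated as 5 seconds:

ts_a = 150
tb_a = 1/25
ts_b = 5
tb_b = 1/1

In av_compare_ts we now get:

a = 1 * 1 = 1
b = 1 * 25 = 25

Then av_rescale_rnd is called with these parameters:

a = 150
b = 1
c = 25
rnd = AV_ROUND_DOWN

This will return (150 * 1) / 25, which is 6.

6 is greater than 5, so the condition fails, and av_rescale_rnd is called again, this time with:

a = 5
b = 25
c = 1
rnd = AV_ROUND_DOWN

This returns (5 * 25) / 1, which is 125. That is smaller than 150, so 1 is returned and your problem is solved.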
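
The comparison can also be reproduced outside FFmpeg. The sketch below is a simplified, hypothetical `compare_ts` in plain C++ that follows the small-value branch of av_compare_ts (cross-multiplying the two timestamps); it ignores the overflow handling that the real function delegates to av_rescale_rnd, so it is only valid for small inputs like the ones in this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for FFmpeg's AVRational.
struct Rational {
    int num;
    int den;
};

// Returns -1, 0 or 1 when ts_a (in tb_a) is before, equal to, or after
// ts_b (in tb_b). Only valid while the cross products fit in int64_t.
int compare_ts(int64_t ts_a, Rational tb_a, int64_t ts_b, Rational tb_b) {
    assert(tb_a.den > 0 && tb_b.den > 0);
    const int64_t lhs = ts_a * tb_a.num * tb_b.den;  // ts_a in common units
    const int64_t rhs = ts_b * tb_b.num * tb_a.den;  // ts_b in common units
    return (lhs > rhs) - (lhs < rhs);
}
```

For an audio position of 150 ticks at 1/25, comparing against 5 ticks of 60/1 yields -1 (keep muxing), whereas comparing against 5 ticks of 1/1 yields 1 (stop), which matches the walkthrough.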

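In case step_size is greater than 1

If the step_size of your audio stream isn't 1, you need to modify your r to account for that, e.g. for step_size = 1024:

r = { 1, 1024 };

Let's quickly recap what happens now:

At ~6 seconds:

mAudioOutStream.next_pts = 282
mAudioOutStream.enc->time_base = 1/48000

av_compare_ts gets the following parameters:

ts_a = 282
tb_a = 1/48000
ts_b = 5
tb_b = 1/1024

Thus:

a = 1 * 1024 = 1024
b = 1 * 48000 = 48000

And in av_rescale_rnd:

a = 282
b = 1024
c = 48000
rnd = AV_ROUND_DOWN

(a * b) / c gives (282 * 1024) / 48000 = 288768 / 48000, which is 6.

With r = { 1, 1 } you would have gotten 0 again, because it would have calculated (282 * 1) / 48000.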
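
The step_size case can be cross-checked with plain arithmetic. The sketch below assumes, as this example appears to, that next_pts counts whole audio frames of step_size samples each; `frames_for_duration` and `whole_seconds_covered` are hypothetical helpers in plain C++, not FFmpeg functions.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole audio frames needed to cover `seconds` of audio when
// each frame holds `frame_size` samples at `sample_rate` Hz (rounds up).
int64_t frames_for_duration(int64_t seconds, int sample_rate, int frame_size) {
    assert(seconds >= 0 && sample_rate > 0 && frame_size > 0);
    const int64_t samples = seconds * sample_rate;
    return (samples + frame_size - 1) / frame_size;
}

// Whole seconds of audio covered by `frames` frames (fraction discarded).
int64_t whole_seconds_covered(int64_t frames, int sample_rate, int frame_size) {
    assert(frames >= 0 && sample_rate > 0 && frame_size > 0);
    return frames * frame_size / sample_rate;
}
```

Under that assumption, 282 frames of 1024 samples at 48000 Hz cover 6 whole seconds, and 5 seconds of audio need 235 such frames (79 frames at 16000 Hz), which gives an alternative way to decide when to stop muxing audio.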

FFMPEG: multiplexing streams with different duration
