Question

I am trying to mix the user's voice with music and save the result to a file.

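I created two decoders, one for the voice and one for the music, and feed them into the inputs of a mixer. I decode frame by frame and save the result to a file using FILE / createWAV / fwrite.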

Everything works perfectly when my song is a .wav with the same sampleRate and samplesPerFrame as the recorded voice (48000/1024).

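However, when I use an .mp3 file with different parameters (44100/1152), the final file is incorrect: it is stretched or it crackles. I think this is because each decoder returns a different samplesDecoded, and when the samples are passed to the mixer or saved to the file, the difference between the two counts is lost.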

As far as I know, calling voiceDecoder->decode(buffer, &samplesDecoded) advances samplePosition by samplesDecoded.

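What I tried was to use the minimum of the two decoders' sample counts. However, given the statement above, the song then loses 128 samples (1152 - 1024 = 128) on every loop iteration. So I also tried to move songDecoder to the same position as voiceDecoder with songDecoder->seek(voiceDecoder->samplePosition, true), but that produced a completely incorrect file.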

To summarize: how should I handle the mixer / offline processing when the two decoders have different sampleRate and samplesPerFrame values?

Code:

void AudioProcessor::startProcessing() {
    SuperpoweredStereoMixer *mixer = new SuperpoweredStereoMixer();
    float *mixerInputs_[] = {0,0,0,0};
    float *mixerOutputs_[] = {0,0};
    float inputLevels_[]= {0.5f, 0.5f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f};
    float outputLevels_[] = { 1.0f, 1.0f };

    SuperpoweredDecoder *voiceDecoder = new SuperpoweredDecoder();
    SuperpoweredDecoder *songDecoder = new SuperpoweredDecoder();

    if (voiceDecoder->open(voiceInputPath, false) || songDecoder->open(songInputPath, false, songOffset, songLength)) {
        delete voiceDecoder;
        delete songDecoder;
        delete mixer;
        callJavaVoidMethodWithBoolParam(jvm, jObject, processingFinishedMethodId, false);
        return;
    };

    FILE *fd = createWAV(outputPath, songDecoder->samplerate, 2);
    if (!fd) {
        delete voiceDecoder;
        delete songDecoder;
        delete mixer;
        callJavaVoidMethodWithBoolParam(jvm, jObject, processingFinishedMethodId, false);
        return;
    };

    // Create a buffer for the 16-bit integer samples coming from the decoder.
    short int *voiceIntBuffer = (short int *)malloc(voiceDecoder->samplesPerFrame * 4 * sizeof(short int) + 32768);
    short int *songIntBuffer = (short int *)malloc(songDecoder->samplesPerFrame * 4 * sizeof(short int) + 32768);
    short int *outputIntBuffer = (short int *)malloc(voiceDecoder->samplesPerFrame * 4 * sizeof(short int) + 32768);

    // Create a buffer for the 32-bit floating point samples required by the effect.
    float *voiceFloatBuffer = (float *)malloc(voiceDecoder->samplesPerFrame * 4 * sizeof(float) + 32768);
    float *songFloatBuffer = (float *)malloc(songDecoder->samplesPerFrame * 4 * sizeof(float) + 32768);
    float *outputFloatBuffer = (float *)malloc(voiceDecoder->samplesPerFrame * 4 * sizeof(float) + 32768);

    bool isError = false;

    // Processing.
    while (true) {
        if (isCanceled) {
            isError = true;
            break;
        }

        // Decode one frame. samplesDecoded will be overwritten with the actual decoded number of samples.
        unsigned int voiceSamplesDecoded = voiceDecoder->samplesPerFrame;
        if (voiceDecoder->decode(voiceIntBuffer, &voiceSamplesDecoded) == SUPERPOWEREDDECODER_ERROR) {
            break;
        }
        if (voiceSamplesDecoded < 1) {
            break;
        }

        // Decode one frame of the song. songSamplesDecoded will be overwritten with the actual decoded number of samples.
        unsigned int songSamplesDecoded = songDecoder->samplesPerFrame;
        if (songDecoder->decode(songIntBuffer, &songSamplesDecoded) == SUPERPOWEREDDECODER_ERROR) {
            break;
        }
        if (songSamplesDecoded < 1) {
            break;
        }

        unsigned int samplesDecoded = static_cast<unsigned int>(fmin(voiceSamplesDecoded, songSamplesDecoded));

        // Convert the decoded PCM samples from 16-bit integer to 32-bit floating point.
        SuperpoweredShortIntToFloat(voiceIntBuffer, voiceFloatBuffer, samplesDecoded);
        SuperpoweredShortIntToFloat(songIntBuffer, songFloatBuffer, samplesDecoded);

        //setup mixer inputs
        mixerInputs_[0] = voiceFloatBuffer;
        mixerInputs_[1] = songFloatBuffer;
        mixerInputs_[2] = NULL;
        mixerInputs_[3] = NULL;

        // setup mixer outputs, might have two separate outputs (L/R) if second not null
        mixerOutputs_[0] = outputFloatBuffer;
        mixerOutputs_[1] = NULL;

        mixer->process(mixerInputs_, mixerOutputs_, inputLevels_, outputLevels_, NULL, NULL, samplesDecoded);

        // Convert the PCM samples from 32-bit floating point to 16-bit integer.
        SuperpoweredFloatToShortInt(outputFloatBuffer, outputIntBuffer, samplesDecoded);

        // Write the audio to disk.
        fwrite(outputIntBuffer, 1, samplesDecoded * 4, fd);

        // songDecoder->seek(voiceDecoder->samplePosition, true);
    }

    // Cleanup.
    closeWAV(fd);
    delete voiceDecoder;
    delete songDecoder;
    delete mixer;
    free(voiceIntBuffer);
    free(voiceFloatBuffer);
    free(songIntBuffer);
    free(songFloatBuffer);
    free(outputFloatBuffer);
    free(outputIntBuffer);
}

Thanks!

Answer 1

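You need to use the SuperpoweredResampler class to match the sample rates. Both inputs also need a circular buffer, because in many cases the number of available samples will not match.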
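The buffering idea can be sketched roughly as follows. This is a minimal illustration, not Superpowered code: StereoFifo and mixAvailable are hypothetical names, a std::vector-backed queue stands in for a real circular buffer, and it assumes both inputs have already been resampled to a common rate and are interleaved stereo floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Superpowered SDK): a FIFO of
// interleaved stereo float frames. One frame = 2 floats (left, right).
class StereoFifo {
public:
    // Append `frames` interleaved stereo frames to the queue.
    void push(const float *interleaved, std::size_t frames) {
        data_.insert(data_.end(), interleaved, interleaved + frames * 2);
    }

    // Number of complete frames currently buffered.
    std::size_t available() const { return data_.size() / 2; }

    // Copy `frames` frames into `out` and drop them from the queue.
    void pop(float *out, std::size_t frames) {
        assert(frames <= available());
        const auto count = static_cast<std::ptrdiff_t>(frames * 2);
        std::copy(data_.begin(), data_.begin() + count, out);
        // Erasing from the front is O(n); a real ring buffer avoids this.
        data_.erase(data_.begin(), data_.begin() + count);
    }

private:
    std::vector<float> data_;
};

// Mix as many frames as BOTH queues can supply. Whatever one input has
// in excess stays queued for the next call instead of being discarded.
// Returns the number of frames written to `out` (interleaved stereo).
std::size_t mixAvailable(StereoFifo &a, StereoFifo &b,
                         float gainA, float gainB,
                         std::vector<float> &out) {
    const std::size_t frames = std::min(a.available(), b.available());
    std::vector<float> bufA(frames * 2), bufB(frames * 2);
    a.pop(bufA.data(), frames);
    b.pop(bufB.data(), frames);
    out.resize(frames * 2);
    for (std::size_t i = 0; i < frames * 2; ++i) {
        out[i] = gainA * bufA[i] + gainB * bufB[i];
    }
    return frames;
}
```

In the decode loop, each decoder's converted (and resampled) output would be pushed into its own queue, and only the frames returned by mixAvailable would be written to the file. With frames of 1024 and 1152, the extra 128 song frames then carry over to the next iteration rather than being dropped.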

Answer 2

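OK, so I managed to get it working. I did what @Gabor suggested, but it did not fully work at first. What I was missing was the channels: I had to account for them in the buffering/shifting operations, and now it works!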
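The channel detail can be illustrated with a small sketch. The original fix was not posted, so this is only a guess at its shape: shiftConsumedFrames is a hypothetical helper, and it assumes interleaved float samples in a plain linear buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: drop the first `framesConsumed` frames from an
// interleaved buffer and move the leftover frames to the front.
// Returns the number of frames that remain buffered.
std::size_t shiftConsumedFrames(float *buffer,
                                std::size_t framesBuffered,
                                std::size_t framesConsumed,
                                std::size_t channels) {
    assert(framesConsumed <= framesBuffered);
    const std::size_t remaining = framesBuffered - framesConsumed;
    // Offsets and lengths are counted in floats, not frames, so every
    // frame count is multiplied by the channel count. Leaving this out
    // shifts only half of a stereo buffer.
    std::memmove(buffer,
                 buffer + framesConsumed * channels,
                 remaining * channels * sizeof(float));
    return remaining;
}
```

For example, after mixing 1024 frames out of 1152 buffered stereo frames, the call would move 128 * 2 floats to the start of the buffer and return 128.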

Superpowered: exporting to a file with a mixer and decoders fails (different sampleRates and samplesPerFrame)
