Question

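In one of my projects I need to resample PCM audio data to a different sample rate. I am using javax.sound.sampled.AudioSystem for this. The resampling seems to add extra samples at the beginning and end of a frame. Here is a minimal working example: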

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class ResamplingTest {

  public static void main(final String[] args) throws IOException {
    final int nrOfSamples = 4;
    final int bytesPerSample = 2;
    final byte[] data = new byte[nrOfSamples * bytesPerSample];
    Arrays.fill(data, (byte) 10);
    final AudioFormat inputFormat = new AudioFormat(32000, bytesPerSample * 8, 1, true, false);
    final AudioInputStream inputStream = new AudioInputStream(new ByteArrayInputStream(data), inputFormat, data.length);
    final AudioFormat outputFormat = new AudioFormat(24000, bytesPerSample * 8, 1, true, false);
    final AudioInputStream outputStream = AudioSystem.getAudioInputStream(outputFormat, inputStream);
    final var resampledBytes = outputStream.readAllBytes();
    System.out.println("Expected number of samples after resampling "
        + (int) (nrOfSamples * outputFormat.getSampleRate() / inputFormat.getSampleRate()));
    System.out.println("Actual number of samples after resampling " + resampledBytes.length / bytesPerSample);
    System.out.println(Arrays.toString(resampledBytes));
  }
}

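When resampling 4 samples from 32 kHz to 24 kHz, I would expect exactly 3 samples. However, the code above produces 5 samples. The number of extra samples seems to depend on the input and output sample rates. For example, if I resample from 8 kHz to 32 kHz, 8 extra samples are generated. Why does resampling add extra samples, and how can I find out how many samples are added at the beginning and end of a frame?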

Answer 1

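I have been playing around with this. I don't really have an answer, just some thoughts. I suspect that, for algorithmic purposes, the streams are "padded" with leading or trailing zeros.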

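First off, and it does not seem to make a difference here, the length you pass when instantiating your AudioInputStream should be the number of frames, not the number of bytes.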
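As a minimal sketch of that point, assuming the same 16-bit mono format as in the question: the third constructor argument of AudioInputStream is a length in sample frames, so it can be derived from the byte count and the format's frame size. The class name FrameLengthSketch and the helper open are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

public class FrameLengthSketch {

  // Builds a stream whose declared length is in frames, not bytes.
  static AudioInputStream open(final byte[] data, final AudioFormat format) {
    // getFrameSize() is the number of bytes per frame: 2 for 16-bit mono.
    final long frames = data.length / format.getFrameSize();
    return new AudioInputStream(new ByteArrayInputStream(data), format, frames);
  }

  public static void main(final String[] args) {
    final byte[] data = new byte[4 * 2]; // 4 samples, 2 bytes each
    final AudioFormat format = new AudioFormat(32000, 16, 1, true, false);
    final AudioInputStream stream = open(data, format);
    System.out.println("Frame length: " + stream.getFrameLength());
  }
}
```

With 8 bytes of 16-bit mono data, the declared length is 4 frames rather than 8.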

I ran your program with just 1 byte per sample, since that seems to make things clearer, with every frame having a value of 10.

Original number of samples: 4
Expected number of samples after resampling 3
Actual number of samples after resampling 5
original data: [10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 6]

Original number of samples: 5
Expected number of samples after resampling 3
Actual number of samples after resampling 6
original data: [10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 3]

Original number of samples: 6
Expected number of samples after resampling 4
Actual number of samples after resampling 7
original data: [10, 10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 10, 0]

Original number of samples: 7
Expected number of samples after resampling 5
Actual number of samples after resampling 7
original data: [10, 10, 10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 10, 10]

Original number of samples: 8
Expected number of samples after resampling 6
Actual number of samples after resampling 8
original data: [10, 10, 10, 10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 10, 10, 6]

Original number of samples: 9
Expected number of samples after resampling 6
Actual number of samples after resampling 9
original data: [10, 10, 10, 10, 10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 10, 10, 10, 3]

Original number of samples: 10
Expected number of samples after resampling 7
Actual number of samples after resampling 10
original data: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 10, 10, 10, 10, 0]

Original number of samples: 11
Expected number of samples after resampling 8
Actual number of samples after resampling 10
original data: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
resampled data: [0, 3, 10, 10, 10, 10, 10, 10, 10, 10]
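The program behind the listings above is not shown in this answer. A sketch of how such a table could be produced, assuming 8-bit signed mono PCM and the same 32 kHz to 24 kHz conversion as in the question, might look like the following. The class name ResamplingTable and the helper resample are made up for illustration, and the exact values depend on the converter shipped with the JDK in use, so no output is claimed here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class ResamplingTable {

  // Assumption: 8-bit signed mono PCM, so one byte per frame.
  static final AudioFormat INPUT = new AudioFormat(32000, 8, 1, true, false);
  static final AudioFormat OUTPUT = new AudioFormat(24000, 8, 1, true, false);

  // Resamples the given bytes from INPUT to OUTPUT and returns the raw result.
  static byte[] resample(final byte[] data) {
    final long frames = data.length / INPUT.getFrameSize();
    final AudioInputStream in =
        new AudioInputStream(new ByteArrayInputStream(data), INPUT, frames);
    try (AudioInputStream out = AudioSystem.getAudioInputStream(OUTPUT, in)) {
      return out.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(final String[] args) {
    for (int n = 4; n <= 11; n++) {
      final byte[] data = new byte[n];
      Arrays.fill(data, (byte) 10);
      final byte[] resampled = resample(data);
      System.out.println("Original number of samples: " + n);
      System.out.println("Expected number of samples after resampling "
          + (int) (n * OUTPUT.getSampleRate() / INPUT.getSampleRate()));
      System.out.println("Actual number of samples after resampling " + resampled.length);
      System.out.println("original data: " + Arrays.toString(data));
      System.out.println("resampled data: " + Arrays.toString(resampled));
      System.out.println();
    }
  }
}
```

Changing the fill value from 10 to 100 in this sketch is an easy way to see the partial values at the edges with more resolution.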

Maybe the algorithm treats the input line as having a preceding 0 value and a trailing 0 value. The latter seems more evident.

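Look at the ends of the resampled data for 7, 8 and 9 input samples. In the first case, I assume the two sample rates "line up", in that the last point on the input line is also a point on the output rather than an "in between" value. When the last point on the output line falls outside the input signal, it looks like linear interpolation is used between the last input value and 0.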
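To make this guess concrete: going from 32 kHz to 24 kHz, output sample k sits at input position k * 32000 / 24000 = k * 4/3, so its fractional part cycles through 0, 1/3 and 2/3. The sketch below is only a model of the hypothesis (linear interpolation toward an assumed trailing 0), not the converter's actual code, and it ignores whatever offset the converter applies at the start. The names are illustrative.

```java
public class TailInterpolationSketch {

  // Linear interpolation between a and b at fraction t in [0, 1].
  static double lerp(final double a, final double b, final double t) {
    return a + (b - a) * t;
  }

  public static void main(final String[] args) {
    final double last = 10.0;   // last real input sample
    final double padding = 0.0; // assumed zero after the end of the input
    // Fractional positions an output sample can take for a 4:3 rate ratio.
    final double[] fractions = {0.0, 1.0 / 3.0, 2.0 / 3.0};
    for (final double t : fractions) {
      System.out.printf("fraction %.2f -> %.2f%n", t, lerp(last, padding, t));
    }
  }
}
```

The three interpolated values are about 10, 6.67 and 3.33, which would match the trailing 10, 6 and 3 seen for 7, 8 and 9 input samples if the fractional part is dropped.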

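I am not clear on what is going on at the start. It seems likely that the algorithm also comes up with a linear interpolation between 0 and the first input value, but I don't understand why it isn't 0.6 rather than 0.3, or why there is a leading zero.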

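In most cases, though, note that we do get the predicted number of 10s! The exceptions are the runs with 4 and 8 input samples, where the leading and trailing partial values add up to 10 (less rounding: I assume the 3 would be 3.3 and the 6 would be 6.7 if the decimal places were extended; try inputting 100 instead of 10 and you will see).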
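As a quick check of that observation, the sketch below takes the resampled arrays exactly as listed earlier in this answer and compares the number of 10s in each with the expected sample count. The class and helper names are illustrative only.

```java
import java.util.Arrays;

public class CountTensSketch {

  // Resampled data copied from the listings, for 4 to 11 input samples.
  static final int[][] RESAMPLED = {
    {0, 3, 10, 10, 6},
    {0, 3, 10, 10, 10, 3},
    {0, 3, 10, 10, 10, 10, 0},
    {0, 3, 10, 10, 10, 10, 10},
    {0, 3, 10, 10, 10, 10, 10, 6},
    {0, 3, 10, 10, 10, 10, 10, 10, 3},
    {0, 3, 10, 10, 10, 10, 10, 10, 10, 0},
    {0, 3, 10, 10, 10, 10, 10, 10, 10, 10},
  };

  // Expected output count for a 32 kHz to 24 kHz conversion, truncated.
  static int expected(final int inputSamples) {
    return (int) (inputSamples * 24000f / 32000f);
  }

  // Number of output samples that carry the full input value of 10.
  static long countTens(final int[] samples) {
    return Arrays.stream(samples).filter(v -> v == 10).count();
  }

  public static void main(final String[] args) {
    for (int i = 0; i < RESAMPLED.length; i++) {
      final int n = i + 4;
      final long tens = countTens(RESAMPLED[i]);
      System.out.println(n + " input samples: expected " + expected(n)
          + ", tens " + tens + (tens == expected(n) ? "" : "  <- differs"));
    }
  }
}
```

Only the runs with 4 and 8 input samples are flagged as differing, each being one 10 short of the expected count.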

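I am also going to assume that the conversion algorithm was designed with a use case of thousands of samples in mind, in which case one or two additional leading or trailing values will not have a meaningful impact on the sound, especially given that they ramp between the source signal and 0.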
