I'm just getting into music editing from a programming perspective, and I understand a lot of the ideas behind waveforms and things of that nature, but I'm still stuck on how to read individual samples from a sound file as a byte array.
I'm using the Alvas.Audio library (http://www.alvas.net/alvas.audio.aspx) and C#, if that helps answer the question.
As far as I know, different file formats store their data in different ways, but my main question is how to determine programmatically how the data is stored, and how to iterate through the file one sample at a time. I will probably convert every file to .wav format (using the Alvas library), so an answer specific to the wav format is good enough, but I'm still curious how to iterate over samples when the file is in stereo. As I understand it, a file with stereo data stores the parallel samples consecutively (interleaved).
My end goal is to take the samples from a certain time range of a song (a few seconds somewhere in the middle) and then do some math or whatever on them, but I'm just never sure that what I'm reading is actually the right data.
Answer 0 (score: 3)
PCM (pulse-code modulation) is an uncompressed audio format. A Wav file is a container that holds PCM data (see "What is Wav file?"). The method AudioCompressionManager.GetWaveFormat helps you examine the audio format.
You can analyze the PCM audio format in more detail with the following code.
// Requires the Alvas.Audio library (assumes using System; using System.IO; using Alvas.Audio;).
private void WhatIsPcmFormat(string fileName)
{
    // Read only the format header, then release the file.
    WaveReader wr = new WaveReader(File.OpenRead(fileName));
    IntPtr format = wr.ReadFormat();
    wr.Close();
    WaveFormat wf = AudioCompressionManager.GetWaveFormat(format);
    if (wf.wFormatTag == AudioCompressionManager.PcmFormatTag)
    {
        int bitsPerByte = 8;
        // BlockAlign should equal channels * bytes per sample,
        // and AvgBytesPerSec should equal sample rate * block align.
        Console.WriteLine("Channels: {0}, SamplesPerSec: {1}, BitsPerSample: {2}, BlockAlignIsEqual: {3}, BytesPerSecIsEqual: {4}",
            wf.nChannels, wf.nSamplesPerSec, wf.wBitsPerSample,
            (wf.nChannels * wf.wBitsPerSample) / bitsPerByte == wf.nBlockAlign,
            (int)(wf.nChannels * wf.nSamplesPerSec * wf.wBitsPerSample) / bitsPerByte == wf.nAvgBytesPerSec);
    }
}
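A call like the following (the file path is hypothetical) then prints the format details; for a CD-quality stereo file you should see Channels: 2, SamplesPerSec: 44100, BitsPerSample: 16, with both consistency checks printing True:

WhatIsPcmFormat(@"C:\music\song.wav");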
Answer 1 (score: 1)
The most common way of "packing" audio data is PCM, which is what uncompressed WAV files use. Each sample is "packed" into a short integer value (a short), so if you have a library that can give you PCM, you can treat the data as an array of short values to get at the samples.
Depending on the number of channels, there is one short per channel for each sample. Since each short is 2 bytes, stereo audio typically has 4 bytes per sample frame.
So, for example, to access the audio data 1.0 s into the file, you have to skip 44100 * 4 bytes, assuming the audio is sampled at 44100 Hz (the most common sample rate, inherited from CDs).
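As a rough sketch of that arithmetic in C# (not from the original answer; it assumes a canonical 44-byte WAV header, 16-bit stereo PCM at 44100 Hz, and a hypothetical file path):

using System;
using System.IO;

class SeekDemo
{
    static void Main()
    {
        const int headerSize = 44;    // canonical RIFF + fmt + data headers, no extra chunks
        const int sampleRate = 44100; // assumed sample rate
        const int frameBytes = 4;     // 2 channels * 16 bits / 8 bits per byte
        double seconds = 1.0;

        using (BinaryReader br = new BinaryReader(File.OpenRead(@"C:\music\song.wav")))
        {
            // Skip the header plus one second's worth of interleaved sample frames.
            br.BaseStream.Seek(headerSize + (long)(seconds * sampleRate) * frameBytes, SeekOrigin.Begin);

            // Interleaved stereo: the left sample comes first, then the right.
            short left = br.ReadInt16();
            short right = br.ReadInt16();
            Console.WriteLine("L: {0}, R: {1}", left, right);
        }
    }
}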
Answer 2 (score: 0)
Assuming you know how to open a file and read data from it, you need a reference for the data file format. For WAV files, see here for a description of how the data is organized and how to access it.
The canonical WAVE format starts with the RIFF header:

Offset  Size  Name           Description
0       4     ChunkID        Contains the letters "RIFF" in ASCII form
                             (0x52494646 big-endian form).
4       4     ChunkSize      36 + SubChunk2Size, or more precisely:
                             4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                             This is the size of the rest of the chunk
                             following this number. This is the size of the
                             entire file in bytes minus 8 bytes for the
                             two fields not included in this count:
                             ChunkID and ChunkSize.
8       4     Format         Contains the letters "WAVE"
                             (0x57415645 big-endian form).

The "WAVE" format consists of two subchunks: "fmt " and "data".

The "fmt " subchunk describes the sound data's format:

12      4     Subchunk1ID    Contains the letters "fmt "
                             (0x666d7420 big-endian form).
16      4     Subchunk1Size  16 for PCM. This is the size of the
                             rest of the Subchunk which follows this number.
20      2     AudioFormat    PCM = 1 (i.e. Linear quantization)
                             Values other than 1 indicate some
                             form of compression.
22      2     NumChannels    Mono = 1, Stereo = 2, etc.
24      4     SampleRate     8000, 44100, etc.
28      4     ByteRate       == SampleRate * NumChannels * BitsPerSample/8
32      2     BlockAlign     == NumChannels * BitsPerSample/8
                             The number of bytes for one sample including
                             all channels. I wonder what happens when
                             this number isn't an integer?
34      2     BitsPerSample  8 bits = 8, 16 bits = 16, etc.
        2     ExtraParamSize if PCM, then doesn't exist
        X     ExtraParams    space for extra parameters

The "data" subchunk contains the size of the data and the actual sound:

36      4     Subchunk2ID    Contains the letters "data"
                             (0x64617461 big-endian form).
40      4     Subchunk2Size  == NumSamples * NumChannels * BitsPerSample/8
                             This is the number of bytes in the data.
                             You can also think of this as the size
                             of the read of the subchunk following this
                             number.
44      *     Data           The actual sound data.
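To connect the table to code, here is a minimal C# sketch (mine, not from the answer) that reads the header fields at the fixed offsets above; it assumes the canonical layout with no extra chunks between "fmt " and "data", and a hypothetical file path:

using System;
using System.IO;
using System.Text;

class WavHeaderDemo
{
    static void Main()
    {
        using (BinaryReader br = new BinaryReader(File.OpenRead(@"C:\music\song.wav")))
        {
            string chunkId = Encoding.ASCII.GetString(br.ReadBytes(4));     // "RIFF"
            int chunkSize = br.ReadInt32();                                 // file size minus 8
            string format = Encoding.ASCII.GetString(br.ReadBytes(4));      // "WAVE"

            string subchunk1Id = Encoding.ASCII.GetString(br.ReadBytes(4)); // "fmt "
            int subchunk1Size = br.ReadInt32();                             // 16 for PCM
            short audioFormat = br.ReadInt16();                             // 1 = uncompressed PCM
            short numChannels = br.ReadInt16();                             // 1 = mono, 2 = stereo
            int sampleRate = br.ReadInt32();                                // e.g. 44100
            int byteRate = br.ReadInt32();                                  // SampleRate * BlockAlign
            short blockAlign = br.ReadInt16();                              // bytes per sample frame
            short bitsPerSample = br.ReadInt16();                           // e.g. 16

            Console.WriteLine("{0}/{1}: {2} ch, {3} Hz, {4}-bit, format tag {5}",
                chunkId, format, numChannels, sampleRate, bitsPerSample, audioFormat);
        }
    }
}

In practice many WAV files carry extra chunks (e.g. LIST) before "data", so a robust reader should loop over chunk IDs and sizes rather than rely on these fixed offsets.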
Update: added the data inline.