I'm just getting into music editing from a programming perspective, and I understand a lot of the ideas behind waveforms and things of that nature, but I'm still stuck on how to read individual samples from a sound file as a byte array.
I'm using the Alvas.Audio library (http://www.alvas.net/alvas.audio.aspx) and C#, if that helps answer the question.
As far as I know, different file formats store their data in different ways, but my main question is how to determine programmatically how the data is stored, and how to iterate through the file one sample at a time. I will probably convert every file to .wav format (using the Alvas library), so an answer specific to the wav format is good enough, but I'm still curious how to iterate over samples when the file is in stereo. As I understand it, a file with stereo data stores the parallel samples consecutively (interleaved).
My end goal is to take the samples from a certain time range of a song (a few seconds somewhere in the middle) and then do some math or whatever on them, but I'm just never sure that what I'm reading is actually the right data.
Answer 0 (score: 3)
PCM (pulse-code modulation) is an uncompressed audio format. A Wav file is a container that holds PCM data (see "What is Wav file?"). The method AudioCompressionManager.GetWaveFormat helps you examine the audio format.
You can analyze the PCM audio format in more detail with the following code.
// Requires the Alvas.Audio library (assumes using System; using System.IO; using Alvas.Audio;).
private void WhatIsPcmFormat(string fileName)
{
    // Read only the format header, then release the file.
    WaveReader wr = new WaveReader(File.OpenRead(fileName));
    IntPtr format = wr.ReadFormat();
    wr.Close();
    WaveFormat wf = AudioCompressionManager.GetWaveFormat(format);
    if (wf.wFormatTag == AudioCompressionManager.PcmFormatTag)
    {
        int bitsPerByte = 8;
        // BlockAlign should equal channels * bytes per sample,
        // and AvgBytesPerSec should equal sample rate * block align.
        Console.WriteLine("Channels: {0}, SamplesPerSec: {1}, BitsPerSample: {2}, BlockAlignIsEqual: {3}, BytesPerSecIsEqual: {4}",
            wf.nChannels, wf.nSamplesPerSec, wf.wBitsPerSample,
            (wf.nChannels * wf.wBitsPerSample) / bitsPerByte == wf.nBlockAlign,
            (int)(wf.nChannels * wf.nSamplesPerSec * wf.wBitsPerSample) / bitsPerByte == wf.nAvgBytesPerSec);
    }
}
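A call like the following (the file path is hypothetical) then prints the format details; for a CD-quality stereo file you should see Channels: 2, SamplesPerSec: 44100, BitsPerSample: 16, with both consistency checks printing True:

WhatIsPcmFormat(@"C:\music\song.wav");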
Answer 1 (score: 1)
The most common way of "packing" audio data is PCM, which is what uncompressed WAV files use. Each sample is "packed" into a short integer value (a short), so if you have a library that can give you PCM, you can treat the data as an array of short values to get at the samples.
Depending on the number of channels, there is one short per channel for each sample. Since each short is 2 bytes, stereo audio typically has 4 bytes per sample frame.
So, for example, to access the audio data 1.0 s into the file, you have to skip 44100 * 4 bytes, assuming the audio is sampled at 44100 Hz (the most common sample rate, inherited from CDs).
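As a rough sketch of that arithmetic in C# (not from the original answer; it assumes a canonical 44-byte WAV header, 16-bit stereo PCM at 44100 Hz, and a hypothetical file path):

using System;
using System.IO;

class SeekDemo
{
    static void Main()
    {
        const int headerSize = 44;    // canonical RIFF + fmt + data headers, no extra chunks
        const int sampleRate = 44100; // assumed sample rate
        const int frameBytes = 4;     // 2 channels * 16 bits / 8 bits per byte
        double seconds = 1.0;

        using (BinaryReader br = new BinaryReader(File.OpenRead(@"C:\music\song.wav")))
        {
            // Skip the header plus one second's worth of interleaved sample frames.
            br.BaseStream.Seek(headerSize + (long)(seconds * sampleRate) * frameBytes, SeekOrigin.Begin);

            // Interleaved stereo: the left sample comes first, then the right.
            short left = br.ReadInt16();
            short right = br.ReadInt16();
            Console.WriteLine("L: {0}, R: {1}", left, right);
        }
    }
}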
Answer 2 (score: 0)
Assuming you know how to open a file and read data from it, you need a reference for the data file format. For WAV files, see here for a description of how the data is organized and how to access it.
The canonical WAVE format starts with the RIFF header:

Offset  Size  Name           Description
0       4     ChunkID        Contains the letters "RIFF" in ASCII form
                             (0x52494646 big-endian form).
4       4     ChunkSize      36 + SubChunk2Size, or more precisely:
                             4 + (8 + SubChunk1Size) + (8 + SubChunk2Size)
                             This is the size of the rest of the chunk
                             following this number. This is the size of the
                             entire file in bytes minus 8 bytes for the
                             two fields not included in this count:
                             ChunkID and ChunkSize.
8       4     Format         Contains the letters "WAVE"
                             (0x57415645 big-endian form).

The "WAVE" format consists of two subchunks: "fmt " and "data".

The "fmt " subchunk describes the sound data's format:

12      4     Subchunk1ID    Contains the letters "fmt "
                             (0x666d7420 big-endian form).
16      4     Subchunk1Size  16 for PCM. This is the size of the
                             rest of the Subchunk which follows this number.
20      2     AudioFormat    PCM = 1 (i.e. Linear quantization)
                             Values other than 1 indicate some
                             form of compression.
22      2     NumChannels    Mono = 1, Stereo = 2, etc.
24      4     SampleRate     8000, 44100, etc.
28      4     ByteRate       == SampleRate * NumChannels * BitsPerSample/8
32      2     BlockAlign     == NumChannels * BitsPerSample/8
                             The number of bytes for one sample including
                             all channels. I wonder what happens when
                             this number isn't an integer?
34      2     BitsPerSample  8 bits = 8, 16 bits = 16, etc.
        2     ExtraParamSize if PCM, then doesn't exist
        X     ExtraParams    space for extra parameters

The "data" subchunk contains the size of the data and the actual sound:

36      4     Subchunk2ID    Contains the letters "data"
                             (0x64617461 big-endian form).
40      4     Subchunk2Size  == NumSamples * NumChannels * BitsPerSample/8
                             This is the number of bytes in the data.
                             You can also think of this as the size
                             of the read of the subchunk following this
                             number.
44      *     Data           The actual sound data.
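To connect the table to code, here is a minimal C# sketch (mine, not from the answer) that reads the header fields at the fixed offsets above; it assumes the canonical layout with no extra chunks between "fmt " and "data", and a hypothetical file path:

using System;
using System.IO;
using System.Text;

class WavHeaderDemo
{
    static void Main()
    {
        using (BinaryReader br = new BinaryReader(File.OpenRead(@"C:\music\song.wav")))
        {
            string chunkId = Encoding.ASCII.GetString(br.ReadBytes(4));     // "RIFF"
            int chunkSize = br.ReadInt32();                                 // file size minus 8
            string format = Encoding.ASCII.GetString(br.ReadBytes(4));      // "WAVE"

            string subchunk1Id = Encoding.ASCII.GetString(br.ReadBytes(4)); // "fmt "
            int subchunk1Size = br.ReadInt32();                             // 16 for PCM
            short audioFormat = br.ReadInt16();                             // 1 = uncompressed PCM
            short numChannels = br.ReadInt16();                             // 1 = mono, 2 = stereo
            int sampleRate = br.ReadInt32();                                // e.g. 44100
            int byteRate = br.ReadInt32();                                  // SampleRate * BlockAlign
            short blockAlign = br.ReadInt16();                              // bytes per sample frame
            short bitsPerSample = br.ReadInt16();                           // e.g. 16

            Console.WriteLine("{0}/{1}: {2} ch, {3} Hz, {4}-bit, format tag {5}",
                chunkId, format, numChannels, sampleRate, bitsPerSample, audioFormat);
        }
    }
}

In practice many WAV files carry extra chunks (e.g. LIST) before "data", so a robust reader should loop over chunk IDs and sizes rather than rely on these fixed offsets.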
Update: added the data inline.