I'm working on a simple program that analyses the frequencies in audio files. With an FFT length of 8192 and a sample rate of 44100, if I use a WAV file with a constant frequency as input - say 65 Hz, 200 Hz or 300 Hz - the output is a constant plot of that value.
If I use a recording of someone speaking, the frequencies peak as high as 4000 Hz, with an average around 450-ish for a 90-second file. At first I thought this was because the recording was stereo, but converting it to mono with exactly the same bit rate as the test files didn't change much. (The average dropped from 492 to 456, but that's still too high.)
Does anyone have an idea what could be causing this?
I'm thinking I shouldn't be taking the highest value, but maybe the average or the median instead?
EDIT: Taking the average of each 8192-byte buffer and looking up the index closest to that value messes everything up.
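For context on what the numbers above can resolve, here is a small sketch (in Python, since the arithmetic is language-independent): with a 44100 Hz sample rate and an FFT length of 8192, each bin is 44100/8192 ≈ 5.38 Hz wide, so a pure 65 Hz tone lands in bin 12 and is reported as roughly 64.6 Hz - close enough that constant-tone test files look like a constant plot.

```python
SAMPLERATE = 44100
FFT_LENGTH = 8192

# Width of one FFT bin in Hz
bin_width = SAMPLERATE / FFT_LENGTH  # ~5.38 Hz

def nearest_bin(freq_hz):
    """Bin index a pure tone at freq_hz falls into."""
    return round(freq_hz * FFT_LENGTH / SAMPLERATE)

def bin_to_freq(index):
    """Frequency reported for a given bin index."""
    return index * SAMPLERATE / FFT_LENGTH

for tone in (65, 200, 300):
    b = nearest_bin(tone)
    print(tone, "Hz -> bin", b, "-> reported", round(bin_to_freq(b), 2), "Hz")
```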
This is the code for the event handler that fires when the SampleAggregator has computed the FFT for the current buffer:
void FftCalculated(object sender, FftEventArgs e)
{
    int length = e.Result.Length;
    float[] magnitudes = new float[length];

    // Only the first half of the FFT result holds unique frequency bins
    for (int i = 0; i < length / 2; i++)
    {
        float real = e.Result[i].X;
        float imaginary = e.Result[i].Y;
        // Magnitude of the bin, in dB
        magnitudes[i] = (float)(10 * Math.Log10(Math.Sqrt((real * real) + (imaginary * imaginary))));
    }

    float max_mag = float.MinValue;
    int max_index = -1; // a bin index, so int rather than float
    for (int i = 0; i < length / 2; i++)
    {
        if (magnitudes[i] > max_mag)
        {
            max_mag = magnitudes[i];
            max_index = i;
        }
    }

    // bin index * bin width (sample rate / FFT length); 8192.0 forces floating-point division
    var currentFrequency = max_index * SAMPLERATE / 8192.0;
    Console.WriteLine("frequency be " + currentFrequency);
}
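As a sanity check on the peak-picking logic, here is a minimal sketch in Python (the names are illustrative, not from the project) of the same steps - magnitude per bin over the first half of the spectrum, argmax, then index × sample rate / FFT length - run on a synthetic sine. For a pure tone this recovers the input frequency exactly; for speech, the loudest bin jumps between harmonics and formants from buffer to buffer, which is consistent with the high peaks described above.

```python
import cmath, math

SAMPLERATE = 8000   # smaller numbers than the real project, to keep the demo fast
FFT_LENGTH = 512

def peak_frequency(samples):
    """Magnitude per bin over the first half, argmax, then bin index -> Hz."""
    n = len(samples)
    best_mag, best_index = float("-inf"), -1
    for k in range(1, n // 2):          # skip bin 0 (DC)
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mag = abs(acc)
        if mag > best_mag:
            best_mag, best_index = mag, k
    return best_index * SAMPLERATE / n

# A pure tone placed exactly on bin 25: 25 * 8000 / 512 = 390.625 Hz
tone = 25 * SAMPLERATE / FFT_LENGTH
samples = [math.sin(2 * math.pi * tone * t / SAMPLERATE) for t in range(FFT_LENGTH)]
print(peak_frequency(samples))  # -> 390.625
```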
ADDITIONAL: This is the code that reads the file and sends it to the analysis part:
using (var rdr = new WaveFileReader(audioFilePath))
{
    var newFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE /*44100*/), 16, 1);
    byte[] buffer = new byte[8192];
    var audioData = new AudioData(); // custom class for this project

    using (var conversionStream = new WaveFormatConversionStream(newFormat, rdr))
    {
        // Used to send audio in realtime; it's a timestamp issue for the graphs.
        // I'm working on fixing this, but it has lower priority, so disregard it :p
        TimeSpan audioDuration = conversionStream.TotalTime;
        long audioLength = conversionStream.Length;
        // milliseconds per byte * bytes per buffer
        int waitTime = (int)(audioDuration.TotalMilliseconds / audioLength * 8192);

        // Note: the final Read may fill only part of the buffer
        while (conversionStream.Read(buffer, 0, buffer.Length) != 0)
        {
            audioData.AudioDataBase64 = Utils.Base64Encode(buffer);
            Thread.Sleep(waitTime);
            SendMessage("AudioData", Utils.StringToAscii(AudioData.GetJSON(audioData)));
        }
        Console.WriteLine("Reached End of File");
    }
}
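The waitTime arithmetic above works out to milliseconds-per-byte times bytes-per-buffer. A worked example in Python (the 90-second figure comes from the description above; 16-bit mono at 44100 Hz is assumed for the converted stream):

```python
SAMPLERATE = 44100
BYTES_PER_SAMPLE = 2      # 16-bit mono
BUFFER_BYTES = 8192

duration_ms = 90 * 1000                              # 90-second file
audio_length = 90 * SAMPLERATE * BYTES_PER_SAMPLE    # stream length in bytes
wait_time = int(duration_ms / audio_length * BUFFER_BYTES)
print(wait_time)  # -> 92: ~93 ms per 8192-byte buffer, i.e. roughly real time
```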
This is the code that receives the audio data:
{
    var audioData = AudioData.GetStateFromJSON(Utils.AsciiToString(receivedMessage));
    QueueAudio(Utils.Base64Decode(audioData.AudioDataBase64));
}
Followed by:
var waveFormat = new WaveFormat(Convert.ToInt32(SAMPLERATE /*44100*/), 16, 1);
_bufferedWaveProvider = new BufferedWaveProvider(waveFormat);
_bufferedWaveProvider.BufferDuration = new TimeSpan(0, 2, 0);

void QueueAudio(byte[] data)
{
    _bufferedWaveProvider.AddSamples(data, 0, data.Length);
    if (_bufferedWaveProvider.BufferedBytes >= fftLength)
    {
        byte[] buffer = new byte[_bufferedWaveProvider.BufferedBytes];
        _bufferedWaveProvider.Read(buffer, 0, _bufferedWaveProvider.BufferedBytes);
        // Reassemble little-endian 16-bit PCM and scale into [-1, 1]
        for (int index = 0; index < buffer.Length; index += 2)
        {
            short sample = (short)(buffer[index] | (buffer[index + 1] << 8));
            float sample32 = sample / 32768f; // 32768 so short.MinValue maps exactly to -1f
            sampleAggregator.Add(sample32);
        }
    }
}
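The byte-to-sample loop in QueueAudio reassembles little-endian 16-bit PCM and scales it to floats. A standalone sketch of the same conversion in Python (struct handles the sign extension that the shift-and-or does in the C#):

```python
import struct

def pcm16_to_floats(data):
    """Little-endian 16-bit PCM bytes -> floats in [-1, 1]."""
    count = len(data) // 2
    samples = struct.unpack("<%dh" % count, data[:count * 2])
    return [s / 32768.0 for s in samples]

# Two samples: 0x0000 = 0 and 0x8000 = -32768 (full-scale negative)
print(pcm16_to_floats(b"\x00\x00\x00\x80"))  # -> [0.0, -1.0]
```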
The SampleAggregator then fires the event above when the FFT is done.
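On the average-vs-maximum idea: one possible aggregate is a magnitude-weighted mean frequency over a buffer's bins (the spectral centroid), sketched below in Python. This is only one option, not necessarily the right one for tracking a speaker's pitch (dedicated pitch detectors such as autocorrelation are usually used for that), and note it weights linear magnitudes - dB values like the ones stored in the handler above can be negative and would need converting back before weighting.

```python
def spectral_centroid(magnitudes, samplerate, fft_length):
    """Magnitude-weighted mean frequency of the first fft_length // 2 bins."""
    half = fft_length // 2
    total = sum(magnitudes[:half])
    if total == 0:
        return 0.0
    weighted = sum(i * samplerate / fft_length * m
                   for i, m in enumerate(magnitudes[:half]))
    return weighted / total

# Toy spectrum: energy split evenly between the 100 Hz and 300 Hz bins
mags = [0.0] * 50
mags[10] = 1.0   # bin 10 of a 100-point FFT at 1000 Hz -> 100 Hz
mags[30] = 1.0   # -> 300 Hz
print(spectral_centroid(mags, 1000, 100))  # -> 200.0
```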