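I'm trying to find the peak of a cepstrum using the Accelerate framework. The peak I get is always at the very end or the very start of the frame. I'm analyzing audio captured from the microphone in real time. What is wrong with my code? Here it is: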
OSStatus microphoneInputCallback (void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData){
// get reference of test app we need for test app attributes
TestApp *this = (TestApp *)inRefCon;
COMPLEX_SPLIT complexArray = this->fftA;
void *dataBuffer = this->dataBuffer;
float *outputBuffer = this->outputBuffer;
FFTSetup fftSetup = this->fftSetup;
uint32_t log2n = this->fftLog2n;
uint32_t n = this->fftN; // 4096
uint32_t nOver2 = this->fftNOver2;
uint32_t stride = 1;
int bufferCapacity = this->fftBufferCapacity; // 4096
SInt16 index = this->fftIndex;
OSStatus renderErr;
// observation objects
float *observerBufferRef = this->observerBuffer;
int observationCountRef = this->observationCount;
renderErr = AudioUnitRender(rioUnit, ioActionFlags,
inTimeStamp, bus1, inNumberFrames, this->bufferList);
if (renderErr < 0) {
return renderErr;
}
// Fill the buffer with our sampled data. If we fill our buffer, run the
// fft.
int read = bufferCapacity - index;
if (read > inNumberFrames) {
memcpy((SInt16 *)dataBuffer + index, this->bufferList->mBuffers[0].mData, inNumberFrames*sizeof(SInt16));
this->fftIndex += inNumberFrames;
} else {
// If we enter this conditional, our buffer will be filled and we should PERFORM FFT.
memcpy((SInt16 *)dataBuffer + index, this->bufferList->mBuffers[0].mData, read*sizeof(SInt16));
// Reset the index.
this->fftIndex = 0;
/*************** FFT ***************/
//multiply by window
vDSP_vmul((SInt16 *)dataBuffer, 1, this->window, 1, this->outputBuffer, 1, n);
// We want to deal with only floating point values here.
vDSP_vflt16((SInt16 *) dataBuffer, stride, (float *) outputBuffer, stride, bufferCapacity );
/**
Look at the real signal as an interleaved complex vector by casting it.
Then call the transformation function vDSP_ctoz to get a split complex
vector, which for a real signal, divides into an even-odd configuration.
*/
vDSP_ctoz((COMPLEX*)outputBuffer, 2, &complexArray, 1, nOver2);
// Carry out a Forward FFT transform.
vDSP_fft_zrip(fftSetup, &complexArray, stride, log2n, FFT_FORWARD);
vDSP_ztoc(&complexArray, 1, (COMPLEX *)outputBuffer, 2, nOver2);
complexArray.imagp[0] = 0.0f;
vDSP_zvmags(&complexArray, 1, complexArray.realp, 1, nOver2);
bzero(complexArray.imagp, (nOver2) * sizeof(float));
// scale
float scale = 1.0f / (2.0f*(float)n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, nOver2);
// step 2 get log for cepstrum
float *logmag = malloc(sizeof(float)*nOver2);
for (int i=0; i < nOver2; i++)
logmag[i] = logf(sqrtf(complexArray.realp[i]));
// configure float array into acceptable input array format (interleaved)
vDSP_ctoz((COMPLEX*)logmag, 2, &complexArray, 1, nOver2);
// create cepstrum
vDSP_fft_zrip(fftSetup, &complexArray, stride, log2n-1, FFT_INVERSE);
//convert interleaved to real
float *displayData = malloc(sizeof(float)*n);
vDSP_ztoc(&complexArray, 1, (COMPLEX*)displayData, 2, nOver2);
float dominantFrequency = 0;
int currentBin = 0;
float dominantFrequencyAmp = 0;
// find peak of cepstrum
for (int i=0; i < nOver2; i++){
//get current frequency magnitude
if (displayData[i] > dominantFrequencyAmp) {
// DLog("Bufferer filled %f", displayData[i]);
dominantFrequencyAmp = displayData[i];
currentBin = i;
}
}
DLog("currentBin : %i amplitude: %f", currentBin, dominantFrequencyAmp);
}
return noErr;
}
Answer 0 (score: 0)
I haven't used the Accelerate framework, but your code appears to take the appropriate steps to compute a cepstrum.
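The cepstrum of a real acoustic signal tends to have a very large DC component, with a large peak at and near zero quefrency [sic]. Simply ignore the near-DC portion of the cepstrum and look for peaks that correspond to frequencies above 20 Hz (that is, at quefrencies below Cepstrum_Width / 20 Hz).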
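As a rough illustration of that advice, here is a minimal plain-C sketch of a peak search restricted to a range of quefrency bins. The function name and the `min_q` / `max_q` parameters are hypothetical, not part of Accelerate or of the question's code, and suitable bounds depend on your signal.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the largest value in
   cepstrum[min_q .. max_q - 1], or -1 if the range is empty or invalid.
   Bins below min_q (the near-DC region) and at or above max_q are never
   inspected, so a large peak at either edge of the frame cannot win. */
long find_cepstrum_peak(const float *cepstrum, size_t width,
                        size_t min_q, size_t max_q)
{
    if (cepstrum == NULL) return -1;
    if (max_q > width) max_q = width;   /* clamp to the buffer length */
    if (min_q >= max_q) return -1;

    size_t best = min_q;
    for (size_t q = min_q + 1; q < max_q; q++) {
        if (cepstrum[q] > cepstrum[best]) best = q;
    }
    return (long)best;
}
```

For a cepstrum of width 1024, a call such as `find_cepstrum_peak(cepstrum, 1024, 4, 1024 / 20)` would skip the first few near-DC bins and stop at the 20 Hz limit; the lower bound of 4 is an arbitrary value chosen for illustration.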
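If the input signal contains a series of very closely spaced overtones, the cepstrum will also have a large peak at the high-quefrency end.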
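For example, the figure below shows the cepstrum of a Dirichlet kernel with N = 128 and width = 4096, whose spectrum is a series of very closely spaced overtones.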
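You may want to test and debug your code with a static, synthetic signal. A good choice of test signal is any sinusoid with fundamental frequency F plus several overtones at integer multiples of F.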
Your cepstra should look similar to the following examples.
First, a synthetic signal.
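The figure below shows the cepstrum of a synthetic steady-state E2 note, synthesized with a typical near-DC component, a fundamental at 82.4 Hz, and 8 harmonics at integer multiples of 82.4 Hz. The synthetic sinusoid was programmed to generate 4096 samples.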
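Observe the prominent non-DC peak at 12.36. The cepstrum width is 1024 (the output of the second FFT), so the peak corresponds to 1024 / 12.36 = 82.8 Hz, which is very close to the true fundamental frequency of 82.4 Hz.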
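The conversion used here can be written as a one-line helper. The sketch below simply restates the answer's arithmetic (cepstrum width divided by peak position); the function name is hypothetical, and the formula does not involve the sample rate, so whether it applies unchanged to your pipeline is something to verify.

```c
/* Hypothetical helper following the answer's convention:
   frequency = cepstrum_width / peak_position.
   Returns 0.0 for a peak at or below position zero, where no
   meaningful pitch can be derived. */
double cepstrum_peak_to_hz(double cepstrum_width, double peak_position)
{
    if (peak_position <= 0.0) return 0.0;
    return cepstrum_width / peak_position;
}
```

With the numbers from the synthetic example, `cepstrum_peak_to_hz(1024.0, 12.36)` gives roughly 82.8.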
Now, a real acoustic signal.
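The figure below shows the cepstrum of a real acoustic guitar's E2 note. The signal was not windowed before the first FFT. Observe the prominent non-DC peak at 542.9. The cepstrum width is 32768 (the output of the second FFT), so the peak corresponds to 32768 / 542.9 = 60.4 Hz, which is fairly far from the true fundamental frequency of 82.4 Hz.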
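The figure below shows the cepstrum of the same real acoustic guitar's E2 note, but this time the signal was Hann-windowed before the first FFT. Observe the prominent non-DC peak at 268.46. The cepstrum width is 32768 (the output of the second FFT), so the peak corresponds to 32768 / 268.46 = 122.1 Hz, which is even farther from the true fundamental frequency of 82.4 Hz.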
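The acoustic guitar's E2 note used for this analysis was sampled at 44.1 kHz with a high-quality microphone under studio conditions. It contains essentially zero background noise, no other instruments or voices, and no post-processing.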
References:
The real audio signal data, synthetic signal generation, plots, FFT, and cepstral analysis were done here: Musical instrument cepstrum