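I'm following an article on how to compute a cepstrum for detecting speech formants, and coding it up with the iPhone Accelerate framework. However, the results don't look the way the article leads me to expect. For an unvoiced segment (Figure 3 in the article), it shows small values in the first few bins. When my code runs, though, the values for the unvoiced segment are large (approaching 1.0), which looks more like a voiced segment.
Here is my code: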
// copy buffer data into a separate array and apply hamming window
// don't use leadlength because we copied to beginning of buffer
int offset = (int)(s * stepSize);
float *hamBuffer = malloc(n*sizeof(float));
for (int i=0; i < n; i++)
    hamBuffer[i] = hpBuffer[offset+i] * ((1.0f-0.46f) - 0.46f*cos(TWOPI*i/((float)n-1.0f)));
// convert the interleaved real buffer into the split-complex layout that vDSP expects
vDSP_ctoz((COMPLEX*)hamBuffer, 2, &complexArray, 1, halfN);
// free ham buffer
free(hamBuffer);
// run FFT
vDSP_fft_zrip(setupReal, &complexArray, stride, log2n, FFT_FORWARD);
// Absolute square (equivalent to mag^2)
complexArray.imagp[0] = 0.0f;
vDSP_zvmags(&complexArray, 1, complexArray.realp, 1, halfN);
bzero(complexArray.imagp, (halfN) * sizeof(float));
// scale
float scale = 1.0f / (2.0f*(float)n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, halfN);
// get log of absolute values for passing to inverse FFT for cepstrum
float *logmag = malloc(sizeof(float)*halfN);
for (int i=0; i < halfN; i++)
    logmag[i] = log10f(fabsf(complexArray.realp[i]));
// convert the interleaved real buffer into the split-complex layout that vDSP expects
vDSP_ctoz((COMPLEX*)logmag, 2, &complexArray, 1, halfN/2);
// create cepstrum
vDSP_fft_zrip(setupReal, &complexArray, stride, log2n-1, FFT_INVERSE);
// scale again
scale = (float) 1.0 / (2 * n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, halfN);
vDSP_vsmul(complexArray.imagp, 1, &scale, complexArray.imagp, 1, halfN);
// convert split-complex back to an interleaved real array
float *displayData = malloc(sizeof(float)*n);
vDSP_ztoc(&complexArray, 1, (COMPLEX*)displayData, 2, halfN);
// print cepstrum to debug window
for (int i=0; i < halfN; i++)
    printf("%f\r\n", displayData[i]);
Here are the results for the first few bins:
-1.036735
0.807992
-0.030310
0.201064
-0.048442
0.071084
-0.050529
0.108412
-0.037282
0.080372
-0.003775
0.102596
-0.027706
0.044470
0.010319
0.041597
-0.050533
0.012725
-0.003895
-0.016887
-0.010547
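They do "settle" towards zero, but the first few numbers are larger than I expected for an unvoiced segment. Does my code look incorrect? I think I followed the article very carefully. Why am I getting such large values in the first few bins for an unvoiced segment?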