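I want to optimize this algorithm. The function makeFrame splits the audio signal into time frames using a Hanning window of about 37 ms. The function divideFreqs then uses the JTransforms library to perform a fast Fourier transform on each time frame (this is the most time-consuming step). How can I reduce the time this takes? It is far too slow: for a 5-second audio file, the whole operation takes about 13 seconds. I have been considering multithreading, but I have never used it.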
public double[][] makeFrame(double[] audioOutput) {
    int length = audioOutput.length;
    // calculate a Hanning window size of 37 ms
    int window = (int) Math.round(0.37 * sampleRate);
    int interval = (int) Math.round(0.0116 * sampleRate);
    length = length - window;
    int numintervals = length / interval;
    // calculate Hanning window values
    double[] hanw = hanning(window);
    double[][] sections = new double[numintervals + 1][25];
    // divide the signal into time frames using a Hanning window of 37 ms
    int k = 0;
    for (int i = 0; i < length; i += interval) {
        double[] temp = new double[88200];
        int t = 0;
        int s;
        s = i;
        for (; s < i + window; s++) {
            temp[t] = audioOutput[s] * hanw[t];
            t++;
        }
        sections[k] = divideFreqs(temp, sampleRate);
        k++;
    }
    return sections;
}
public static double[] hanning(int window) {
    double[] h_wnd = new double[window]; // Hanning window
    // symmetric Hanning window: 0.5 * (1 - cos(2*pi*k / (N + 1))) for k = 1..N, stored at index k - 1
    for (int i = 0; i < window; i++) {
        h_wnd[i] = 0.5 * (1 - Math.cos(2.0 * Math.PI * (i + 1) / (window + 1)));
    }
    return h_wnd;
}
public static double[] divideFreqs(double[] audioData, float fs) {
    DoubleFFT_1D fft = new DoubleFFT_1D(44100);
    int len;
    double[] secenergy;
    // Frequency bands in the range of 1 Hz - 20000 Hz
    int[][] bandsec = new int[][]{
            {1, 100},
            {100, 200},
            {200, 300},
            {300, 400},
            {400, 510},
            {510, 630},
            {630, 770},
            {770, 920},
            {920, 1080},
            {1080, 1270},
            {1270, 1480},
            {1480, 1720},
            {1720, 2000},
            {2000, 2320},
            {2320, 2700},
            {2700, 3150},
            {3150, 3700},
            {3700, 4400},
            {4400, 5300},
            {5300, 6400},
            {6400, 7700},
            {7700, 9500},
            {9500, 12000},
            {12000, 15500},
            {15500, 20000}};
    // perform FFT on the data
    fft.realForwardFull(audioData);
    // splitting real and imaginary numbers
    double[] real = new double[22050];
    double[] imaginary = new double[22050];
    for (int row = 0; row < 22050; row++) {
        real[row] = (double) Math.round(audioData[row + row] * 100000000) / 100000000;
        imaginary[row] = (double) Math.round(audioData[row + row + 1] * 100000000) / 100000000;
    }
    len = bandsec.length;
    secenergy = new double[len];
    // calculate energy for each critical band
    double[] tempReal;
    double[] tempImag;
    for (int i = 0; i < len; i++) {
        int k = 0;
        tempReal = new double[bandsec[i][1] - (bandsec[i][0] - 1)];
        tempImag = new double[bandsec[i][1] - (bandsec[i][0] - 1)];
        for (int j = bandsec[i][0] - 1; j < bandsec[i][1]; j++) {
            tempReal[k] = real[j];
            tempImag[k] = imaginary[j];
            k++;
        }
        secenergy[i] = energy(tempReal, tempImag);
    }
    return secenergy;
}
public static double energy(double[] real, double[] imaginary) {
    double e = 0;
    Complex sum = new Complex(0, 0);
    ArrayList<Complex> complexList = new ArrayList<Complex>();
    for (int i = 0; i < real.length; i++) {
        Complex comp = new Complex(real[i], imaginary[i]);
        complexList.add(comp.multiply(comp));
    }
    for (int i = 0; i < complexList.size(); i++) {
        Complex comp = new Complex(complexList.get(i).getReal(), complexList.get(i).getImaginary());
        sum = Complex.add(comp, sum);
    }
    e = Math.sqrt(sum.magnitude());
    e = (double) Math.round(e * 10000) / 10000;
    return e;
}
Answer 0 (score: 3)
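Using multiple cores would help, but optimizing the code usually gives you a bigger benefit.
Using double instead of Complex objects is about 9 times faster on my machine.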
The average time using double took 38,687 ns
The average time using Complex took 344,010 ns
Test code:
import java.util.ArrayList;
public class EnergyTest {
    public static void main(String... args) {
        double[] real = new double[22050];
        double[] imaginary = new double[22050];
        for (int i = 0; i < real.length; i++) {
            real[i] = Math.random() - Math.random();
            imaginary[i] = Math.random() - Math.random();
        }
        {
            int runs = 100000;
            long start = 0;
            double e = 0;
            for (int i = -10000; i < runs; i++) {
                if (i == 0) start = System.nanoTime(); // start timing after 10000 warm-up calls
                e += energyDouble(real, imaginary);
            }
            assert e > 0;
            long time = System.nanoTime() - start;
            System.out.printf("The average time using double took %,d ns%n", time / runs);
        }
        {
            int runs = 10000;
            long start = System.nanoTime();
            double e = 0;
            for (int i = -10000; i < runs; i++) {
                if (i == 0) start = System.nanoTime(); // start timing after 10000 warm-up calls
                e += energy(real, imaginary);
            }
            assert e > 0;
            long time = System.nanoTime() - start;
            System.out.printf("The average time using Complex took %,d ns%n", time / runs);
        }
    }
    public static double energyDouble(double[] real, double[] imaginary) {
        double re_total = 0, im_total = 0;
        for (int i = 0; i < real.length; i++) {
            double re = real[i];
            double im = imaginary[i];
            double re2 = re * re - im * im; // real part of z^2
            double im2 = 2 * re * im;       // imaginary part of z^2
            re_total += re2;
            im_total += im2;
        }
        double e = Math.sqrt(re_total * re_total + im_total * im_total);
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }
    public static double energy(double[] real, double[] imaginary) {
        double e = 0;
        Complex sum = new Complex(0, 0);
        ArrayList<Complex> complexList = new ArrayList<Complex>();
        for (int i = 0; i < real.length; i++) {
            Complex comp = new Complex(real[i], imaginary[i]);
            complexList.add(comp.multiply(comp));
        }
        for (int i = 0; i < complexList.size(); i++) {
            Complex comp = new Complex(complexList.get(i).getReal(), complexList.get(i).getImaginary());
            sum = Complex.add(comp, sum);
        }
        e = Math.sqrt(sum.magnitude());
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }
    static class Complex {
        private final double re;
        private final double im;
        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
        public double getReal() {
            return re;
        }
        public double getImaginary() {
            return im;
        }
        public Complex multiply(Complex comp) {
            double re2 = re * comp.re - im * comp.im;
            double im2 = im * comp.re + re * comp.im;
            return new Complex(re2, im2);
        }
        public static Complex add(Complex a, Complex b) {
            return new Complex(a.re + b.re, a.im + b.im);
        }
        // squared modulus |z|^2
        public double magnitude() {
            return re * re + im * im;
        }
    }
}
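As a sketch of the multi-core option mentioned at the top of this answer: a minimal version, assuming each frame can be processed independently (which holds here, because every frame is windowed and transformed on its own), maps the frame indices over a parallel `IntStream`. `ParallelFrames`, `processFrames` and `perFrame` are hypothetical names; `perFrame` stands in for the windowing plus `divideFreqs` step. If `perFrame` uses an FFT object, giving each thread its own instance (for example via a `ThreadLocal`) is the safe assumption unless the library documents it as thread-safe.

```java
import java.util.function.Function;
import java.util.stream.IntStream;

public class ParallelFrames {

    /**
     * Splits signal into frames of length window, advancing by hop samples,
     * and applies perFrame to each frame on the common ForkJoin pool.
     * Only frames that fit entirely inside the signal are produced.
     */
    public static double[][] processFrames(double[] signal, int window, int hop,
                                           Function<double[], double[]> perFrame) {
        if (signal.length < window) {
            return new double[0][];
        }
        int numFrames = (signal.length - window) / hop + 1;
        return IntStream.range(0, numFrames)
                .parallel()
                .mapToObj(k -> {
                    // Each task copies its own frame, so no mutable state is shared between threads.
                    double[] frame = new double[window];
                    System.arraycopy(signal, k * hop, frame, 0, window);
                    return perFrame.apply(frame);
                })
                .toArray(double[][]::new); // results keep frame order despite parallel execution
    }

    public static void main(String[] args) {
        double[] signal = new double[10];
        for (int i = 0; i < signal.length; i++) {
            signal[i] = i;
        }
        // Toy per-frame function: the sum of the samples in the frame.
        double[][] out = processFrames(signal, 4, 2, frame -> {
            double s = 0;
            for (double v : frame) s += v;
            return new double[]{s};
        });
        for (double[] row : out) {
            System.out.println(row[0]);
        }
    }
}
```

As the answer says, this only divides the existing work across cores; it does not reduce the total amount of work the way the other optimizations do.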
Answer 1 (score: 2)
I use FFTW myself, so I don't know your library, but here is what I noticed:
1) Your FFT size (44100) is not a power of 2.
2) Your window is 370 ms, not 37 ms.
3) Since your window is 370 ms long (i.e. ~16k samples), why feed an 88200-element array into the FFT (or does the constructor value mean "take only 44100 values"?)? An FFT size of pow(2.0, ceil(log2(0.37*44100))) = 2^14 = 16384 is fully sufficient. Zero padding won't add any additional frequency resolution, I'm afraid.
4) You instantiate a new FFT object on every call to divideFreqs. I'm not sure how expensive the construction is, so try making it a class member.
5) Last but not least (I think this is the major speed loss here): your hop size is much too small! A common overlap is 1/2 or 2/3 of the window size (in terms of your code: interval = window / 3). Yours is around 1/31 of the window size. That's really overkill and gives you many redundant results.
Cheers
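A minimal sketch of the sizing in points 2, 3 and 5, assuming a 44.1 kHz sample rate and a 2/3 overlap; `FrameSizing`, `nextPowerOfTwo` and `hzToBin` are hypothetical names. One consequence worth noting: once the FFT size is no longer equal to the sample rate, one bin no longer corresponds to 1 Hz, so the band edges in divideFreqs would have to be converted from Hz to bin indices (bin spacing is `fs / fftSize`).

```java
public class FrameSizing {

    /** Smallest power of two that is >= n (n must be positive). */
    public static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    /** Index of the FFT bin nearest to frequency hz, for an FFT of fftSize points at sample rate fs. */
    public static int hzToBin(double hz, double fs, int fftSize) {
        return (int) Math.round(hz * fftSize / fs);
    }

    public static void main(String[] args) {
        double fs = 44100.0;
        int window370ms = (int) Math.round(0.37 * fs);  // what the question's code computes
        int window37ms = (int) Math.round(0.037 * fs);  // what the question text describes
        int hop = window37ms / 3;                       // hop size for a 2/3 overlap

        System.out.println(window370ms + " samples -> FFT size " + nextPowerOfTwo(window370ms));
        System.out.println(window37ms + " samples -> FFT size " + nextPowerOfTwo(window37ms) + ", hop " + hop);
        System.out.println("4400 Hz -> bin " + hzToBin(4400, fs, nextPowerOfTwo(window37ms)));
    }
}
```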
Answer 2 (score: 1)
Some ideas:
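In makeFrame, declare double[] temp = new double[88200] outside the loop to avoid repeated memory allocation and deallocation. (I don't know whether the optimizer is smart enough to do this on its own.)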
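A minimal sketch of that hoisting. One caveat, assuming JTransforms' `realForwardFull` behaves as documented and writes all `2 * n` entries of its array in place (which is why the question allocates 88200 entries for n = 44100): a reused buffer has to be zeroed before each frame, otherwise the zero-padded tail still holds the previous frame's spectrum. `FrameBuffer` and `fillFrame` are hypothetical names, and the FFT call is only indicated in a comment so the sketch runs without the library.

```java
import java.util.Arrays;

public class FrameBuffer {

    /**
     * Writes one Hanning-windowed frame into a reusable buffer.
     * The buffer is cleared first so the padding after the frame is really zero,
     * even if an in-place FFT overwrote the whole buffer on the previous frame.
     */
    public static void fillFrame(double[] buffer, double[] signal, int start, double[] hann) {
        Arrays.fill(buffer, 0.0);
        for (int t = 0; t < hann.length; t++) {
            buffer[t] = signal[start + t] * hann[t];
        }
    }

    public static void main(String[] args) {
        double[] signal = {1, 2, 3, 4, 5, 6};
        double[] hann = {0.5, 1.0, 0.5};
        double[] buffer = new double[8]; // allocated once, outside the frame loop
        for (int start = 0; start + hann.length <= signal.length; start += 2) {
            fillFrame(buffer, signal, start, hann);
            // fft.realForwardFull(buffer) would go here; it overwrites buffer in place.
            System.out.println(Arrays.toString(buffer));
            Arrays.fill(buffer, 99.0); // simulate the in-place FFT leaving the buffer dirty
        }
    }
}
```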
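In divideFreqs, what is the purpose of Math.round(data * 100000000) / 100000000? Are you limiting the values to 8 decimal places? Does the 9th decimal place really affect the result that much? If this step isn't strictly necessary, dropping it saves some time.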
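The biggest saving I can see comes from getting rid of the whole construct of copying into the real and imaginary arrays. Your FFT appears to leave the real and imaginary parts interleaved in audioData (your code reads the first 22050 pairs). So you can change the signature of the energy function to energy(double[] fftData, int lowFreq, int highFreq), where lowFreq is the first bin (inclusive) and highFreq the end bin (exclusive), and replace the two loops in that function with: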
for (int j = lowFreq; j < highFreq; j++) {
    // bin j: real part at 2*j, imaginary part at 2*j + 1
    Complex comp = new Complex(fftData[2 * j], fftData[2 * j + 1]);
    sum = Complex.add(sum, comp.multiply(comp));
}
Then simply replace the body of divideFreqs with:
//...
fft.realForwardFull(audioData); // in-place FFT: real and imaginary parts interleaved
len = bandsec.length;
secenergy = new double[len];
for (int i = 0; i < len; i++) {
    // same bins as the original copy loop: from bandsec[i][0] - 1 up to (excluding) bandsec[i][1]
    secenergy[i] = energy(audioData, bandsec[i][0] - 1, bandsec[i][1]);
}
return secenergy;
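This avoids a lot of memory allocation, copying and deallocation, and should speed things up considerably. After these changes, for even better performance, you can replace the complex-number operations with the direct double operations that @Peter showed.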
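Combining both suggestions, here is a minimal sketch of an `energy` that reads the interleaved FFT output directly and uses plain `double` arithmetic instead of `Complex` objects. It assumes, as the question's copy loop does, that bin `j` is stored at `fftData[2 * j]` (real) and `fftData[2 * j + 1]` (imaginary), with `lowBin` inclusive and `highBin` exclusive. It keeps the formula of @Peter's `energyDouble` (the modulus of the sum of z², rounded to 4 decimal places) and skips the 8-decimal rounding of the inputs. `InterleavedEnergy` is a hypothetical class name.

```java
public class InterleavedEnergy {

    /**
     * Same formula as energyDouble: the modulus of the sum of z^2 over bins [lowBin, highBin),
     * rounded to 4 decimal places, but read straight from interleaved (re, im) FFT output.
     */
    public static double energy(double[] fftData, int lowBin, int highBin) {
        double reTotal = 0, imTotal = 0;
        for (int j = lowBin; j < highBin; j++) {
            double re = fftData[2 * j];
            double im = fftData[2 * j + 1];
            reTotal += re * re - im * im; // real part of z^2
            imTotal += 2 * re * im;       // imaginary part of z^2
        }
        double e = Math.sqrt(reTotal * reTotal + imTotal * imTotal);
        return (double) Math.round(e * 10000) / 10000;
    }

    public static void main(String[] args) {
        // Three bins, interleaved as (re, im): z0 = 1 + 0i, z1 = 0 + 2i, z2 = 3 + 0i.
        double[] fftData = {1, 0, 0, 2, 3, 0};
        // Bins 1 and 2: z1^2 = -4, z2^2 = 9, so the sum is 5 and its modulus is 5.
        System.out.println(energy(fftData, 1, 3));
        // In divideFreqs this would be called as:
        // secenergy[i] = energy(audioData, bandsec[i][0] - 1, bandsec[i][1]);
    }
}
```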
Answer 3 (score: 1)
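For a (radix-2) FFT to actually work, the number of samples has to be a power of 2. This lets the FFT run in O(n log n) instead of the O(n²) of a plain DFT. Your library may be smart enough to apply this speed-up if you feed it input of the right size. Unfortunately, this means the window size is no longer up to you, because it is restricted to 2, 4, 8, 16, and so on.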
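To illustrate where the power-of-two requirement comes from, here is a minimal recursive radix-2 (Cooley-Tukey) FFT sketch: each call splits the input into even- and odd-indexed halves, which only works while the length keeps dividing by 2, and that repeated halving is what gives O(n log n). It is a teaching sketch with separate `re`/`im` arrays, not a replacement for JTransforms; `Radix2Fft` and `fft` are hypothetical names.

```java
import java.util.Arrays;

public class Radix2Fft {

    /** Returns {re, im} of the forward DFT of (re, im); the length must be a power of two. */
    public static double[][] fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 0 || (n & (n - 1)) != 0 || im.length != n) {
            throw new IllegalArgumentException("length must be a power of two and re/im must match: " + n);
        }
        if (n == 1) {
            return new double[][]{{re[0]}, {im[0]}};
        }
        int half = n / 2;
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int k = 0; k < half; k++) {
            evenRe[k] = re[2 * k];
            evenIm[k] = im[2 * k];
            oddRe[k] = re[2 * k + 1];
            oddIm[k] = im[2 * k + 1];
        }
        double[][] e = fft(evenRe, evenIm); // DFT of the even-indexed samples
        double[][] o = fft(oddRe, oddIm);   // DFT of the odd-indexed samples
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < half; k++) {
            // Twiddle factor w = exp(-2*pi*i*k/n), multiplied onto the odd half.
            double angle = -2 * Math.PI * k / n;
            double wr = Math.cos(angle), wi = Math.sin(angle);
            double tr = wr * o[0][k] - wi * o[1][k];
            double ti = wr * o[1][k] + wi * o[0][k];
            outRe[k] = e[0][k] + tr;
            outIm[k] = e[1][k] + ti;
            outRe[k + half] = e[0][k] - tr;
            outIm[k + half] = e[1][k] - ti;
        }
        return new double[][]{outRe, outIm};
    }

    public static void main(String[] args) {
        double[][] x = fft(new double[]{1, 1, 1, 1}, new double[4]);
        System.out.println(Arrays.toString(x[0]));
        System.out.println(Arrays.toString(x[1]));
    }
}
```

In practice you would keep using the library's FFT; the sketch only shows why each level of the recursion needs an even length.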
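Technically, you can use the math behind the FFT to speed up transforms whose sample count is not a prime, but this is very complicated, not as fast, and may not be supported by your library. However, if your window has to be a particular size to match a specification, you may be forced to use something more complex to get good performance.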