Question

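I want to optimize this algorithm. The function makeFrame splits the audio signal into time frames using a Hanning window of about 37 ms. The function divideFreqs then performs a fast Fourier transform on each time frame using the jtransforms library (and it is the most time-consuming part). How can I reduce the time this operation takes? It is far too slow: for a 5-second audio file, it takes about 13 seconds. I was thinking about using multithreading, but I have never used it.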

 public double[][] makeFrame(double[] audioOutput) {

            int length = audioOutput.length;

            //calculate a Hanning window size of 37 ms
            int window = (int) Math.round(0.37 * sampleRate);
            int interval = (int) Math.round(0.0116 * sampleRate);
            length = length - window;
            int numintervals = length / interval;
            //calculate hanning window values
            double[] hanw = hanning(window);
            double[][] sections = new double[numintervals + 1][25];


            //divide the signal into timeframes using Hanning window of 37ms
            int k = 0;
            for (int i = 0; i < length; i += interval) {
                double[] temp = new double[88200];
                int t = 0;
                int s;

                s = i;

                for (; s < i + window; s++) {
                    temp[t] = audioOutput[s] * hanw[t];
                    t++;
                }
                sections[k] = divideFreqs(temp, sampleRate);
                k++;
            }

            return sections;
        }


public static double[] hanning(int window) {

int w = 0;

        double h_wnd[] = new double[window]; //Hanning window

        for (int i = 1; i < window; i++) { //calculate the hanning window
            h_wnd[i] = 0.5 * (1 - Math.cos(2.0 * Math.PI * i / (window + 1)));
        }

        return h_wnd;
    }

 public static double[] divideFreqs(double[] audioData, float fs) {

        DoubleFFT_1D fft = new DoubleFFT_1D(44100);
        int len;
        double[] secenergy;


        //Frequency bands in the range of 1Hz-20000Hz
        int[][] bandsec = new int[][]{
            {1, 100},
            {100, 200},
            {200, 300},
            {300, 400},
            {400, 510},
            {510, 630},
            {630, 770},
            {770, 920},
            {920, 1080},
            {1080, 1270},
            {1270, 1480},
            {1480, 1720},
            {1720, 2000},
            {2000, 2320},
            {2320, 2700},
            {2700, 3150},
            {3150, 3700},
            {3700, 4400},
            {4400, 5300},
            {5300, 6400},
            {6400, 7700},
            {7700, 9500},
            {9500, 12000},
            {12000, 15500},
            {15500, 20000}};


        //perform FFT on the data
        fft.realForwardFull(audioData);


        //splitting real and imaginary numbers
        double[] real = new double[22050];
        double[] imaginary = new double[22050];
        for (int row = 0; row < 22050; row++) {
            real[row] = (double) Math.round(audioData[row + row] * 100000000) / 100000000;
            imaginary[row] = (double) Math.round(audioData[row + row + 1] * 100000000) / 100000000;

        }

        len = bandsec.length;
        secenergy = new double[len];

        //calculate energy for each critical band
        double[] tempReal;
        double[] tempImag;
        for (int i = 0; i < len; i++) {
            int k = 0;
            tempReal = new double[bandsec[i][1] - (bandsec[i][0] - 1)];
            tempImag = new double[bandsec[i][1] - (bandsec[i][0] - 1)];

            for (int j = bandsec[i][0] - 1; j < bandsec[i][1]; j++) {

                tempReal[k] = real[j];
                tempImag[k] = imaginary[j];
                k++;
            }
            secenergy[i] = energy(tempReal, tempImag);

        }

        return secenergy;
    }

 public static double energy(double[] real, double[] imaginary) {
        double e = 0;

        Complex sum = new Complex(0, 0);
        ArrayList<Complex> complexList = new ArrayList<Complex>();

        for (int i = 0; i < real.length; i++) {
            Complex comp = new Complex(real[i], imaginary[i]);

            complexList.add(comp.multiply(comp));
        }

        for (int i = 0; i < complexList.size(); i++) {
            Complex comp = new Complex(complexList.get(i).getReal(), complexList.get(i).getImaginary());

            sum = Complex.add(comp, sum);


        }

        e = Math.sqrt(sum.magnitude());
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }

Answer 1

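Using multiple cores would help, but optimizing the code will usually gain you much more.

Using double instead of Complex objects is about 9x faster on my machine.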

The average time using double took 38,687 ns
The average time using Complex took 344,010 ns

Test code

import java.util.ArrayList;

public class EnergyTest {
    public static void main(String... args) {
        double[] real = new double[22050];
        double[] imaginary = new double[22050];
        for (int i = 0; i < real.length; i++) {
            real[i] = Math.random() - Math.random();
            imaginary[i] = Math.random() - Math.random();
        }
        {
            int runs = 100000;
            long start = 0;
            double e = 0;
            for (int i = -10000; i < runs; i++) {
                if (i == 0) start = System.nanoTime();
                e += energyDouble(real, imaginary);
            }
            assert e > 0;
            long time = System.nanoTime() - start;
            System.out.printf("The average time using double took %,d ns%n", time / runs);
        }
        {
            int runs = 10000;
            long start = System.nanoTime();
            double e = 0;
            for (int i = -10000; i < runs; i++) {
                if (i == 0) start = System.nanoTime();
                e += energy(real, imaginary);
            }
            assert e > 0;
            long time = System.nanoTime() - start;
            System.out.printf("The average time using Complex took %,d ns%n", time / runs);
        }
    }

    public static double energyDouble(double[] real, double[] imaginary) {
        double re_total = 0, im_total = 0;

        for (int i = 0; i < real.length; i++) {
            double re = real[i];
            double im = imaginary[i];
            double re2 = re * re - im * im;
            double im2 = 2 * re * im;
            re_total += re2;
            im_total += im2;
        }
        double e = Math.sqrt(re_total * re_total + im_total * im_total);
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }

    public static double energy(double[] real, double[] imaginary) {
        double e = 0;

        Complex sum = new Complex(0, 0);
        ArrayList<Complex> complexList = new ArrayList<Complex>();

        for (int i = 0; i < real.length; i++) {
            Complex comp = new Complex(real[i], imaginary[i]);

            complexList.add(comp.multiply(comp));
        }

        for (int i = 0; i < complexList.size(); i++) {
            Complex comp = new Complex(complexList.get(i).getReal(), complexList.get(i).getImaginary());

            sum = Complex.add(comp, sum);


        }

        e = Math.sqrt(sum.magnitude());
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }

    static class Complex {

        private final double re;
        private final double im;

        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        public double getReal() {
            return re;
        }

        public double getImaginary() {
            return im;
        }

        public Complex multiply(Complex comp) {
            double re2 = re * comp.re - im * comp.im;
            double im2 = im * comp.re + re * comp.im;
            return new Complex(re2, im2);
        }

        public static Complex add(Complex a, Complex b) {
            return new Complex(a.re + b.re, a.im + b.im);
        }

        public double magnitude() {
            return re * re + im * im;
        }
    }
}
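
On the multi-core point at the top of this answer: the frames in makeFrame are independent of each other, so once the per-frame work is lean they can be spread across cores. The following is only a minimal sketch, assuming a hypothetical helper named processAll and a stand-in lambda in place of the real "window, FFT, band energies" step. Each task would need its own input buffer (and, unless the library documents otherwise, its own FFT object), because the transform runs in place.

```java
import java.util.function.IntFunction;
import java.util.stream.IntStream;

/** Hypothetical helper: runs one independent task per frame on the common fork-join pool. */
final class ParallelFrames {

    /**
     * Calls perFrame.apply(k) for k = 0 .. frameCount-1, possibly on several threads,
     * and stores each result in slot k. Every task writes a different slot,
     * so no locking is needed.
     */
    static double[][] processAll(int frameCount, IntFunction<double[]> perFrame) {
        double[][] out = new double[frameCount][];
        IntStream.range(0, frameCount)
                 .parallel()
                 .forEach(k -> out[k] = perFrame.apply(k));
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for "window frame k, FFT it, compute the band energies".
        double[][] result = processAll(4, k -> new double[] {k, 2.0 * k});
        System.out.println(result.length + " frames processed");
    }
}
```

In makeFrame, frame k would start at sample k * interval, so the lambda can derive its input range from k alone.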

Answer 2

I use FFTW myself, so I don't know your library, but here is what I noticed:

1) Your FFT size is not a power of 2.

2) Your window is 370 ms, not 37 ms.

3) Since your window has a size of 370 ms (i.e. ~16k samples), why feed an 88200-element
array into it (or does the constructor value say "take only 44100 values")?
It is fully sufficient to take
    pow(2.0, ceil(log2(0.37*44100))) = 2^14 = 16384
as your FFT size.
Zero padding won't add any frequency resolution, I'm afraid.

4) You instantiate a new FFT object for every call to divideFreqs.
I'm not sure how expensive the construction is, so try making it a class member.

5) Last but not least (I think this is the major speed loss here):
your hop size is much too small!
A common overlap is 1/2 or 2/3 of the window size
(in terms of your code: interval = windowSize/3).
Yours is around 1/31 of the window size.
That's really overkill and gives you many redundant results.
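
To put numbers on points 3 and 5, here is a small sketch of the arithmetic. The class and method names (FrameSizing, nextPowerOfTwo, frameCount) are made up for illustration and do not come from the question or from any library; frameCount simply mirrors the size of the sections array that makeFrame allocates.

```java
/** Hypothetical helper for picking an FFT size and counting frames. */
final class FrameSizing {

    /** Smallest power of two that is >= n (returns 1 for n <= 1). */
    static int nextPowerOfTwo(int n) {
        if (n <= 1) return 1;
        return Integer.highestOneBit(n - 1) << 1;
    }

    /** Size of the sections array in makeFrame: (total - window) / hop + 1. */
    static int frameCount(int totalSamples, int window, int hop) {
        return (totalSamples - window) / hop + 1;
    }

    public static void main(String[] args) {
        int sampleRate = 44100;
        int total = 5 * sampleRate;                        // 5 s of audio
        int window = (int) Math.round(0.37 * sampleRate);  // as written in the question
        int hop = (int) Math.round(0.0116 * sampleRate);   // as written in the question
        System.out.println("window = " + window + ", FFT size = " + nextPowerOfTwo(window));
        System.out.println("frames at current hop    = " + frameCount(total, window, hop));
        System.out.println("frames at hop = window/3 = " + frameCount(total, window, window / 3));
    }
}
```

For a 5-second file at 44100 Hz this works out to 399 frames with the current hop of 512 samples, against 38 frames with a hop of one third of the window, so roughly a tenth of the FFT calls.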

Cheers

Answer 3

A few ideas:

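In makeFrame, declare double[] temp = new double[88200] outside the loop to avoid repeated memory allocation and deallocation. (I don't know whether the optimizer is smart enough to do this on its own.)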
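
One caveat with a shared buffer: realForwardFull transforms the array in place, so after each call the buffer holds spectrum data rather than zeros, and the part beyond the window has to be cleared before the next frame. A minimal sketch of refilling it, assuming a hypothetical helper fillWindowed that is not part of the question's code:

```java
import java.util.Arrays;

/** Hypothetical helper for reusing one FFT input buffer across frames. */
final class FrameBuffer {

    /**
     * Writes signal[start + t] * window[t] into buffer[0 .. window.length-1]
     * and zeroes the rest, so leftovers from the previous in-place FFT cannot leak in.
     */
    static void fillWindowed(double[] buffer, double[] signal, int start, double[] window) {
        for (int t = 0; t < window.length; t++) {
            buffer[t] = signal[start + t] * window[t];
        }
        Arrays.fill(buffer, window.length, buffer.length, 0.0);
    }

    public static void main(String[] args) {
        double[] buffer = {9, 9, 9, 9, 9, 9};   // pretend this is stale FFT output
        fillWindowed(buffer, new double[] {1, 2, 3, 4}, 1, new double[] {0.5, 1.0});
        System.out.println(Arrays.toString(buffer));
    }
}
```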

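In divideFreqs, what is the purpose of Math.round(data*100000000)/100000000? Are you limiting the values to 9 significant digits? Does the 10th digit really affect the result that much? If this step is not strictly needed, dropping it will save some time.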

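The biggest saving I can see comes from eliminating the construction of, and the copying into, the real and imaginary arrays altogether. Your FFT appears to leave 22050 interleaved real/imaginary pairs in audioData. So you can change the signature of the energy function to energy(double fftData[], int lowFreq, int highFreq) and replace the two loops in that function with:

      for (int i = lowFreq; i <= highFreq; i++) {  // inclusive range of bins
             Complex comp = new Complex(fftData[2*i], fftData[2*i+1]);
             sum = Complex.add(sum, comp.multiply(comp));
      }

Then simply replace the body of divideFreqs:

//...
fft.realForwardFull(audioData); // in-place FFT; the output is interleaved re/im

len = bandsec.length;
secenergy = new double[len];
for (int i = 0; i < len; i++) {
    // same bins as the original copy loop: bandsec[i][0]-1 .. bandsec[i][1]-1
    secenergy[i] = energy(audioData, bandsec[i][0] - 1, bandsec[i][1] - 1);
}
return secenergy;

This avoids a lot of memory allocation, copying and deallocation, and should improve speed considerably. After these changes, for even better performance, you can replace the complex-number operations with the direct double operations shown by @Peter.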
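
Combining the two suggestions, the band energy can be read straight from the interleaved FFT output with plain doubles. This is a sketch under assumptions: the name bandEnergy is made up, the bin range is inclusive, the arithmetic follows energyDouble from the other answer (magnitude of the sum of squared complex bins), and the final rounding to four decimals is left out.

```java
/** Hypothetical helper: band energy from interleaved FFT output, no Complex objects. */
final class BandEnergy {

    /**
     * Magnitude of the sum of squared complex bins lowBin..highBin (inclusive),
     * read from an interleaved array {re0, im0, re1, im1, ...}.
     */
    static double bandEnergy(double[] fftData, int lowBin, int highBin) {
        double reTotal = 0, imTotal = 0;
        for (int i = lowBin; i <= highBin; i++) {
            double re = fftData[2 * i];
            double im = fftData[2 * i + 1];
            reTotal += re * re - im * im;   // real part of (re + i*im)^2
            imTotal += 2 * re * im;         // imaginary part of (re + i*im)^2
        }
        return Math.sqrt(reTotal * reTotal + imTotal * imTotal);
    }

    public static void main(String[] args) {
        double[] fftData = {3, 4, 1, 0};    // two bins: 3+4i and 1+0i
        System.out.println(bandEnergy(fftData, 0, 0));
        System.out.println(bandEnergy(fftData, 0, 1));
    }
}
```

No temporary arrays or objects are created per band, which is where most of the allocation in the original energy function came from.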

Answer 4

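To make an FFT actually work, the number of samples has to be a power of 2. This lets the FFT run in O(n * log(n)) instead of performing an O(n * n) DFT. If you supply input of the right size, your library is probably smart enough to make this improvement. Unfortunately, this means your window size cannot be whatever you like, since it is limited to 2, 4, 8, 16, etc.

Technically, you can use the math behind the FFT to speed up transforms whose number of samples is not prime, but this is quite complicated, not as fast, and may not be supported by the library. However, if your window has to be a particular size to match a specification, you may be forced to use something more complicated to get the performance.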

Help with algorithm optimization
