Question

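I have read similar questions where the problem was the use of synchronized methods (such as Math.random()), or too little work to justify the overhead, but I don't think that is the case here.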

My processor has 4 physical / 8 logical cores. After one warm-up run, I test the code below with n = 1, 2, 3, 4 and 8 on an 11x11 matrix:

ExecutorService pool = Executors.newFixedThreadPool(n);
long startTime = System.currentTimeMillis();
double result = pool.submit(new Solver(pool, matrix)).get();
System.out.println(result);
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println(elapsedTime);

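The execution times for each n (in milliseconds, as measured by System.currentTimeMillis()) are:

n = 1: ~15500
n = 2: ~13500 - 14000
n = 3: ~14300 - 15500
n = 4: ~14500 - 19000
n = 8: ~19000 - 23000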

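So I get a small speedup with 2 threads, almost none with 3, sometimes almost none and sometimes an extreme slowdown with 4, and an outright slowdown with 8.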

Here is the code:

import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;

public class Solver implements Callable<Double> {

    private ExecutorService pool;
    private double[][] matrix;


    public Solver(ExecutorService pool, double[][] matrix){
        this.pool = pool;
        this.matrix = matrix;
    }

    public double determinant(double[][] matrix) {
        if (matrix.length == 1)
            return (matrix[0][0]);

        double coefficient;
        double sum = 0;
        int threadsCount = ((ThreadPoolExecutor) pool).getMaximumPoolSize();
        ArrayList<Double> coefficients = new ArrayList<Double>();
        ArrayList<Future<Double>> delayedDeterminants = new ArrayList<Future<Double>>();

        for (int k = 0; k < matrix.length; k++) {
            double[][] smaller = new double[matrix.length - 1][matrix.length - 1];
            for (int i = 1; i < matrix.length; i++) {
                for (int j = 0; j < matrix.length; j++) {
                    if (j < k)
                        smaller[i - 1][j] = matrix[i][j];
                    else if (j > k)
                        smaller[i - 1][j - 1] = matrix[i][j];
                }
            }
            coefficient = ((k % 2 == 0) ? 1 : -1) * matrix[0][k];
            if (((ThreadPoolExecutor) pool).getActiveCount() < threadsCount
                    && matrix.length > 5) {
                coefficients.add(coefficient);
                delayedDeterminants.add(pool.submit(new Solver(pool, smaller)));
            } else
                sum += coefficient * (determinant(smaller));
        }

        try {
            for (int i = 0; i < coefficients.size(); i++)
                sum += coefficients.get(i) * delayedDeterminants.get(i).get();
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }

        return (sum);
    }

    @Override
    public Double call() throws Exception {
        return determinant(matrix);
    }

}

Answer 1

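The usual way to handle this kind of divide-and-conquer algorithm is to descend in a single thread to some level until there are enough independent tasks, then schedule all of those tasks on the ExecutorService and let the deeper levels of recursion run inside each task. For example, with first-row expansion of the 11x11 matrix, two levels down you have 11 x 10 = 110 submatrices whose determinants are needed, so submit 110 tasks to the ExecutorService. Or go one more level to get 990 subproblems and submit all of those.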
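The approach above can be sketched roughly as follows. This is only an illustration, not the asker's code: the class name SplitOnceDeterminant and the helpers minor, sequential and parallel are made up for the example, and it splits just one level (n tasks for an n x n matrix) to stay short. Splitting further levels would follow the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: split once in the calling thread, then submit every
// minor as an independent task that recurses sequentially.
class SplitOnceDeterminant {

    // Minor of m obtained by deleting row 0 and column k.
    static double[][] minor(double[][] m, int k) {
        int n = m.length;
        double[][] s = new double[n - 1][n - 1];
        for (int i = 1; i < n; i++) {
            for (int j = 0, c = 0; j < n; j++) {
                if (j != k) s[i - 1][c++] = m[i][j];
            }
        }
        return s;
    }

    // Single-threaded cofactor expansion along the first row.
    static double sequential(double[][] m) {
        if (m.length == 1) return m[0][0];
        double sum = 0;
        for (int k = 0; k < m.length; k++) {
            double sign = (k % 2 == 0) ? 1 : -1;
            sum += sign * m[0][k] * sequential(minor(m, k));
        }
        return sum;
    }

    // Submits all first-level minors up front; no task ever submits or
    // waits for another task, so no pool thread blocks on a Future.
    static double parallel(double[][] m, int threads) {
        if (m.length == 1) return m[0][0];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int k = 0; k < m.length; k++) {
                final double[][] sub = minor(m, k);
                Callable<Double> task = () -> sequential(sub);
                futures.add(pool.submit(task));
            }
            double sum = 0;
            for (int k = 0; k < m.length; k++) {
                double sign = (k % 2 == 0) ? 1 : -1;
                sum += sign * m[0][k] * futures.get(k).get();
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] m = { {2, 0, 1}, {1, 3, 2}, {1, 1, 4} };
        System.out.println(parallel(m, 4));
    }
}
```

Only the calling thread waits on the futures here, which is the main difference from submitting new Solver instances from inside pool threads.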

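Polling the ExecutorService for its active task count is probably not the best choice. Create a newFixedThreadPool(4), or however many threads you want to test, and let the ExecutorService manage task execution. If work stealing is what you are after, I warmly recommend spending some time getting familiar with the Fork/Join framework, which manages work stealing automatically. It is designed for exactly this kind of task.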
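For comparison, a Fork/Join version might look something like the sketch below. It uses the standard RecursiveTask and ForkJoinPool classes from java.util.concurrent; the class name ForkJoinDeterminant, the helper methods, and the cutoff SEQUENTIAL_SIZE = 5 (chosen to mirror the matrix.length > 5 check in the question) are assumptions made for the example, not tuned values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of the same cofactor expansion as a Fork/Join task.
class ForkJoinDeterminant extends RecursiveTask<Double> {

    // Assumed cutoff: matrices this small are handled in the current thread.
    static final int SEQUENTIAL_SIZE = 5;

    private final double[][] matrix;

    ForkJoinDeterminant(double[][] matrix) {
        this.matrix = matrix;
    }

    // Minor of m obtained by deleting row 0 and column k.
    static double[][] minor(double[][] m, int k) {
        int n = m.length;
        double[][] s = new double[n - 1][n - 1];
        for (int i = 1; i < n; i++) {
            for (int j = 0, c = 0; j < n; j++) {
                if (j != k) s[i - 1][c++] = m[i][j];
            }
        }
        return s;
    }

    // Single-threaded cofactor expansion along the first row.
    static double sequential(double[][] m) {
        if (m.length == 1) return m[0][0];
        double sum = 0;
        for (int k = 0; k < m.length; k++) {
            double sign = (k % 2 == 0) ? 1 : -1;
            sum += sign * m[0][k] * sequential(minor(m, k));
        }
        return sum;
    }

    @Override
    protected Double compute() {
        int n = matrix.length;
        if (n <= SEQUENTIAL_SIZE) return sequential(matrix);

        // One subtask per first-row minor.
        List<ForkJoinDeterminant> tasks = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            tasks.add(new ForkJoinDeterminant(minor(matrix, k)));
        }
        // Runs all subtasks and returns once they are done; idle workers
        // can steal the ones this worker has not started yet.
        invokeAll(tasks);

        double sum = 0;
        for (int k = 0; k < n; k++) {
            double sign = (k % 2 == 0) ? 1 : -1;
            sum += sign * matrix[0][k] * tasks.get(k).join();
        }
        return sum;
    }

    // Convenience entry point with a dedicated pool of the given parallelism.
    static double determinant(double[][] m, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ForkJoinDeterminant(m));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[][] d = new double[6][6];
        for (int i = 0; i < 6; i++) d[i][i] = i + 1;
        System.out.println(determinant(d, 4));
    }
}
```

There is no polling of getActiveCount() here: every level above the cutoff simply forks, and the pool decides which worker runs what.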

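One more thing, not directly related to the question: you should redesign the code so that a single 2D array is used for all the computations. A one-dimensional array would be even better.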
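One possible reading of this suggestion, sketched under my own assumptions: keep the matrix in a single row-major one-dimensional array and, instead of copying submatrices, track which columns have already been removed. The class FlatDeterminant and its methods are hypothetical names for the illustration.

```java
// Hypothetical sketch: cofactor expansion over one flat, row-major array
// with no submatrix copies.
class FlatDeterminant {

    // a holds an n x n matrix; element (i, j) is at a[i * n + j].
    static double determinant(double[] a, int n) {
        return expand(a, n, 0, new boolean[n]);
    }

    // Expands along row `row`, skipping the columns marked in `used`.
    // The remaining rows (row..n-1) and unused columns form the current minor.
    private static double expand(double[] a, int n, int row, boolean[] used) {
        if (row == n) return 1.0; // determinant of the empty minor
        double sum = 0;
        double sign = 1; // alternates over the columns still present
        for (int col = 0; col < n; col++) {
            if (used[col]) continue;
            used[col] = true;
            sum += sign * a[row * n + col] * expand(a, n, row + 1, used);
            used[col] = false;
            sign = -sign;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = { 2, 0, 1,
                       1, 3, 2,
                       1, 1, 4 };
        System.out.println(determinant(a, 3));
    }
}
```

The `used` array is mutable state, so in a parallel version each task would need its own copy of it; the matrix itself is only read and can be shared.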
