Question

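I have written a (very simple) benchmark program in Java. It simply increments a double value up to a specified target and measures the time that takes.

When I run it single-threaded or with a small number of threads (up to 100) on my 6-core desktop, the benchmark returns reasonable and repeatable results.

But when I use, for example, 1200 threads, the average multicore duration is significantly lower than the single-core duration (by a factor of about 10 or more). I have made sure that the total number of increments is the same no matter how many threads I use.

Why does performance drop so much with more threads? Is there a way to solve this problem?

I am posting my source, but I don't think the problem is there.

Benchmark.java: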

package sibbo.benchmark;

import java.text.DecimalFormat;
import java.util.LinkedList;
import java.util.List;

public class Benchmark implements TestFinishedListener {
    private static final double TARGET = 1e10;
    private static final int THREAD_MULTIPLICATOR = 2;

    public static void main(String[] args) throws InterruptedException {
        Benchmark b = new Benchmark(TARGET);
        b.start();
    }

    private int coreCount;
    private List<Worker> workers = new LinkedList<>();
    private List<Worker> finishedWorkers = new LinkedList<>();
    private double target;

    public Benchmark(double target) {
        this.target = target;
        getSystemInfos();
        printInfos();
    }

    private void getSystemInfos() {
        coreCount = Runtime.getRuntime().availableProcessors();
    }

    private void printInfos() {
        System.out.println("Usable cores: " + coreCount);
        System.out.println("Multicore threads: " + coreCount * THREAD_MULTIPLICATOR);
        System.out.println("Loops per core: " + new DecimalFormat("###,###,###,###,##0").format(TARGET));

        System.out.println();
    }

    public synchronized void start() throws InterruptedException {
        Thread.currentThread().setPriority(Thread.MAX_PRIORITY);

        System.out.print("Initializing singlecore benchmark... ");
        Worker w = new Worker(this, 0);
        workers.add(w);

        Thread.sleep(1000);
        System.out.println("finished");

        System.out.print("Running singlecore benchmark... ");
        w.runBenchmark(target);
        wait();

        System.out.println("finished");
        printResult();

        System.out.println();
        // Multicore
        System.out.print("Initializing multicore benchmark...  ");
        finishedWorkers.clear();

        for (int i = 0; i < coreCount * THREAD_MULTIPLICATOR; i++) {
            workers.add(new Worker(this, i));
        }

        Thread.sleep(1000);
        System.out.println("finished");

        System.out.print("Running multicore benchmark...  ");

        for (Worker worker : workers) {
            worker.runBenchmark(target / THREAD_MULTIPLICATOR);
        }

        wait();

        System.out.println("finished");
        printResult();

        Thread.currentThread().setPriority(Thread.NORM_PRIORITY);
    }

    private void printResult() {
        DecimalFormat df = new DecimalFormat("###,###,###,##0.000");

        long min = -1, av = 0, max = -1;
        int threadCount = 0;
        boolean once = true;

        System.out.println("Result:");

        for (Worker w : finishedWorkers) {
            if (once) {
                once = false;

                min = w.getTime();
                max = w.getTime();
            }

            if (w.getTime() > max) {
                max = w.getTime();
            }

            if (w.getTime() < min) {
                min = w.getTime();
            }

            threadCount++;
            av += w.getTime();

            if (finishedWorkers.size() <= 6) {
                System.out.println("Worker " + w.getId() + ": " + df.format(w.getTime() / 1e9) + "s");
            }
        }

        System.out.println("Min: " + df.format(min / 1e9) + "s, Max: " + df.format(max / 1e9) + "s, Av per Thread: "
                + df.format((double) av / threadCount / 1e9) + "s");
    }

    @Override
    public synchronized void testFinished(Worker w) {
        workers.remove(w);
        finishedWorkers.add(w);

        if (workers.isEmpty()) {
            notify();
        }
    }
}

Worker.java:

package sibbo.benchmark;

public class Worker implements Runnable {
    private double value = 0;
    private long time;
    private double target;
    private TestFinishedListener l;
    private final int id;

    public Worker(TestFinishedListener l, int id) {
        this.l = l;
        this.id = id;

        new Thread(this).start();
    }

    public int getId() {
        return id;
    }

    public synchronized void runBenchmark(double target) {
        this.target = target;
        notify();
    }

    public long getTime() {
        return time;
    }

    @Override
    public void run() {
        synWait();
        value = 0;
        long startTime = System.nanoTime();

        while (value < target) {
            value++;
        }

        long endTime = System.nanoTime();
        time = endTime - startTime;

        l.testFinished(this);
    }

    private synchronized void synWait() {
        try {
            wait();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

Answer 1

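You need to understand that the OS (or the Java thread scheduler, or both) is trying to balance between all of the threads in your application so that each gets a chance to do some work, and that switching between threads has a non-zero cost. With 1200 threads, you have reached (and probably far exceeded) the tipping point at which the processor spends more time context switching than doing actual work.

Here is a rough analogy:

You have one job to do in room A. You stand in room A for 8 hours a day and do your job.

Then your boss comes by and tells you that you also have to do a job in room B. Now you need to periodically leave room A, walk down the hall to room B, and then walk back. That walking takes 1 minute per day. Now you spend 3 hours and 59.5 minutes working on each job, and one minute walking between rooms.

Now imagine that you have 1200 rooms to work in. You are going to spend more time walking between rooms than doing actual work. This is the situation you have put your processor in. It spends so much time switching between contexts that no real work gets done.

EDIT: As per the comments below, maybe you spend a fixed amount of time in each room before moving on. Your work will progress, but the number of context switches between rooms still affects the overall runtime of a single task.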
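
One common way to keep the number of context switches down is to split the work into many tasks but run them on a pool with only as many threads as there are cores. The following is a minimal sketch of that idea, not code from the question: the class name `PooledBenchmark` and the method `runInPool` are made up for illustration, and the loop body simply mirrors the increment loop in `Worker.run()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledBenchmark {

    /**
     * Runs taskCount increment loops on a pool sized to the number of cores
     * and returns the total number of increments performed by all tasks.
     * (Hypothetical helper, not part of the original program.)
     */
    public static double runInPool(int taskCount, double totalIncrements) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            final double perTask = totalIncrements / taskCount;
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(() -> {
                    double value = 0;
                    while (value < perTask) { // same loop as Worker.run()
                        value++;
                    }
                    return value;
                });
            }
            double sum = 0;
            // invokeAll blocks until every task has completed
            for (Future<Double> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 1200 tasks, but at most availableProcessors() threads run them
        System.out.println(runInPool(1200, 1.2e9));
    }
}
```

With this layout the 1200 units of work still exist, but at any moment only as many threads as cores compete for the CPU, so the scheduler has far less switching to do.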

Answer 2

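OK, I think I have found my problem, but so far no solution.

When measuring how long each thread runs to do its part of the work, the possible minimum differs depending on the total number of threads, while the maximum is the same every time. The maximum occurs when a thread is started first, is then paused very often, and finishes last. For example, this maximum could be 10 seconds. Assuming the total number of operations done by all threads together stays the same no matter how many threads I use, the number of operations done by a single thread has to change with the thread count. For example, with one thread it has to do 1000 operations, but with ten threads each of them has to do only 100 operations. So with ten threads, the minimum time a single thread can take is much lower than with one thread. The minimum with ten threads would be 1 second, which happens if a thread does its work without interruption. Therefore, calculating the average time each thread needs to do its work is nonsense.

EDIT

The solution is simply to measure the time between the start of the first thread and the completion of the last.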
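
A minimal sketch of that measurement, assuming two `CountDownLatch` instances are used to release all threads together and to wait for the last one. The class name `WallClockBenchmark` and the method `measureWallClockNanos` are hypothetical and not part of the original program; the loop mirrors `Worker.run()`.

```java
import java.util.concurrent.CountDownLatch;

public class WallClockBenchmark {

    /**
     * Splits totalIncrements evenly across threadCount threads and returns the
     * nanoseconds elapsed between releasing the threads and the last one finishing.
     */
    public static long measureWallClockNanos(int threadCount, double totalIncrements) {
        final double perThread = totalIncrements / threadCount;
        final double[] results = new double[threadCount];
        final CountDownLatch startGate = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threadCount);

        for (int i = 0; i < threadCount; i++) {
            final int index = i;
            new Thread(() -> {
                try {
                    startGate.await(); // block until the gate is opened
                    double value = 0;
                    while (value < perThread) { // same loop as Worker.run()
                        value++;
                    }
                    results[index] = value; // keep the result so the loop has a visible effect
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }

        // All threads are created and started; one clock covers the whole run.
        long start = System.nanoTime();
        startGate.countDown();
        try {
            done.await(); // returns once the last thread has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        double total = 1e9;
        for (int threads : new int[] {1, 6, 1200}) {
            long nanos = measureWallClockNanos(threads, total);
            System.out.printf("%d threads: %.3f s%n", threads, nanos / 1e9);
        }
    }
}
```

Because a single pair of timestamps brackets the whole run, the result is directly comparable across thread counts, unlike the per-thread average.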

Problems with a benchmark program using too many threads
