Question

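I'm trying to understand how my computer works. I have a dual-core PC, and I'm testing it with code I wrote: the program multiplies two matrices using threads (in Java), where each thread handles (matrix rows divided by the number of threads) rows. Testing my code on two 1024 x 1024 matrices, I got these results (all results are the median of 10 runs):

1 thread - 9.878 seconds
2 threads - 5.944 seconds
3 threads - 5.062 seconds
4 threads - 4.895 seconds
5 to 1024 threads - the time varies between 4.8 and 5.3 seconds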

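What I'm trying to figure out is why the time keeps dropping with each of the first 4 threads. Shouldn't the work be split evenly between the cores? That is, 1 thread takes 10 seconds, 2 threads take 5 seconds, and anything beyond that should only take longer, since I have just 2 cores and adding more threads only creates more context switching.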

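The second thing I can't understand: assuming that beyond the 4th thread my PC is just switching between threads, which don't really split the work any further but only change which thread does a given piece of it, shouldn't the time for 1024 threads increase dramatically because of all the context switching?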

Thanks in advance for any response on this matter.

Adding the code -

/**
 * A class representing matrix multiplying threads , implements runnable
 * used to test the time difference according to changes in amount of 
 * threads used in the program !
 * 
 * @author R.G
 */
public class MatrixMultThread implements Runnable{

    //Thread fields and constants
    private static final String INT_ERROR = "An error has occurred during thread join";
    private static final String THREAD_COUNT_ERROR = "Invalid number of threads";
    static final int MATRIX_ROW = 1024;
    static final int MATRIX_COL = 1024;
    static final int UPPER_THREAD_LIMIT = MATRIX_ROW;
    private int startRow;
    private int endRow;
    private float[][] target;
    private float[][] sourceTwo;
    private float[][] sourceOne;

    /**
     * MatrixMultThread constructor - constructs the threads that handle multiplication.
     * 
     * @param startRow - the row this thread should start calculating from
     * @param endRow - the row this thread should stop calculating at (included in calc)
     * @param sourceOne - first matrix in the multiplication
     * @param sourceTwo - second matrix in the multiplication
     * @param target - result matrix
     */
    public MatrixMultThread(int startRow, int endRow, float[][] sourceOne, float[][] sourceTwo, float[][] target){
        this.startRow = startRow;
        this.endRow = endRow;
        this.target = target;
        this.sourceOne = sourceOne;
        this.sourceTwo = sourceTwo;

    }

    /**
     * Thread run method, invoking the actual calculation regarding
     * this thread's rows.
     */
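    public void run() {
        // Accumulate in float: an int accumulator would truncate on every +=.
        float sum = 0;
        // Iterate with a local variable so the startRow field is left unchanged.
        for(int row = startRow; row <= endRow; row++){
            for(int j = 0; j < MATRIX_COL ; j++){
                for(int i = 0; i < MATRIX_ROW ; i++){
                    sum += sourceOne[row][i] * sourceTwo[i][j];
                }
                target[row][j] = sum;
                sum = 0;
            }
        }
    }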

    /**
     * A method used for multiplying two matrices by threads.
     * 
     * @param a - first source matrix
     * @param b - second source matrix
     * @param threadCount - number of threads to use in the multiplication
     */
    public static float[][] mult(float[][] a, float[][]b, int threadCount) {
        if(threadCount > UPPER_THREAD_LIMIT || threadCount < 1){
            System.out.println(THREAD_COUNT_ERROR);
            System.exit(1);
        }

        //Result matrix
        float[][] result = new float[MATRIX_ROW][MATRIX_COL];
        Thread[] threadList = new Thread[threadCount];

        //Creating the threads
        int firstRow = 0;
        int lastRow = 0;
        for (int i = 0; i < threadCount ; i++){
            firstRow = i * (MATRIX_ROW / threadCount);
            lastRow = ((i + 1) * (MATRIX_ROW / threadCount)) -1 ;
            Thread singleThread;

            //in case the number does not divide exactly we let the last thread do a bit extra work
            //to compensate on the missing few matrix lines.
            if((i + 1) == threadCount){
                 singleThread = new Thread(new MatrixMultThread(firstRow, MATRIX_ROW - 1, a, b, result));
            }else{
                 singleThread = new Thread(new MatrixMultThread(firstRow, lastRow, a, b, result));
            }
            threadList[i] = singleThread; 
            singleThread.start();
        }

        //Join loop
        for (int i = 0; i < threadCount ; i++){
            try {
                threadList[i].join();
            } catch (InterruptedException e) {
                System.out.println(INT_ERROR);
                System.exit(1);
            }
        }
        return result;
    }


    /**
     * Main method of multiplying two matrices using various number of threads
     * functionality time is being tested.
     * 
     * @param args.
     */
    public static void main(String[] args) {

        //Thread number and timers for milliseconds calculation.
        int numberOfThreads = 1024;
        long startTimer, endTimer;

        //Initializing matrices
        float[][] a = new float[MATRIX_ROW][MATRIX_COL];
        float[][] b = new float[MATRIX_ROW][MATRIX_COL];

        for(int i = 0 ; i < MATRIX_ROW ; i++){
            for(int j = 0 ; j < MATRIX_COL ; j++){
                a[i][j] = (float)(Math.random() * ((100 - 0) + 1)); //Random matrices (values
                b[i][j] = (float)(Math.random() * ((100 - 0) + 1)); //between 0 and 100).
            }
        }

        //Timing the multiplication.
        startTimer = System.currentTimeMillis();
        mult(a, b, numberOfThreads);
        endTimer = System.currentTimeMillis();
        System.out.println("Matrices multiplied in " + (endTimer - startTimer) + " milliseconds");
    }
}

Answer 1

Your program is CPU bound. This means that it consumes its entire scheduler quantum, so the context switch overhead is relatively small:

overhead = ((consumed_quanta + context_switch_time) / consumed_quanta) - 1
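To make the formula concrete, here is a small sketch that plugs in numbers. The 10 ms quantum, the 10 µs switch cost, and the 20 µs of work are assumed, illustrative values rather than measurements from the asker's machine, and the class and method names are made up for the example.

```java
public class ContextSwitchOverhead {

    /** overhead = ((consumedQuanta + contextSwitchTime) / consumedQuanta) - 1 */
    static double overhead(double consumedQuantaMicros, double contextSwitchMicros) {
        return (consumedQuantaMicros + contextSwitchMicros) / consumedQuantaMicros - 1.0;
    }

    public static void main(String[] args) {
        // Assumed numbers: a CPU-bound thread uses its full 10 ms quantum,
        // and one context switch costs 10 microseconds.
        double cpuBound = overhead(10_000.0, 10.0);

        // Assumed numbers: a thread that gives up the CPU after only
        // 20 microseconds of work, with the same switch cost.
        double yieldsEarly = overhead(20.0, 10.0);

        System.out.println("CPU-bound overhead:     " + cpuBound);
        System.out.println("Yielding-early overhead: " + yieldsEarly);
    }
}
```

With these assumed values the CPU-bound case comes out at roughly 0.1% overhead and the early-yielding case at 50%, which is why piling more threads onto a CPU-bound job barely moves the total time.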

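Context switch overhead is larger in processes that leave the CPU voluntarily. For example, two threads that continuously pass the same message between them (one thread sends the message while the other reads it, then the second thread sends the message back to the first, and so on) will have very high context switch overhead.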
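The message-passing case can be sketched as below. This is an illustration only: `PingPong` and `run` are hypothetical names, and `SynchronousQueue` is just one convenient way to force a hand-off on every message. The JVM may spin briefly before parking a thread, so not every hand-off is guaranteed to be a full OS context switch.

```java
import java.util.concurrent.SynchronousQueue;

public class PingPong {

    /** Passes one message back and forth `rounds` times; returns elapsed nanoseconds. */
    static long run(final int rounds) {
        final SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        final SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        // Echo thread: block until a message arrives, then send it straight back.
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();

        try {
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                ping.put(i);  // blocks until the echo thread takes the message
                pong.take();  // blocks until the reply arrives
            }
            long elapsed = System.nanoTime() - start;
            echo.join();
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during ping-pong", e);
        }
    }

    public static void main(String[] args) {
        int rounds = 100_000;
        long nanos = run(rounds);
        System.out.println("Average round trip: " + (nanos / rounds) + " ns");
    }
}
```

Each thread does almost no work per message, so nearly all of the measured time is hand-off cost. That is the opposite of the matrix multiplication, where each thread computes for its whole quantum.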

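SMT (HyperThreading in the x86 world) allows a single core to serve multiple threads as if it were several logical cores. Since the CPU often has to wait for external resources (e.g. it needs data from the cache), letting another thread run during these dead periods can improve performance with relatively little extra circuitry (compared to adding another core). Typical quoted figures for the performance improvement from HT in real systems (not synthetic benchmarks) are around 10-20%, but YMMV: HT can make performance worse in some edge cases, and can give a much larger improvement in others.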
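One way to check whether SMT is in play is to ask the JVM how many processors it sees. `Runtime.availableProcessors()` counts logical processors, so a dual-core machine with HyperThreading enabled would typically report 4. That would be consistent with the timings levelling off at 4 threads, though the question does not confirm that the asker's CPU has HT. The class and helper names below are hypothetical.

```java
public class LogicalCores {

    /**
     * Picks a thread count for a row-partitioned job: one thread per logical
     * processor, but never more threads than there are rows to hand out.
     */
    static int chooseThreadCount(int rows) {
        int logical = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(logical, rows));
    }

    public static void main(String[] args) {
        System.out.println("Logical processors seen by the JVM: "
                + Runtime.getRuntime().availableProcessors());
        // 1024 matches MATRIX_ROW in the question's code.
        System.out.println("Suggested thread count for 1024 rows: "
                + chooseThreadCount(1024));
    }
}
```

For a CPU-bound job like the matrix multiplication, going beyond this count is unlikely to help, which matches the flat 4.8 to 5.3 second range reported for 5 to 1024 threads.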

How the number of cores affects a multithreaded OS