Why does an exception occur in a program that uses MPJ Express?

Time: 2016-02-14 00:26:40

Tags: java mpi multicore mpj-express

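I have a program that uses MPJ Express to multiply a matrix by a vector. The matrix is partitioned by rows. However, an exception occurs during processing. Am I doing something wrong?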

import java.util.Random;

import mpi.Comm;
import mpi.MPI;

public class Main {
    private static final int rootProcessorRank = 0;
    private static Comm comunicator;
    private static int processorsNumber;
    private static int currentProcessorRank;


    public static void main(String[] args) {
        MPI.Init(args);
        comunicator = MPI.COMM_WORLD;
        currentProcessorRank = comunicator.Rank();
        processorsNumber = comunicator.Size();

        if (currentProcessorRank == rootProcessorRank) {
            rootProcessorAction();
        } else {
            notRootProcessorAction();
        }

        MPI.Finalize();

    }

    public static void rootProcessorAction() {
        int[] matrixVectorSize = new int[] {5};
        int[][] matrix = createAndInitMatrix(matrixVectorSize[0]);
        int[] vector = createAndInitVector(matrixVectorSize[0]);

        for (int i = 1; i < processorsNumber; i++) {
            comunicator.Isend(matrixVectorSize, 0, 1, MPI.INT, i, MPI.ANY_TAG);
            System.out.println("Proc: " + currentProcessorRank + ", send matrixVectorSize");

            comunicator.Isend(vector, 0, vector.length, MPI.INT, i, MPI.ANY_TAG);
            System.out.println("Proc: " + currentProcessorRank + ", send vector");
        }

        int averageRowsPerProcessor = matrix.length / (processorsNumber - 1);
        int[] rowsPerProcessor = new int[processorsNumber];
        int notDistributedRowsNumber = matrix.length;
        for (int i = 1; i < rowsPerProcessor.length; i++) {
            if (i == rowsPerProcessor.length - 1) {
                rowsPerProcessor[i] = notDistributedRowsNumber;
            } else {
                rowsPerProcessor[i] = averageRowsPerProcessor;
                notDistributedRowsNumber -= averageRowsPerProcessor;
            }
        }

        int offset = 0;
        // the processorRows[0] always will be '0'
        for (int i = 1; i < rowsPerProcessor.length; i++) {
            int[] processorRows = new int[1];
            processorRows[0] = rowsPerProcessor[i];
            comunicator.Isend(processorRows, 0, 1, MPI.INT, i, MPI.ANY_TAG);
            comunicator.Isend(matrix, offset, processorRows[0], MPI.OBJECT, i, MPI.ANY_TAG);
            offset += rowsPerProcessor[i];
        }

        // Code that receives the sub-results from all processes will go here.
    }

    public static void notRootProcessorAction() {
        int[] matrixVectorSize = new int[1];
        int[] rowsNumber = new int[1];
        int[] vector = null;
        int[][] subMatrix = null;

        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(matrixVectorSize, 0, 1, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive matrixVectorSize");

        vector = new int[matrixVectorSize[0]];
        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(vector, 0, vector.length, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive vector");

        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(rowsNumber, 0, 1, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive rowsNumber");
        subMatrix = new int[rowsNumber[0]][rowsNumber[0]];

        comunicator.Probe(rootProcessorRank, MPI.ANY_SOURCE);
        comunicator.Recv(subMatrix, 0, subMatrix.length, MPI.OBJECT, rootProcessorRank, MPI.ANY_TAG);
        System.out.println("Proc: " + currentProcessorRank + ", receive subMatrix");

        int[] result = new int[rowsNumber[0]];
        multiplyMatrixVector(subMatrix, vector, result);

        comunicator.Send(result, 0, result.length, MPI.INT, rootProcessorRank, MPI.ANY_TAG);
    }

    private static void multiplyMatrixVector(int[][] matrix, int[] vector, int[] result) {
        for (int i = 0; i < matrix.length; i++) {
            int summ = 0;
            for (int j = 0; j < matrix[i].length; j++) {
                summ += matrix[i][j] * vector[j];
            }
            result[i] = summ;
        }
    }

    private static int[][] createAndInitMatrix(int size) {
        int[][] matrix = new int[size][size];
        Random random = new Random();
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix.length; j++) {
                matrix[i][j] = random.nextInt(100);
            }
        }
        return matrix;
    }

    private static int[] createAndInitVector(int size) {
        int[] vector = new int[size];
        Random random = new Random();
        for (int i = 0; i < vector.length; i++) {
            vector[i] = random.nextInt(100);
        }
        return vector;
    }
}

Here is the exception:

  

MPJ Express (0.44) is started in the multicore configuration
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at runtime.starter.MulticoreStarter$1.run(MulticoreStarter.java:281)
    at java.lang.Thread.run(Thread.java:745)
Caused by: mpi.MPIException: xdev.XDevException: java.lang.NullPointerException
    at mpi.Comm.isend(Comm.java:944)
    at mpi.Comm.Isend(Comm.java:885)
    at Main.rootProcessorAction(Main.java:35)
    at Main.main(Main.java:20)
    ... 6 more
Caused by: xdev.XDevException: java.lang.NullPointerException
    at xdev.smpdev.SMPDevice.isend(SMPDevice.java:104)
    at mpjdev.javampjdev.Comm.isend(Comm.java:1019)
    at mpi.Comm.isend(Comm.java:941)
    ... 9 more
Caused by: java.lang.NullPointerException
    at xdev.smpdev.SMPDeviceImpl$SendQueue.add(SMPDeviceImpl.java:930)
    at xdev.smpdev.SMPDeviceImpl$SendQueue.add(SMPDeviceImpl.java:909)
    at xdev.smpdev.SMPDeviceImpl.isend(SMPDeviceImpl.java:330)
    at xdev.smpdev.SMPDevice.isend(SMPDevice.java:101)
    ... 11 more
xdev.XDevException: java.lang.NullPointerException
    at xdev.smpdev.SMPDevice.recv(SMPDevice.java:162)

1 answer:

Answer 0 (score: 1)

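In my experience with MPJ Express, it is best to avoid the constants MPI.ANY_SOURCE and MPI.ANY_TAG. Set your own tag and source explicitly and you should be fine. In MPI these two constants are wildcards meant for the receiving side (Recv, Probe); a send needs a concrete, non-negative tag, whereas your code passes MPI.ANY_TAG to Isend and Send, and MPI.ANY_SOURCE as the tag argument of Probe. When I used these constants in my own program, I sometimes got random crashes with an xdev.XDevException caused by a NullPointerException, and sometimes it ran just fine.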
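As a minimal sketch of what "set your own tag" could look like, the class below gives each kind of message in the question its own tag. The class name `MessageTags`, the constant names, the values 101 to 105 and the helper `forSend` are all hypothetical and are not part of MPJ Express. The only rule they encode is that MPI send tags must be non-negative, so the wildcard value -2 is rejected.

```java
/** Hypothetical per-message tags for the matrix-vector program (not part of MPJ Express). */
final class MessageTags {
    // Assumed values: any distinct non-negative integers would do.
    static final int SIZE = 101;        // matrix/vector dimension
    static final int VECTOR = 102;      // the full vector
    static final int ROW_COUNT = 103;   // number of rows assigned to a worker
    static final int SUB_MATRIX = 104;  // the rows themselves
    static final int RESULT = 105;      // partial result sent back to the root

    private MessageTags() { }

    /** Returns the tag unchanged if it is usable for a send, otherwise throws. */
    static int forSend(int tag) {
        if (tag < 0) {
            // Wildcards such as MPI.ANY_TAG (-2 in the list in this answer) end up here.
            throw new IllegalArgumentException("send tag must be non-negative, got " + tag);
        }
        return tag;
    }
}
```

With constants like these, the first send in the question would read `comunicator.Isend(matrixVectorSize, 0, 1, MPI.INT, i, MessageTags.SIZE)` and the matching receive `comunicator.Recv(matrixVectorSize, 0, 1, MPI.INT, rootProcessorRank, MessageTags.SIZE)`, and likewise for the other messages.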

Here is a list of internal MPJ Express constants whose values you should also avoid using as tags. I am showing only the integer constants:

public static final int mpi.MPI.NUM_OF_PROCESSORS = 4
public static int mpi.MPI.UNDEFINED = -1
public static int mpi.MPI.THREAD_SINGLE = 1
public static int mpi.MPI.THREAD_FUNNELED = 2
public static int mpi.MPI.THREAD_SERIALIZED = 3
public static int mpi.MPI.THREAD_MULTIPLE = 4
public static int mpi.MPI.ANY_SOURCE = -2
public static int mpi.MPI.ANY_TAG = -2
public static int mpi.MPI.PROC_NULL = -3
public static int mpi.MPI.BSEND_OVERHEAD = 0
public static int mpi.MPI.SEND_OVERHEAD = 0
public static int mpi.MPI.RECV_OVERHEAD = 0
public static final int mpi.MPI.IDENT = 0
public static final int mpi.MPI.CONGRUENT = 3
public static final int mpi.MPI.SIMILAR = 1
public static final int mpi.MPI.UNEQUAL = 2
public static int mpi.MPI.GRAPH = 1
public static int mpi.MPI.CART = 2
public static int mpi.MPI.TAG_UB = 0
public static int mpi.MPI.HOST = 0
public static int mpi.MPI.IO = 0
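The lines above have the shape that `java.lang.reflect.Field.toString()` produces, followed by the field's value, so a list like this can probably be regenerated with reflection. This is an assumption, since the answer does not say how the list was made. The sketch below uses a hypothetical stand-in class `Demo` so that it runs without MPJ Express on the classpath; with the library available, one would pass `mpi.MPI.class` instead.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class IntConstantLister {
    /** Hypothetical stand-in for mpi.MPI. */
    static final class Demo {
        public static final int ANY_TAG = -2;
        public static int PROC_NULL = -3;
        public static final String NAME = "demo"; // skipped: not an int
        public int instanceField = 7;             // skipped: not static
    }

    /** Returns one "declaration = value" line per public static int field. */
    static List<String> listIntConstants(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Field field : type.getFields()) {
            boolean isStatic = Modifier.isStatic(field.getModifiers());
            if (isStatic && field.getType() == int.class) {
                try {
                    // A static field is read with a null receiver.
                    lines.add(field + " = " + field.getInt(null));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read " + field, e);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The order of reflected fields is not guaranteed.
        for (String line : listIntConstants(Demo.class)) {
            System.out.println(line);
        }
    }
}
```

Regenerating the list against the installed version is a cheap way to check the values, which may differ between MPJ Express releases.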

Cheers.