Many short-lived tasks run slower when multithreaded, even with a thread pool

Asked: 2014-01-28 16:16:21

Tags: java multithreading performance threadpool

Background

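I currently have a linear physics engine (no physics-engine knowledge is needed for this question), and I am experimenting with multithreading in the hope of improving efficiency.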

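One such section is the broad phase, which in this case involves sweeping through all the objects along each of the 3 axes to check which ones overlap (any objects that overlap on all axes are considered to be colliding in the broad phase). Apart from using common objects, the 3 axis sweeps are completely independent, so this seems like a good place for multithreading.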

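To avoid any possibility of blocking between threads, each of the three processes takes a local copy of all the data it wants to use (where applicable) before going multithreaded.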

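Although these sweeps are a significant bottleneck, they are short-lived: a sweep typically lasts 1-4 ms. This is a real-time application in which the code runs 60 times per second, so the total tick time is at most about 17 ms, and 1-4 ms is therefore a long time for me. Because the sweeps are short-lived, I used a thread pool, specifically Executors.newFixedThreadPool(3): 3 threads for the 3 axes.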

My test computer is a dual core with hyperthreading, so up to 4 threads should be comfortable. I checked this using Runtime.getRuntime().availableProcessors();
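As a minimal sketch of that check (the class name PoolSizing and the helper poolSizeFor are made up for illustration and are not part of the question's code), the pool can be capped at the number of logical processors the JVM reports. On a hyperthreaded machine that count includes the hyperthreads, not only the physical cores.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    // Hypothetical helper: one thread per independent task,
    // capped at the number of logical processors visible to the JVM.
    static int poolSizeFor(int independentTasks) {
        int logical = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(independentTasks, logical));
    }

    public static void main(String[] args) {
        int size = poolSizeFor(3); // 3 axis sweeps
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("logical processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("pool size: " + size);
        pool.shutdown(); // let the JVM exit
    }
}
```

Note that a logical processor count of 4 on a dual core does not promise four cores' worth of throughput, since two hyperthreads share one core's execution units.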

Problem

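When I run the test code below, which runs many short-lived tasks either single-threaded or multithreaded through a thread pool, the multithreaded version is much slower (see the profiler data). This is the case even though the multithreaded parts have no objects in common. Why is this, and is there a way to run many short-lived (1-4 ms) tasks concurrently?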

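Even making the tasks much larger only brings the multithreaded version close to single-threaded performance rather than beyond it, as I would expect, which makes me think I am doing something badly wrong.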


Test code

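import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

public class BroadPhaseAxisSweep implements Callable<Set<PotentialCollisionPrecursor>>  {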

    static final int XAXIS=0;
    static final int YAXIS=1;
    static final int ZAXIS=2;

    int axis; 
    int[] axisIndicies;
    boolean[] isStatic;
    boolean[] isLightWeight; 
    boolean[] isCollidable; 

    //orders the same as axisIndicies
    double[] starts;
    double[] ends;

    private static ExecutorService sweepPool = Executors.newFixedThreadPool(3);

    public BroadPhaseAxisSweep(int axis, List<TestObject> allObjects) {
        //all data that will be used by the thread is cached internally to avoid 
        //any concurrent access issues

        this.axis = axis;

        //allObjects is in reality unsorted, axisIndicies holds sorted indices
        //in this case allObjects just "happens" to be already sorted
        this.axisIndicies =new int[allObjects.size()];
        for(int i=0;i<allObjects.size();i++){
            axisIndicies[i]=i;
        }
        isStatic=new boolean[allObjects.size()];
        for(int i=0;i<allObjects.size();i++){
            isStatic[i]=allObjects.get(i).isStatic();
        }
        isLightWeight=new boolean[allObjects.size()];
        for(int i=0;i<allObjects.size();i++){
            isLightWeight[i]=allObjects.get(i).isLightWeightPhysicsObject();
        }
        isCollidable=new boolean[allObjects.size()];
        for(int i=0;i<allObjects.size();i++){
            isCollidable[i]=allObjects.get(i).isCollidable();
        }

        starts=new double[allObjects.size()];
        for(int i=0;i<allObjects.size();i++){
            starts[i]=allObjects.get(i).getStartPoint();
        }
        ends=new double[allObjects.size()];
        for(int i=0;i<allObjects.size();i++){
            ends[i]=allObjects.get(i).getEndPoint();
        }
    }


    @Override
    public Set<PotentialCollisionPrecursor> call() throws Exception {
        return axisSweep_simple(axisIndicies);
    }

    private Set<PotentialCollisionPrecursor> axisSweep_simple(int[] axisIndicies){

        Set<PotentialCollisionPrecursor> thisSweep =new HashSet<>();


        for(int i=0;i<starts.length;i++){
            if (isCollidable[axisIndicies[i]]){
                double activeObjectEnd=ends[i];
                //sweep forwards while the next object's start is before our end
                for(int j=i+1;j<starts.length;j++){
                    //j<starts.length is the bare minimum constraint, most js won't get that far
                    if ((isStatic[axisIndicies[i]]&& isStatic[axisIndicies[j]]) || ((isLightWeight[axisIndicies[i]]&& isLightWeight[axisIndicies[j]]))){
                        //if both objects are static or both are light weight then they cannot by definition collide, so we can skip
                        continue;
                    }


                    if (activeObjectEnd>starts[j]){
                        PotentialCollisionPrecursor potentialCollision=new PotentialCollisionPrecursor(getObjectNumberFromAxisNumber(i),getObjectNumberFromAxisNumber(j));
                        thisSweep.add(potentialCollision);
                    }else{
                        break; //this is as far as this active object goes

                    }

                }
            }
        }

        return thisSweep;
    }


    private int getObjectNumberFromAxisNumber(int number){
        return axisIndicies[number];
    }


     public static void main(String[] args){
         int noOfObjectsUnderTest=250;

         List<TestObject> testObjects=new ArrayList<>();

         Random rnd=new Random();
         double runningStartPosition=0;
         for(int i=0;i<noOfObjectsUnderTest;i++){
             runningStartPosition+=rnd.nextDouble()*0.01;
             testObjects.add(new TestObject(runningStartPosition));
         }

         while(true){
             runSingleTreaded(testObjects);
             runMultiThreadedTreaded(testObjects);
         }

     }

    private static void runSingleTreaded(List<TestObject> testObjects) {
        try {
            //XAXIS used over and over again just for test
            Set<PotentialCollisionPrecursor> xSweep=(new BroadPhaseAxisSweep(XAXIS,testObjects)).call();
            Set<PotentialCollisionPrecursor> ySweep=(new BroadPhaseAxisSweep(XAXIS,testObjects)).call();
            Set<PotentialCollisionPrecursor> zSweep=(new BroadPhaseAxisSweep(XAXIS,testObjects)).call();

            System.out.println(xSweep.size()); //just so JIT can't possibly optimise out
            System.out.println(ySweep.size()); //just so JIT can't possibly optimise out
            System.out.println(zSweep.size()); //just so JIT can't possibly optimise out
        } catch (Exception ex) {
            //bad practice, example only
            Logger.getLogger(BroadPhaseAxisSweep.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private static void runMultiThreadedTreaded(List<TestObject> testObjects) {
        try {
            //XAXIS used over and over again just for test
            Future<Set<PotentialCollisionPrecursor>> futureX=sweepPool.submit(new BroadPhaseAxisSweep(XAXIS,testObjects));
            Future<Set<PotentialCollisionPrecursor>> futureY=sweepPool.submit(new BroadPhaseAxisSweep(XAXIS,testObjects));
            Future<Set<PotentialCollisionPrecursor>> futureZ=sweepPool.submit(new BroadPhaseAxisSweep(XAXIS,testObjects));

            Set<PotentialCollisionPrecursor> xSweep=futureX.get();
            Set<PotentialCollisionPrecursor> ySweep=futureY.get();
            Set<PotentialCollisionPrecursor> zSweep=futureZ.get();

            System.out.println(xSweep.size()); //just so JIT can't possibly optimise out
            System.out.println(ySweep.size()); //just so JIT can't possibly optimise out
            System.out.println(zSweep.size()); //just so JIT can't possibly optimise out
        } catch (Exception ex) {
            //bad practice, example only
            Logger.getLogger(BroadPhaseAxisSweep.class.getName()).log(Level.SEVERE, null, ex);
        }
    }


    public static class TestObject{

        final boolean isStatic;
        final boolean isLightWeight;
        final boolean isCollidable;
        final double startPointOnAxis;
        final double endPointOnAxis; 

        public TestObject(double startPointOnAxis) {
            Random rnd=new Random();
            this.isStatic = rnd.nextBoolean();
            this.isLightWeight =  rnd.nextBoolean();
            this.isCollidable =  rnd.nextBoolean();
            this.startPointOnAxis = startPointOnAxis;
            this.endPointOnAxis =startPointOnAxis+0.2*rnd.nextDouble();
        }

        public boolean isStatic() {
            return isStatic;
        }

        public boolean isLightWeightPhysicsObject() {
            return isLightWeight;
        }

        public boolean isCollidable() {
            return isCollidable;
        }

        public double getStartPoint() {
            return startPointOnAxis;
        }

        public double getEndPoint() {
            return endPointOnAxis;
        }
    }

}

public class PotentialCollisionPrecursor {
    //holds the object numbers of a potential collision, can be converted to a real PotentialCollision using a list of those objects
    private final int rigidBodyNumber1;
    private final int rigidBodyNumber2; 


    public PotentialCollisionPrecursor(int rigidBodyNumber1, int rigidBodyNumber2) {
        if (rigidBodyNumber1<rigidBodyNumber2){
            this.rigidBodyNumber1 = rigidBodyNumber1;
            this.rigidBodyNumber2 = rigidBodyNumber2;
        }else{
            this.rigidBodyNumber1 = rigidBodyNumber2;
            this.rigidBodyNumber2 = rigidBodyNumber1;
        }
    }

    public int getRigidBodyNumber1() {
        return rigidBodyNumber1;
    }

    public int getRigidBodyNumber2() {
        return rigidBodyNumber2;
    }

    @Override
    public int hashCode() {
        int hash = 7;
        hash = 67 * hash + this.rigidBodyNumber1;
        hash = 67 * hash + this.rigidBodyNumber2;
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final PotentialCollisionPrecursor other = (PotentialCollisionPrecursor) obj;
        if (this.rigidBodyNumber1 != other.rigidBodyNumber1) {
            return false;
        }
        if (this.rigidBodyNumber2 != other.rigidBodyNumber2) {
            return false;
        }
        return true;
    }

}

Different sizes of thread pool

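After single-threaded, the next fastest is a pool of 2 or 3 threads, and the slowest is a single thread in a thread pool (unsurprisingly, since it has all the overhead and none of the gain).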


Unnaturally large task size

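To test whether the problem was simply that the tasks were too small for threading, I increased the task size to about 100 ms. These results are even more confusing; any number of threads from 1 to 3 runs at roughly the same speed, and compared with single-threaded ...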


2 answers:

Answer 0 (score: 3)

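If your broad-phase sweeps take only a few milliseconds, you are better off doing everything synchronously. Just reserving a thread-pool thread (on Windows) costs more than 5 ms worth of work. On top of that, you still have to move data between threads, wait for context switches, and finally put the threads back where you found them.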

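That whole process is likely to reduce performance, especially since you are taking local copies of the data. If each sweep were naturally independent and took more than 500 ms, you would probably benefit from a concurrency model like the one you have implemented.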
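The hand-off cost varies by OS, JVM and hardware, so it may be worth measuring on the target machine rather than relying on a fixed figure. The sketch below (the class HandOffCost and its method are hypothetical names, and the lambdas assume Java 8 or later) times how long it takes to submit a do-nothing task to a warmed-up fixed pool and wait for its result.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandOffCost {
    // Average nanoseconds to submit a trivial task and wait for its result.
    // The task does no real work, so this approximates pure hand-off overhead.
    static long averageRoundTripNanos(int threads, int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Warm-up: start the worker threads and let the JIT compile this path.
            for (int i = 0; i < rounds; i++) {
                pool.submit(() -> 42).get();
            }
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                Future<Integer> f = pool.submit(() -> 42);
                f.get(); // block until the worker has finished
            }
            return (System.nanoTime() - start) / rounds;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("avg round trip (ns): "
                + averageRoundTripNanos(3, 10_000));
    }
}
```

If the measured round trip is small compared with the 1-4 ms sweeps, the pool itself is unlikely to be the main cost, and the data copying or the measurement method deserves a closer look.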

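One thing worth noting is that today's graphics processors come with embedded coprocessors dedicated to physics calculations. They are so good at this kind of work because they sometimes have thousands of processor cores running at relatively low clock rates. That makes them well suited to taking on a large number of small tasks at once. You may want to try interfacing with the graphics processor directly to offload the physics processing to that environment, rather than doing it on a general-purpose CPU.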

Answer 1 (score: 1)

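I promised to summarize all the findings here... It was rather confusing, because the main culprit was the profiler, which slows down non-main threads disproportionately compared with the main thread. Maybe it uses atomic counters to track its data, and maybe their overhead is high enough to produce such misleading results.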

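Measuring the time manually gives much better results, namely a 30-40% speedup from multithreading. That makes sense, since the data copying adds a large sequential overhead.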
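A rough sketch of what manual measurement could look like, using System.nanoTime() around many repetitions after a warm-up. The work method is a made-up CPU-bound stand-in for one axis sweep, not the question's code, and the class and method names are hypothetical; the numbers it prints depend entirely on the machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongSupplier;

public class ManualTiming {
    // Hypothetical stand-in for one axis sweep: a small CPU-bound loop.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Three "sweeps" one after another on the calling thread.
    static long runSequential(int n) {
        return work(n) + work(n) + work(n);
    }

    // Three "sweeps" on the pool; invokeAll returns once all have completed.
    static long runPooled(ExecutorService pool, int n) {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int t = 0; t < 3; t++) {
            tasks.add(() -> work(n));
        }
        try {
            long sum = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Average wall-clock nanoseconds per call of body.
    static long averageNanos(LongSupplier body, int reps) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            sink += body.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println(sink); // uses the results so the JIT cannot drop them
        return elapsed / reps;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            int n = 200_000;
            // Warm-up so both paths are compiled before timing.
            averageNanos(() -> runSequential(n), 500);
            averageNanos(() -> runPooled(pool, n), 500);
            System.out.println("sequential ns: " + averageNanos(() -> runSequential(n), 500));
            System.out.println("pooled ns:     " + averageNanos(() -> runPooled(pool, n), 500));
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing the whole batch from the calling thread measures what the game loop actually waits for, and it avoids per-thread instrumentation of the kind a profiler adds.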

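The copying of data is neither necessary nor useful. It is unnecessary because all the threads only read the shared data. It is useless because reading a shared variable is no slower than reading a private copy: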

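  • Cores can fetch data from each other's caches quickly
  • They put copies of it into their local L1 and L2 caches (the "Shared" state of the MESI protocol)
  • The L3 cache is shared, so copying data unnecessarily means a larger memory footprint and therefore more L3 misses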
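The points above could be sketched as follows: build the arrays once, then let all three tasks read the same instances. This is a simplified illustration with hypothetical names (SharedReadOnlySweep, countOverlaps, sweepThreeTimes), not the question's full sweep. It relies on the arrays not being modified while the tasks run, and on the documented guarantee that actions taken before submitting a task to an ExecutorService are visible to that task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedReadOnlySweep {
    // Counts overlapping pairs along one axis; only reads the arrays.
    // Assumes the objects are already sorted by start position.
    static int countOverlaps(double[] starts, double[] ends) {
        int overlaps = 0;
        for (int i = 0; i < starts.length; i++) {
            for (int j = i + 1; j < starts.length && ends[i] > starts[j]; j++) {
                overlaps++;
            }
        }
        return overlaps;
    }

    // Submits three sweeps that all read the SAME arrays; nothing is copied.
    static int[] sweepThreeTimes(ExecutorService pool, double[] starts, double[] ends) {
        Callable<Integer> sweep = () -> countOverlaps(starts, ends);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int axis = 0; axis < 3; axis++) {
            futures.add(pool.submit(sweep));
        }
        int[] result = new int[3];
        try {
            for (int axis = 0; axis < 3; axis++) {
                result[axis] = futures.get(axis).get();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] starts = {0.0, 1.0, 2.0, 3.0};
        double[] ends = {1.5, 2.5, 3.5, 4.5};
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            for (int count : sweepThreeTimes(pool, starts, ends)) {
                System.out.println(count); // each sweep finds 3 overlapping pairs
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Since no thread writes to the arrays, no locking is needed, and the per-task constructor loops that copy every field disappear from the sequential part of each tick.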