Why are the execution times of this multithreaded program so strange?

Date: 2019-10-11 20:11:11

Tags: java multithreading

import java.util.ArrayList;


interface ICounter {
    void inc();
    void dec();
    int get();
}


class Counter implements ICounter {
    protected int counter = 0;

    public void inc() { counter++; }
    public void dec() { counter--; }

    public int get() { return counter; }
}

class SynchronizedCounter extends Counter {
    public synchronized void inc() { counter++; }
    public synchronized void dec() { counter--; }
}

class BetterCounter implements ICounter {
    protected long up = 0;
    protected long down = 0;

    public void inc()  { up++; }
    public void dec()  { down--; }

    public int get() { return (int)(up + down); }
}


class SynchronizedBetterCounter extends BetterCounter {
    private Object lock1 = new Object();
    private Object lock2 = new Object();

    public void inc() { synchronized(lock1) { up++; } }
    public void dec() { synchronized(lock2) { down--; }  }

}


class Inc extends Thread {
    ICounter i;

    public Inc(ICounter i) { this.i = i; }

    public void run() {
        for(int x = 0; x < 2147483647 ; x++) {
            i.inc();
        }
    }
}

class Dec extends Thread {
    ICounter i;

    public Dec(ICounter i) { this.i = i; }

    public void run() {
        for(int x = 0; x < 2147483647 ; x++) {
            i.dec();
        }
    }
}


public class Main {

    public static void main(String[] args) {
        ICounter[] c = {new Counter(), new SynchronizedCounter(), new BetterCounter(), new SynchronizedBetterCounter()};
        int numberOfCounters = 4;

        ArrayList<Inc> inc = new ArrayList<>();
        ArrayList<Dec> dec = new ArrayList<>();

        for(int i = 0; i < numberOfCounters; i++) {
            inc.add(new Inc(c[i]));
            dec.add(new Dec(c[i]));
        }

        long start = 0;
        long stop = 0;

        int returnVal[] = new int[4];
        long execTime[] = new long[4];

        for(int i = 0; i < numberOfCounters; i++) {
            start = System.currentTimeMillis();

            inc.get(i).start();
            dec.get(i).start();

            try {
                inc.get(i).join();
                dec.get(i).join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // restore the interrupt flag instead of silently ignoring it
            }

            stop = System.currentTimeMillis();
            returnVal[i] = c[i].get();
            execTime[i] = stop-start;
        }

        for(int i = 0; i < 4; i++) {
            System.out.println("Counter: " + c[i].getClass().getSimpleName());
            System.out.println("Value of counter: " + returnVal[i]);
            System.out.println("Execution time: " + execTime[i] + " ms\n");
        }
    }
}

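The goal is to increase the value with the method inc and decrease it with the method dec. It is written in four ways: basic, with synchronization, with two variables (one for incrementing, one for decrementing), and with two variables plus synchronization. There are two thread classes for the multithreading, Inc and Dec.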

I know this code can have problems, but the goal is simply to add or subtract a large number as fast as possible.
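One of those problems is that counter++ is a read, an add and a write, so when both threads hit the plain Counter at once, updates get lost (which is why its final value is not 0). As a minimal sketch, assuming java.util.concurrent.atomic.AtomicInteger is an acceptable replacement, the same inc/dec experiment can be made race-free like this; the class name AtomicCounterDemo and the iteration count are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: the question's inc/dec experiment,
// but with an atomic counter so that no update is lost.
final class AtomicCounterDemo {
    // Runs one incrementing and one decrementing thread, n times each,
    // and returns the final counter value (always 0).
    static int run(int n) {
        AtomicInteger counter = new AtomicInteger();
        Thread inc = new Thread(() -> {
            for (int x = 0; x < n; x++) {
                counter.incrementAndGet(); // atomic read-modify-write
            }
        });
        Thread dec = new Thread(() -> {
            for (int x = 0; x < n; x++) {
                counter.decrementAndGet(); // atomic read-modify-write
            }
        });
        inc.start();
        dec.start();
        try {
            inc.join();
            dec.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("Value of counter: " + run(10_000_000));
    }
}
```

Expect this to be much slower than the plain Counter: the point of the sketch is a correct result, not speed.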

Output

Counter: Counter
Value of counter: 2061724420
Execution time: 141 ms

Counter: SynchronizedCounter
Value of counter: 0
Execution time: 174210 ms

Counter: BetterCounter
Value of counter: 0
Execution time: 39468 ms

Counter: SynchronizedBetterCounter
Value of counter: 0
Execution time: 52176 ms

Question: Why is the first execution time so short compared to the others?

2 answers:

Answer 0 (score: 2)

tl;dr: Counter vs. SynchronizedCounter compares ~800M completely predictable instructions executed in each thread with ~250,000M instructions involving cross-core synchronization.

I ran the code with java -server -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,Inc.run -XX:+PrintCompilation Main and then looked at the disassembly.

After some iterations, this is the bulk of Inc::run in the Counter case:

  0x00007fcdb12bc6c0: mov    %r9d,%r11d
  0x00007fcdb12bc6c3: add    %r8d,%r11d
  0x00007fcdb12bc6c6: mov    %r11d,%ecx
  0x00007fcdb12bc6c9: add    $0x10,%ecx
  0x00007fcdb12bc6cc: mov    %ecx,0xc(%r10)     ;*putfield counter
                                                ; - Counter::inc@7 (line 14)
                                                ; - Inc::run@12 (line 53)

  0x00007fcdb12bc6d0: add    $0x10,%r9d         ;*iinc
                                                ; - Inc::run@17 (line 52)

  0x00007fcdb12bc6d4: cmp    $0x7ffffff0,%r9d
  0x00007fcdb12bc6db: jl     0x00007fcdb12bc6c0  ;*if_icmpge
                                                ; - Inc::run@5 (line 52)

This is basically equivalent to:

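int value = i.counter;             // field read once, before the loop
for (; x < 2147483632; x += 16) {  // x is Inc.run's loop index, stepped by 16
  value += 16;                     // 16 calls to inc() folded into one add
  i.counter = value;               // store only; the field is never re-read
}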

So, in other words, it inlines the function, unrolls the loop so that each trip adds 16, and never bothers to read the value back.
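To make that rewrite concrete, here is a hand-written Java sketch of the same idea, assuming a single thread: the loop is unrolled by 16, the 16 increments become one add, the field is stored but never re-read, and a short tail loop covers the leftover iterations (2147483647 is not a multiple of 16). The class and method names are hypothetical; this mimics what the disassembly suggests, it is not HotSpot's actual output.

```java
// Hypothetical illustration of the JIT's rewrite of Inc.run for the plain Counter.
final class UnrollDemo {
    int counter; // stands in for Counter.counter

    // What the source says: n separate increments of the field.
    void plainLoop(int n) {
        for (int x = 0; x < n; x++) {
            counter++;
        }
    }

    // Roughly what the compiled code does.
    void unrolledLoop(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be >= 0");
        int value = counter;            // field read once, before the loop
        int x = 0;
        for (; x < n - 15; x += 16) {   // for n = 2147483647 the bound is 2147483632 (0x7ffffff0)
            value += 16;                // 16 increments folded into one add
            counter = value;            // store only; the field is never re-read
        }
        for (; x < n; x++) {            // tail: at most 15 leftover iterations
            value++;
            counter = value;
        }
    }

    public static void main(String[] args) {
        UnrollDemo a = new UnrollDemo();
        UnrollDemo b = new UnrollDemo();
        a.plainLoop(1_000_003);
        b.unrolledLoop(1_000_003);
        System.out.println(a.counter + " " + b.counter);
    }
}
```

In a single thread both methods end with the same value. With a second thread writing the same field, every store of value in the unrolled loop silently overwrites whatever that thread wrote in between.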

For comparison, here are the 68 lines of the SynchronizedCounter.inc method (see if you can spot the 3 lines that make up counter++ itself):

  0x00007fb9f356a380: mov    %eax,-0x14000(%rsp)
  0x00007fb9f356a387: push   %rbp
  0x00007fb9f356a388: sub    $0x40,%rsp
  0x00007fb9f356a38c: lea    0x20(%rsp),%rdi
  0x00007fb9f356a391: mov    %rsi,0x8(%rdi)
  0x00007fb9f356a395: mov    (%rsi),%rax
  0x00007fb9f356a398: mov    %rax,%rbx
  0x00007fb9f356a39b: and    $0x7,%rbx
  0x00007fb9f356a39f: cmp    $0x5,%rbx
  0x00007fb9f356a3a3: jne    0x00007fb9f356a42a
  0x00007fb9f356a3a9: mov    0x8(%rsi),%ebx
  0x00007fb9f356a3ac: shl    $0x3,%rbx
  0x00007fb9f356a3b0: mov    0xa8(%rbx),%rbx
  0x00007fb9f356a3b7: or     %r15,%rbx
  0x00007fb9f356a3ba: xor    %rax,%rbx
  0x00007fb9f356a3bd: and    $0xffffffffffffff87,%rbx
  0x00007fb9f356a3c1: je     0x00007fb9f356a452
  0x00007fb9f356a3c7: test   $0x7,%rbx
  0x00007fb9f356a3ce: jne    0x00007fb9f356a417
  0x00007fb9f356a3d0: test   $0x300,%rbx
  0x00007fb9f356a3d7: jne    0x00007fb9f356a3f6
  0x00007fb9f356a3d9: and    $0x37f,%rax
  0x00007fb9f356a3e0: mov    %rax,%rbx
  0x00007fb9f356a3e3: or     %r15,%rbx
  0x00007fb9f356a3e6: lock cmpxchg %rbx,(%rsi)
  0x00007fb9f356a3eb: jne    0x00007fb9f356a497
  0x00007fb9f356a3f1: jmpq   0x00007fb9f356a452
  0x00007fb9f356a3f6: mov    0x8(%rsi),%ebx
  0x00007fb9f356a3f9: shl    $0x3,%rbx
  0x00007fb9f356a3fd: mov    0xa8(%rbx),%rbx
  0x00007fb9f356a404: or     %r15,%rbx
  0x00007fb9f356a407: lock cmpxchg %rbx,(%rsi)
  0x00007fb9f356a40c: jne    0x00007fb9f356a497
  0x00007fb9f356a412: jmpq   0x00007fb9f356a452
  0x00007fb9f356a417: mov    0x8(%rsi),%ebx
  0x00007fb9f356a41a: shl    $0x3,%rbx
  0x00007fb9f356a41e: mov    0xa8(%rbx),%rbx
  0x00007fb9f356a425: lock cmpxchg %rbx,(%rsi)
  0x00007fb9f356a42a: mov    (%rsi),%rax
  0x00007fb9f356a42d: or     $0x1,%rax
  0x00007fb9f356a431: mov    %rax,(%rdi)
  0x00007fb9f356a434: lock cmpxchg %rdi,(%rsi)
  0x00007fb9f356a439: je     0x00007fb9f356a452
  0x00007fb9f356a43f: sub    %rsp,%rax
  0x00007fb9f356a442: and    $0xfffffffffffff007,%rax
  0x00007fb9f356a449: mov    %rax,(%rdi)
  0x00007fb9f356a44c: jne    0x00007fb9f356a497  ;*aload_0
                                                ; - SynchronizedCounter::inc@0 (line 21)

  0x00007fb9f356a452: mov    0xc(%rsi),%eax     ;*getfield counter
                                                ; - SynchronizedCounter::inc@2 (line 21)

  0x00007fb9f356a455: inc    %eax
  0x00007fb9f356a457: mov    %eax,0xc(%rsi)     ;*putfield counter
                                                ; - SynchronizedCounter::inc@7 (line 21)

  0x00007fb9f356a45a: lea    0x20(%rsp),%rax
  0x00007fb9f356a45f: mov    0x8(%rax),%rdi
  0x00007fb9f356a463: mov    (%rdi),%rsi
  0x00007fb9f356a466: and    $0x7,%rsi
  0x00007fb9f356a46a: cmp    $0x5,%rsi
  0x00007fb9f356a46e: je     0x00007fb9f356a48b
  0x00007fb9f356a474: mov    (%rax),%rsi
  0x00007fb9f356a477: test   %rsi,%rsi
  0x00007fb9f356a47a: je     0x00007fb9f356a48b
  0x00007fb9f356a480: lock cmpxchg %rsi,(%rdi)
  0x00007fb9f356a485: jne    0x00007fb9f356a4a7  ;*return
                                                ; - SynchronizedCounter::inc@10 

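On top of that, Inc::run has 93 lines of loop code that call the function above on each of the ~2 billion iterations. Hence the back-of-the-envelope estimate in the tl;dr at the start of this post.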
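Most of that listing is the monitor enter/exit machinery (the lock cmpxchg sequences) wrapped around the three instructions of counter++. At the Java level, a synchronized instance method behaves as if its body were wrapped in synchronized (this), so SynchronizedCounter can be pictured as the sketch below. The class name MonitorDemo is hypothetical, and the comments map to the listing only loosely; the exact code HotSpot emits depends on the JVM version and locking mode (for example, whether biased locking is enabled).

```java
// Hypothetical rewrite of SynchronizedCounter with an explicit monitor block.
final class MonitorDemo {
    private int counter;

    // Same semantics as "public synchronized void inc() { counter++; }":
    // both acquire and release the monitor of this object.
    void inc() {
        synchronized (this) {   // acquire the monitor (the long prologue in the listing)
            counter++;          // the actual work: load, increment, store
        }                       // release the monitor (the epilogue)
    }

    void dec() {
        synchronized (this) {
            counter--;
        }
    }

    synchronized int get() {
        return counter;
    }

    // Runs one incrementing and one decrementing thread, n times each.
    static int run(int n) {
        MonitorDemo c = new MonitorDemo();
        Thread inc = new Thread(() -> { for (int x = 0; x < n; x++) c.inc(); });
        Thread dec = new Thread(() -> { for (int x = 0; x < n; x++) c.dec(); });
        inc.start();
        dec.start();
        try {
            inc.join();
            dec.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println("Value of counter: " + run(1_000_000));
    }
}
```

Because inc and dec take the same monitor, the two threads serialize on every single call and the object's lock word has to move between cores, which is the cross-core synchronization the tl;dr refers to.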

Answer 1 (score: 0)

This looks like an optimization by the Java (JIT) compiler: for(int x = 0; x < 2147483647 ; x++) {i.inc()} can be simplified to something like i.counter += 2147483647.

Run the code without the JIT (-Djava.compiler=NONE; on HotSpot, -Xint has the same effect) and you will see the difference.
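A minimal single-threaded harness for trying this, assuming a HotSpot JVM: run it once normally and once with -Xint (interpreter only) and compare the times. The class name JitTimingDemo is hypothetical, and no timings are given because they depend entirely on the machine.

```java
// Hypothetical timing harness for the plain Counter loop, single-threaded.
final class JitTimingDemo {
    private int counter; // plays the role of Counter.counter

    void inc() { counter++; }

    // Calls inc() n times on a fresh object and returns the final value.
    static int count(int n) {
        JitTimingDemo c = new JitTimingDemo();
        for (int x = 0; x < n; x++) {
            c.inc();
        }
        return c.counter;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int value = count(Integer.MAX_VALUE); // 2147483647 calls, as in Inc.run
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Value of counter: " + value);
        System.out.println("Execution time: " + elapsedMs + " ms");
    }
}
```

Compare java JitTimingDemo with java -Xint JitTimingDemo; expect the interpreted run to take far longer, since every call to inc() is then executed one bytecode at a time.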