Java synchronization: is it enough to keep the critical section small?

Date: 2015-02-10 09:58:08

Tags: java multithreading thread-synchronization

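I am running some basic experiments to evaluate the overhead of synchronized blocks. The results confuse me, hence this question.

In the code below, a number of threads test and increment a global counter (up to a goal value) inside what I call the critical section. In addition, an extra, configurable workload can be executed inside or outside the critical section.

Playing with the constants, especially LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION and LOAD_OF_WORK_INSIDE_CRITICAL_SECTION, what I observe is that the overhead introduced by the synchronized block only becomes noticeable when the amount of work inside the critical section is large. See the following two sample outputs: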

Processors: 4
NUM_OF_THREADS: 4
LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION: 1000000
LOAD_OF_WORK_INSIDE_CRITICAL_SECTION: 100
NUM_OF_JOBS_GOAL: 10000
Non synchronized - Goal reached, elapsed time: 6370 milliseconds.
Synchronized - Goal reached, elapsed time: 6355 milliseconds.

Processors: 4
NUM_OF_THREADS: 4
LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION: 100
LOAD_OF_WORK_INSIDE_CRITICAL_SECTION: 1000000
NUM_OF_JOBS_GOAL: 10000
Non synchronized - Goal reached, elapsed time: 6351 milliseconds.
Synchronized - Goal reached, elapsed time: 18629 milliseconds.

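As you can see, the synchronization overhead only seems to appear with a high LOAD_OF_WORK_INSIDE_CRITICAL_SECTION. That in itself is not puzzling, and of course it confirms that keeping critical sections small is good practice. But given that large critical sections are uncommon when good practice is followed, this result conflicts with the common advice to avoid the synchronized keyword in code wherever possible. I would be tempted to say that the synchronized keyword is always safe to use when the critical section does only a small amount of work.

So I am afraid I am getting something wrong, either in my code or in my head. Can you help me understand?

Below is the code I used for testing. Thank you, and sorry for my English.

Best regards, John

OS: Windows 7. Java version: 1.7.0_67 (32-bit)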

public class MainClass {

    public static void main(String[] args) throws Exception {

        long startMilliseconds = System.currentTimeMillis();
        final long NUM_OF_JOBS_GOAL = 10000L;
        final int LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION = 1000000;
        final int LOAD_OF_WORK_INSIDE_CRITICAL_SECTION = 100;
        final int NUM_OF_THREADS = Runtime.getRuntime().availableProcessors();

        System.out.println("Processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("NUM_OF_THREADS: " + NUM_OF_THREADS);
        System.out.println("LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION: " + LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION);
        System.out.println("LOAD_OF_WORK_INSIDE_CRITICAL_SECTION: " + LOAD_OF_WORK_INSIDE_CRITICAL_SECTION);
        System.out.println("NUM_OF_JOBS_GOAL: " + NUM_OF_JOBS_GOAL);

        doConcurrentJob(NUM_OF_THREADS, startMilliseconds, NUM_OF_JOBS_GOAL, LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION, LOAD_OF_WORK_INSIDE_CRITICAL_SECTION);

        //Reset state
        startMilliseconds = System.currentTimeMillis();
        CounterThread.goalGlobalCounter = 0;
        CounterThread.goalReached = false;

        doConcurrentSynchronizedJob(NUM_OF_THREADS, startMilliseconds, NUM_OF_JOBS_GOAL, LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION, LOAD_OF_WORK_INSIDE_CRITICAL_SECTION);

    }

    static void doConcurrentJob(int numOfThreads, long startMilliseconds, long numOfJobsGoal, int loadOfWorkOutsideCriticalSection, int loadOfWorkInsideCriticalSection) throws Exception {
        CounterThread[] counterThreads = new CounterThread[numOfThreads];
        while (!CounterThread.goalReached) {
            for (int i = 0; i < counterThreads.length; i++) {
                if (counterThreads[i] == null || !counterThreads[i].isAlive()) {
                    counterThreads[i] = new CounterThread(numOfJobsGoal, loadOfWorkOutsideCriticalSection, loadOfWorkInsideCriticalSection);
                    counterThreads[i].start();
                }
            }
        }
        System.out.println("Non synchronized - Goal reached, elapsed time: " + (System.currentTimeMillis() - startMilliseconds) + " milliseconds.");
        System.out.flush();
        for (int i = 0; i < counterThreads.length; i++) {
            counterThreads[i].join();
        }
    }

    static void doConcurrentSynchronizedJob(int numOfThreads, long startMilliseconds, long numOfJobsGoal, int loadOfWorkOutsideCriticalSection, int loadOfWorkInsideCriticalSection) throws Exception {
        CounterThreadSynchronized[] counterThreadsSyncronized = new CounterThreadSynchronized[numOfThreads];
        while (!CounterThread.goalReached) {
            for (int i = 0; i < counterThreadsSyncronized.length; i++) {
                if (counterThreadsSyncronized[i] == null || !counterThreadsSyncronized[i].isAlive()) {
                    counterThreadsSyncronized[i] = new CounterThreadSynchronized(startMilliseconds, numOfJobsGoal, loadOfWorkOutsideCriticalSection, loadOfWorkInsideCriticalSection);
                    counterThreadsSyncronized[i].start();
                }
            }
        }
        System.out.println("Synchronized - Goal reached, elapsed time: " + (System.currentTimeMillis() - startMilliseconds) + " milliseconds.");
        System.out.flush();
        for (int i = 0; i < counterThreadsSyncronized.length; i++) {
            counterThreadsSyncronized[i].join();
        }
    }
}

class CounterThread extends Thread {

    public static int goalGlobalCounter = 0;
    public static boolean goalReached;

    public final long GOAL;
    protected final int LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION;
    private final int LOAD_OF_WORK_INSIDE_CRITICAL_SECTION;

    protected int fooSpinner;

    public CounterThread(long numOfJobsGoal, int loadOfWorkOutsideCriticalSection, int loadOfWorkInsideCriticalSection) {
        this.GOAL = numOfJobsGoal;
        this.LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION = loadOfWorkOutsideCriticalSection;
        this.LOAD_OF_WORK_INSIDE_CRITICAL_SECTION = loadOfWorkInsideCriticalSection;
    }

    public void run() {
        for (long i = 0; i < LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION; i++) {
            fooSpinner++;
        }
        executeCriticalSection();
    }

    public void executeCriticalSection() {

        for (long i = 0; i < LOAD_OF_WORK_INSIDE_CRITICAL_SECTION; i++) {
            fooSpinner++;
        }
        if (goalGlobalCounter < GOAL) {
            goalGlobalCounter++;
        } else {
            goalReached = true;

        }
    }

}

class CounterThreadSynchronized extends CounterThread {

    protected static final Object globalMutex = new Object();

    public CounterThreadSynchronized(long startMilliseconds, long numOfJobsGoal, int loadOfWorkOutsideCriticalSection, int loadOfWorkInsideCriticalSection) {
        super(numOfJobsGoal, loadOfWorkOutsideCriticalSection, loadOfWorkInsideCriticalSection);
    }

    @Override
    public void run() {
        for (long i = 0; i < LOAD_OF_WORK_OUTSIDE_CRITICAL_SECTION; i++) {
            fooSpinner++;
        }
        synchronized (globalMutex) {
            executeCriticalSection();
        }
    }
}

Edit

Mike Nakis: I copy-pasted your code and I do not get the same results as you. Below is the log of 10 consecutive runs with TEST_DURATION = 1000.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  494682730
  10000  |      10  |    true   |  515156056
     10  |   10000  |   false   |  520437287
     10  |   10000  |    true   |  135192560
     10  |      10  |   false   |  499448540
     10  |      10  |    true   |   64254608
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  519790639
  10000  |      10  |    true   |  507597477
     10  |   10000  |   false   |  520784275
     10  |   10000  |    true   |  133563124
     10  |      10  |   false   |  510318548
     10  |      10  |    true   |   66006750
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  512302804
  10000  |      10  |    true   |  514999373
     10  |   10000  |   false   |  526430883
     10  |   10000  |    true   |  132596432
     10  |      10  |   false   |  506235601
     10  |      10  |    true   |   66220700
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  505257231
  10000  |      10  |    true   |  512668300
     10  |   10000  |   false   |  528309859
     10  |   10000  |    true   |  133947238
     10  |      10  |   false   |  518984983
     10  |      10  |    true   |   63617110
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  522235388
  10000  |      10  |    true   |  502896342
     10  |   10000  |   false   |  515668568
     10  |   10000  |    true   |  130705136
     10  |      10  |   false   |  514470943
     10  |      10  |    true   |   60617050
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  517828858
  10000  |      10  |    true   |  515355048
     10  |   10000  |   false   |  512963551
     10  |   10000  |    true   |  134235958
     10  |      10  |   false   |  515017236
     10  |      10  |    true   |   62228490
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  521690615
  10000  |      10  |    true   |  527830725
     10  |   10000  |   false   |  512735126
     10  |   10000  |    true   |  134278503
     10  |      10  |   false   |  507281283
     10  |      10  |    true   |   63333950
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  515604517
  10000  |      10  |    true   |  529685270
     10  |   10000  |   false   |  520260430
     10  |   10000  |    true   |  131993844
     10  |      10  |   false   |  505190996
     10  |      10  |    true   |   66865140
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  522747273
  10000  |      10  |    true   |  530824975
     10  |   10000  |   false   |  536263165
     10  |   10000  |    true   |  131938210
     10  |      10  |   false   |  502281027
     10  |      10  |    true   |   64480710
Done.

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  523386208
  10000  |      10  |    true   |  511467042
     10  |   10000  |   false   |  512778324
     10  |   10000  |    true   |  133751262
     10  |      10  |   false   |  513257782
     10  |      10  |    true   |   61573350
Done.

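As the title of my question suggests, I am mostly interested in low "inside"/"outside" ratios, that is, the first two configurations in the output. Looking at the output, I cannot honestly say that locking is any slower than non-locking for those configurations.

4 answers:

Answer 0 (score: 3)

It really depends, because the definition of "small" varies a great deal from problem to problem. Fortunately, Amdahl's law can help you reason about it.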

  

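Amdahl's law states that if P is the proportion of a program that can be made parallel, and (1 - P) is the proportion that cannot be parallelized, then the maximum speedup that can be achieved using N processors is S(N) = 1 / ((1 - P) + P / N).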
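
To make the formula concrete, here is a minimal sketch that evaluates S(N) for a few values of P with N = 4, the processor count reported in the question. The class name AmdahlSketch, the method name maxSpeedup, and the chosen values of P are assumptions made for illustration; they do not come from the thread.

```java
// Minimal sketch of Amdahl's law. The class and method names are illustrative assumptions.
final class AmdahlSketch {

    // Upper bound on the speedup when a fraction p of the work runs in parallel on n processors.
    static double maxSpeedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        int n = 4; // same processor count as in the question's test runs
        double[] parallelFractions = {0.50, 0.90, 0.99, 1.00};
        for (double p : parallelFractions) {
            // (1 - p) is the serial share of the work.
            System.out.printf("P = %.2f, N = %d, S(N) = %.2f%n", p, n, maxSpeedup(p, n));
        }
    }
}
```

For example, with P = 0.5 the bound is 1 / (0.5 + 0.125) = 1.6, so even four processors cannot make such a program twice as fast.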

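The "critical section" constitutes the "proportion that cannot be parallelized", so the longer you make it, the lower the potential throughput gain you can achieve through parallelization.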

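In practice it is not that clear-cut. For example, the overhead of locking may be greater than the theoretical gain. For this reason the JVM sometimes performs "lock coarsening", which actually makes the critical section longer but reduces the total overhead.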
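
As a rough illustration of what lock coarsening amounts to, the sketch below writes the merged form by hand. The class LockCoarseningSketch and its methods are hypothetical, and whether the JIT actually coarsens a given pair of blocks depends on the JVM and its settings, so this shows only the idea, not the optimizer's behaviour.

```java
// Hand-written illustration of lock coarsening. All names are hypothetical.
final class LockCoarseningSketch {
    private final Object lock = new Object();
    private long a; // guarded by lock
    private long b; // guarded by lock

    // Fine-grained form: two back-to-back critical sections on the same lock,
    // so the lock is acquired and released twice per call.
    void updateFine() {
        synchronized (lock) {
            a++;
        }
        synchronized (lock) {
            b++;
        }
    }

    // Coarsened form: one longer critical section, one acquire and release per call.
    void updateCoarse() {
        synchronized (lock) {
            a++;
            b++;
        }
    }

    long sum() {
        synchronized (lock) {
            return a + b;
        }
    }

    public static void main(String[] args) {
        LockCoarseningSketch sketch = new LockCoarseningSketch();
        sketch.updateFine();
        sketch.updateCoarse();
        System.out.println("sum = " + sketch.sum());
    }
}
```

Both methods leave the fields in the same state; the coarsened form simply trades a slightly longer critical section for fewer lock operations.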

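Answer 1 (score: 0)

Yes, you should not put code inside a synchronized block unnecessarily, but you should not break your functionality for the sake of performance either. In the producer-consumer problem, for example, there has to be some synchronization between producer and consumer. Yes, of course, acquiring the lock and keeping everything in sync carries some overhead, but it is required. Otherwise your system will not behave as expected, and in that case performance means nothing.

So, long story short: use synchronization only where it is needed.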

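Answer 2 (score: 0)

The common advice to avoid the synchronized keyword in code wherever possible certainly still holds, for two reasons: one has to do with performance, and the other has nothing to do with performance at all.

About performance:

When you keep your critical sections as small as possible, you reduce the chances of contention, but you do not eliminate them. The chances of contention depend more on how often threads try to enter the critical section than on how long they stay in it. Every time a thread tries to enter a critical section while another thread is already inside it, you suffer a huge performance penalty.

But there is a much more important issue that has nothing to do with performance: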

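Locking (the synchronized keyword) is something that only very experienced programmers can get right, and even their code is practically untestable, so it is essentially just a bunch of bugs waiting to happen. There is no formal way to guarantee that locking code is free of bugs, which is especially bad given that there exists at least one approach that can in fact guarantee it: immutable message passing.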
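
A minimal sketch of what immutable message passing can look like in Java follows. It assumes Java 8 or later for the lambda syntax, and every name in it (MessagePassingSketch, WorkDone, run) is hypothetical. Workers never touch shared mutable state; they only send immutable messages through a queue, and a single thread owns the running total. The queue itself still synchronizes internally, but none of the application code takes a lock.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of immutable message passing. All names here are hypothetical.
final class MessagePassingSketch {

    // Immutable message: every field is final and there are no setters.
    static final class WorkDone {
        final int workerId;
        final long amount;

        WorkDone(int workerId, long amount) {
            this.workerId = workerId;
            this.amount = amount;
        }
    }

    // Starts the workers, then sums their messages on the calling thread.
    static long run(int workers, int messagesPerWorker) {
        BlockingQueue<WorkDone> queue = new ArrayBlockingQueue<>(64);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            new Thread(() -> {
                try {
                    for (int i = 0; i < messagesPerWorker; i++) {
                        queue.put(new WorkDone(id, 1L)); // blocks while the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
        long total = 0; // owned by the calling thread only, so it needs no lock
        try {
            for (int i = 0; i < workers * messagesPerWorker; i++) {
                total += queue.take().amount; // blocks until a message arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while receiving", e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total = " + run(4, 1000));
    }
}
```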

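Edit

I found the behaviour of Johnca's program strange, so I went ahead and reworked it to make it more useful, easier to understand, and possibly more correct. Here it is: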

class SynchronizationTest
{
    public static void main( String[] args ) throws Exception
    {
        SynchronizationTest program = new SynchronizationTest();
        program.run();
    }

    static final long BIG_NUMBER = 10_000L;
    static final long SMALL_NUMBER = 10L;
    static final int NUM_OF_THREADS = Runtime.getRuntime().availableProcessors();
    static final int TEST_DURATION = 500;

    public void run() throws Exception
    {
        System.out.println( "Processors: " + Runtime.getRuntime().availableProcessors() + " Threads: " + NUM_OF_THREADS );
        System.out.println( "Outside  |  Inside  |  Locking  |  Work Done" );
        doConcurrentJob( BIG_NUMBER, SMALL_NUMBER, false );
        doConcurrentJob( BIG_NUMBER, SMALL_NUMBER, true );
        doConcurrentJob( SMALL_NUMBER, BIG_NUMBER, false );
        doConcurrentJob( SMALL_NUMBER, BIG_NUMBER, true );
        doConcurrentJob( SMALL_NUMBER, SMALL_NUMBER, false );
        doConcurrentJob( SMALL_NUMBER, SMALL_NUMBER, true );
        System.out.println( "Done." );
    }

    static void doConcurrentJob( long outside, long inside, boolean useLocking ) throws Exception
    {
        MyThread[] myThreads = new MyThread[NUM_OF_THREADS];
        boolean[] stopFlag = { false };
        for( int i = 0; i < myThreads.length; i++ )
            myThreads[i] = new MyThread( outside, inside, useLocking, stopFlag );
        for( MyThread myThread : myThreads )
            myThread.start();
        Thread.sleep( TEST_DURATION );
        stopFlag[0] = true;
        for( MyThread myThread : myThreads )
            myThread.join();
        long sumOfWorkDone = 0;
        for( MyThread myThread : myThreads )
            sumOfWorkDone += myThread.workDone;
        System.out.printf( "%7d  | %7d  |   %5b   | %10d\n", outside, inside, useLocking, sumOfWorkDone );
    }

    @SuppressWarnings( "ClassExplicitlyExtendsThread" )
    static class MyThread extends Thread
    {
        protected static final Object GLOBAL_MUTEX = new Object();
        private final long outside;
        private final long inside;
        private final boolean useSynchronization;
        volatile int workDone = 0;
        private final boolean[] stopFlag;

        MyThread( long outside, long inside, boolean useSynchronization, boolean[] stopFlag )
        {
            this.outside = outside;
            this.inside = inside;
            this.useSynchronization = useSynchronization;
            this.stopFlag = stopFlag;
        }

        @SuppressWarnings( "RefusedBequest" )
        @Override
        public void run()
        {
            while( !stopFlag[0] )
            {
                doWork( outside );
                if( useSynchronization )
                {
                    //noinspection SynchronizationOnStaticField
                    synchronized( GLOBAL_MUTEX )
                    {
                        doWork( inside );
                    }
                }
                else
                {
                    doWork( inside );
                }
            }
        }

        private void doWork( long amount )
        {
            for( long i = 0L; i < amount; i++ )
            {
                if( stopFlag[0] )
                    break;
                //noinspection NonAtomicOperationOnVolatileField
                workDone++;
            }
        }
    }
}

With TEST_DURATION = 1000, the results on my machine are:

Processors: 4 Threads: 4
Outside  |  Inside  |  Locking  |  Work Done
  10000  |      10  |   false   |  181081027
  10000  |      10  |    true   |  149043896
     10  |   10000  |   false   |  210331458
     10  |   10000  |    true   |   58841199
     10  |      10  |   false   |  230592182
     10  |      10  |    true   |   38739670
Done.

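The way I interpret the above results is as follows:

  1. In all cases, locking is slower than non-locking, but by how much varies.
  2. When the work done outside the critical section is large and the work done inside is small, it rarely happens that a thread tries to acquire the lock while another thread already holds it, so the difference is small; still, the non-locking code gets more work done.
  3. When the work done outside the critical section is small and the work done inside is large, the locking code performs much worse, because threads are usually waiting for other threads to leave the critical section.
  4. When the work done both outside and inside is small, the locking code performs much worse, because a thread usually manages to do only a little work before it has to re-enter the critical section, at which point it will usually find it locked and have to wait.
  5. However, the truth is that I expected the differences to be more pronounced. I expected differences of several orders of magnitude, not differences within the same order of magnitude. I do not know why this is so. Perhaps my code is not correct either. At least it is easier to verify.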

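Answer 3 (score: 0)

In general, do not use the synchronized keyword; use the java.util.concurrent package instead, which provides higher-level primitives such as reentrant locks, read/write locks, optimistic locking, and so on.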
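
As a sketch of how the question's test-and-increment step might look with java.util.concurrent, the example below shows two variants: a lock-free one built on AtomicLong.compareAndSet and one using an explicit ReentrantLock. The class BoundedCounterSketch and its method names are hypothetical, and the lambda syntax assumes Java 8 or later, whereas the question uses Java 7. This is not a claim that either variant is faster than synchronized; that would need measuring.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: class and method names are hypothetical, not taken from the question.
final class BoundedCounterSketch {
    private final long goal;
    private final AtomicLong atomicCounter = new AtomicLong();
    private final ReentrantLock lock = new ReentrantLock();
    private long lockedCounter; // guarded by lock

    BoundedCounterSketch(long goal) {
        this.goal = goal;
    }

    // Lock-free variant: retry the compare-and-set until it succeeds or the goal is reached.
    boolean tryIncrementAtomic() {
        while (true) {
            long current = atomicCounter.get();
            if (current >= goal) {
                return false;
            }
            if (atomicCounter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Explicit-lock variant: the same test-and-increment, with unlock() in a finally block.
    boolean tryIncrementLocked() {
        lock.lock();
        try {
            if (lockedCounter >= goal) {
                return false;
            }
            lockedCounter++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    long atomicValue() {
        return atomicCounter.get();
    }

    long lockedValue() {
        lock.lock();
        try {
            return lockedCounter;
        } finally {
            lock.unlock();
        }
    }

    // Drives both variants from several threads until each counter reaches the goal.
    static BoundedCounterSketch runDemo(int threads, long goal) {
        BoundedCounterSketch counter = new BoundedCounterSketch(goal);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (counter.tryIncrementAtomic()) { /* keep counting */ }
                while (counter.tryIncrementLocked()) { /* keep counting */ }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return counter;
    }

    public static void main(String[] args) {
        BoundedCounterSketch counter = runDemo(4, 10_000L);
        System.out.println("atomic = " + counter.atomicValue() + ", locked = " + counter.lockedValue());
    }
}
```

Both counters stop exactly at the goal, because the test and the increment happen as one atomic step in each variant.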

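Also, you should be aware that HotSpot performs a lot of optimizations under the hood, and may end up removing locks when they are not needed. That might explain your results.

Finally, your fields are not marked volatile, so you may run into some problems there as well.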
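
The volatile point can be sketched with a stop flag, similar in spirit to the question's goalReached field. The class VolatileFlagSketch and its method are hypothetical, and the lambda assumes Java 8 or later. Note that volatile only addresses visibility: it would not make a compound operation such as goalGlobalCounter++ atomic.

```java
// Sketch of a volatile stop flag. All names are hypothetical.
final class VolatileFlagSketch {

    // volatile: a write by one thread is guaranteed to become visible to later reads by other threads.
    private static volatile boolean stop;

    // Spins in a worker thread until the calling thread sets the flag; returns the iteration count.
    static long spinUntilStopped(long sleepMillis) {
        stop = false;
        final long[] iterations = {0L};
        Thread worker = new Thread(() -> {
            long n = 0;
            while (!stop) { // without volatile, this read is not guaranteed to ever see the update
                n++;
            }
            iterations[0] = n;
        });
        worker.start();
        try {
            Thread.sleep(sleepMillis);
            stop = true;
            worker.join(); // join() makes the worker's write to iterations[0] visible here
        } catch (InterruptedException e) {
            stop = true;
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return iterations[0];
    }

    public static void main(String[] args) {
        System.out.println("iterations before stop: " + spinUntilStopped(100));
    }
}
```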

Concurrency is hard, even when you know your stuff.