Why does the Disruptor get slower with a smaller ring buffer?

Time: 2017-07-03 20:06:28

Tags: java performance performance-testing disruptor-pattern lmax

Following the Disruptor Getting Started Guide, I built a minimal Disruptor with a single producer and a single consumer.

Producer

import com.lmax.disruptor.RingBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData()
    {
        long sequence = ringBuffer.next();
        try
        {
            LongEvent event = ringBuffer.get(sequence);
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}

Consumer (note that the consumer does nothing in onEvent)

import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {}
}

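My goal was to performance test going once around a large ring buffer versus traversing a smaller ring many times. In each case the total number of operations (bufferSize x rotations) is the same. I found that the ops/sec rate drops drastically as the ring buffer gets smaller.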

RingBuffer Size | Rotations | Total Ops | Mops/sec
1048576         | 1         | 1048576   | 50-60
1024            | 1024      | 1048576   | 8-16
64              | 16384     | 1048576   | 0.5-0.7
8               | 131072    | 1048576   | 0.12-0.14
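To get a feel for the scale of the slowdown, the measured rates can be converted into an average cost per operation. This is only a rough sketch: the class and method names are made up for illustration, and the inputs are the lower ends of the ranges reported in the table.

```java
// Hypothetical helper for illustration; not part of the Disruptor API.
final class PerOpCost {
    /** Converts a throughput in Mops/sec into average nanoseconds per operation. */
    static double nanosPerOp(double mopsPerSec) {
        // 1e9 ns per second divided by (mopsPerSec * 1e6) operations per second
        return 1_000.0 / mopsPerSec;
    }

    public static void main(String[] args) {
        int[] bufferSizes = {1048576, 1024, 64, 8};
        double[] measuredMops = {50.0, 8.0, 0.5, 0.12}; // lower bounds from the table
        for (int i = 0; i < bufferSizes.length; i++) {
            System.out.printf("bufferSize=%d -> about %.0f ns per op%n",
                    bufferSizes[i], nanosPerOp(measuredMops[i]));
        }
    }
}
```

At 50 Mops/sec an operation averages 20 ns; at 0.12 Mops/sec it averages roughly 8,300 ns, more than 400 times slower for the same total amount of work.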

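Question: What causes the massive performance degradation when the ring buffer size is reduced but the total number of iterations stays fixed? The trend is independent of Single vs. MultiProducer and of the WaitStrategy: overall throughput is lower, but the trend is the same.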

Main (note ProducerType.SINGLE and BusySpinWaitStrategy). To run it, you also need some trivially simple factory code (LongEventFactory, listed after this class).

import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.ProducerType;

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMainJava{
        static double ONEMILLION = 1000000.0;
        static double ONEBILLION = 1000000000.0;

    public static void main(String[] args) throws Exception {
            // Executor that will be used to construct new threads for consumers
            Executor executor = Executors.newCachedThreadPool();    

            // TUNABLE PARAMS
            int ringBufferSize = 1048576; // 1024, 64, 8
            int rotations = 1; // 1024, 16384, 131072

            // Construct the Disruptor
            Disruptor<LongEvent> disruptor = new Disruptor<>(new LongEventFactory(), ringBufferSize, executor, ProducerType.SINGLE, new BusySpinWaitStrategy());

            // Connect the handler
            disruptor.handleEventsWith(new LongEventHandler());

            // Start the Disruptor, starts all threads running
            disruptor.start();

            // Get the ring buffer from the Disruptor to be used for publishing.
            RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
            LongEventProducer producer = new LongEventProducer(ringBuffer);

            long start = System.nanoTime();
            long totalIterations = (long) rotations * ringBufferSize; // widen before multiplying to avoid int overflow
            for (long i = 0; i < totalIterations; i++) {
                producer.onData();
            }
            double duration = (System.nanoTime()-start)/ONEBILLION;
            System.out.println(String.format("Buffersize: %s, rotations: %s, total iterations = %s, duration: %.2f seconds, rate: %.2f Mops/s",
                    ringBufferSize, rotations, totalIterations, duration, totalIterations/(ONEMILLION * duration)));
        }
}

Run on a Core i5-2400 with 12 GB RAM, Windows 7.

Factory

import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}

2 answers:

Answer 0 (score: 3)

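When the producer fills the ring buffer, it has to wait until events have been consumed before it can continue.

When your buffer is exactly as large as the number of elements you put into it, the producer never has to wait. The buffer never overflows. All the producer does is increment a counter, compute an index, and publish the data at that index in the ring buffer.

When your buffer is smaller, the producer still only increments a counter and publishes, but it does so faster than the consumer can consume. The producer therefore has to wait until elements have been consumed and space has been freed in the ring buffer.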
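The back-pressure described above can be illustrated with a small deterministic model. This is a sketch under stated assumptions, not Disruptor code: the class name, the tick-based timing, and the choice of a consumer that runs at half the producer's speed (consumerPeriod = 2) are all invented for illustration.

```java
// Hypothetical discrete-time model; it does not use or imitate the Disruptor API.
final class BackPressureModel {
    /**
     * Each tick the producer tries to publish one item; the consumer removes
     * one item every consumerPeriod ticks. Returns the number of ticks the
     * producer spent blocked because the buffer was full.
     */
    static long producerStalls(int capacity, long totalItems, int consumerPeriod) {
        long produced = 0;
        long consumed = 0;
        long stalls = 0;
        long tick = 0;
        while (produced < totalItems) {
            tick++;
            if (tick % consumerPeriod == 0 && consumed < produced) {
                consumed++; // consumer frees one slot
            }
            if (produced - consumed < capacity) {
                produced++; // a free slot exists, publish
            } else {
                stalls++;   // buffer full, producer has to wait this tick
            }
        }
        return stalls;
    }

    public static void main(String[] args) {
        long totalItems = 1048576L;
        for (int capacity : new int[] {1048576, 1024, 64, 8}) {
            System.out.printf("capacity=%d -> producer stalled for %d ticks%n",
                    capacity, producerStalls(capacity, totalItems, 2));
        }
    }
}
```

When the capacity equals the total number of items, the model never stalls, which matches the one-rotation case in the question. With any smaller capacity the producer ends up running at the consumer's pace once the buffer has filled.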

Answer 1 (score: 0)

The problem appears to be this block of code in lmax\disruptor\SingleProducerSequencer:
if (wrapPoint > cachedGatingSequence || cachedGatingSequence > nextValue)
        {
            cursor.setVolatile(nextValue);  // StoreLoad fence

            long minSequence;
            while (wrapPoint > (minSequence = Util.getMinimumSequence(gatingSequences, nextValue)))
            {
                waitStrategy.signalAllWhenBlocking();
                LockSupport.parkNanos(1L); // TODO: Use waitStrategy to spin?
            }

            this.cachedValue = minSequence;
        }

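In particular, the call to LockSupport.parkNanos(1L). On Windows this can take up to 15 ms. It is called when the producer reaches the end of the buffer and has to wait for the consumer.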
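Two numbers help judge how much this matters: how often the producer fails the cached check and has to re-read the consumer position, and how long a single parkNanos(1L) actually takes on your machine. The sketch below estimates both. It is a simplified model of the check shown above, not the library's code. It assumes one sequence is claimed per call and an ideal consumer that is always fully caught up, so the count is a lower bound, and in that ideal case the parking loop itself is never entered. All names are hypothetical.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical model and micro-measurement; names are illustrative, not Disruptor API.
final class WrapCost {
    /**
     * Counts how often a single producer fails the cached check and must
     * re-read the consumer position, assuming an ideal consumer that has
     * always processed everything published so far.
     */
    static long gatingChecks(int bufferSize, long totalOps) {
        long nextValue = -1L;        // last sequence claimed by the producer
        long cachedGating = -1L;     // producer's cached copy of the consumer sequence
        long consumerSequence = -1L; // idealized consumer position
        long checks = 0;
        for (long i = 0; i < totalOps; i++) {
            long nextSequence = nextValue + 1;
            long wrapPoint = nextSequence - bufferSize;
            if (wrapPoint > cachedGating || cachedGating > nextValue) {
                checks++;
                cachedGating = consumerSequence; // refresh the cached value
            }
            nextValue = nextSequence;
            consumerSequence = nextSequence; // ideal consumer catches up immediately
        }
        return checks;
    }

    /** Measures the average wall-clock duration of LockSupport.parkNanos(1L). */
    static double averageParkNanos(int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            LockSupport.parkNanos(1L);
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    public static void main(String[] args) {
        long totalOps = 1048576L;
        for (int size : new int[] {1048576, 1024, 64, 8}) {
            System.out.printf("bufferSize=%d -> at least %d gating checks%n",
                    size, gatingChecks(size, totalOps));
        }
        System.out.printf("parkNanos(1L) averages %.0f ns on this machine%n",
                averageParkNanos(200));
    }
}
```

In this model the number of checks grows roughly as totalOps / bufferSize, so a ring of 8 slots triggers about 131,000 checks where a ring of 1,048,576 slots triggers none. Whenever the consumer is genuinely behind at one of those checks, the producer pays at least one parkNanos(1L) call, whose real cost depends on the operating system's timer granularity.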

Second, when the buffer is very small, false sharing within the RingBuffer may occur. My guess is that both effects are at play.

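Finally, I was able to speed the code up by letting the JIT warm up with a million calls to onData() before starting the benchmark. This raised the best case to > 80 Mops/sec, but it did not eliminate the degradation caused by shrinking the buffer.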
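The warm-up step might look like the following minimal sketch. The harness class and its method are hypothetical. In the benchmark from the question the operation passed in would be producer::onData, whereas the demo below uses a trivial counter so that the example is self-contained.

```java
// Hypothetical warm-up harness; not part of the Disruptor or any benchmark library.
final class WarmupHarness {
    /**
     * Runs op warmupCalls times untimed so the JIT can compile the hot path,
     * then times timedCalls further invocations and returns the rate in Mops/sec.
     */
    static double measureMops(Runnable op, long warmupCalls, long timedCalls) {
        for (long i = 0; i < warmupCalls; i++) {
            op.run(); // warm-up phase, deliberately excluded from the measurement
        }
        long start = System.nanoTime();
        for (long i = 0; i < timedCalls; i++) {
            op.run();
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        return timedCalls / (1_000_000.0 * seconds);
    }

    public static void main(String[] args) {
        long[] counter = {0L}; // stand-in workload; replace with producer::onData
        double mops = measureMops(() -> counter[0]++, 1_000_000L, 1_048_576L);
        System.out.printf("calls=%d, rate=%.2f Mops/s%n", counter[0], mops);
    }
}
```

A hand-rolled loop like this is still vulnerable to other JIT effects such as dead-code elimination, so a dedicated harness such as JMH is usually a safer choice for numbers you intend to rely on.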