Following the Disruptor Getting Started Guide, I built a minimal disruptor with a single producer and a single consumer.
Producer
import com.lmax.disruptor.RingBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData()
    {
        long sequence = ringBuffer.next();
        try
        {
            LongEvent event = ringBuffer.get(sequence);
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}
Consumer (note that the consumer does nothing in onEvent)
import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {}
}
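My goal was to compare the performance of a single pass around a large ring buffer against many passes around a smaller ring. In each case the total number of operations (bufferSize X rotations) is the same. I found that the ops/sec rate drops drastically as the ring buffer gets smaller.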
RingBuffer Size | Revolutions | Total Ops | Mops/sec
1048576 | 1 | 1048576 | 50-60
1024 | 1024 | 1048576 | 8-16
64 | 16384 | 1048576 | 0.5-0.7
8 | 131072 | 1048576 | 0.12-0.14
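Question: What causes the large drop in performance when the ring buffer size is reduced but the total number of iterations stays fixed? This trend is independent of the WaitStrategy and of Single vs MultiProducer: throughput is lower, but the trend is the same.
Main (note the SingleProducer and BusySpinWaitStrategy)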
To run it, you will also need some simple factory code (LongEventFactory, listed after the main class).
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMainJava {
    static double ONEMILLION = 1000000.0;
    static double ONEBILLION = 1000000000.0;

    public static void main(String[] args) throws Exception {
        // Executor that will be used to construct new threads for consumers
        Executor executor = Executors.newCachedThreadPool();
        // TUNABLE PARAMS
        int ringBufferSize = 1048576; // 1024, 64, 8
        int rotations = 1; // 1024, 16384, 131072
        // Construct the Disruptor
        Disruptor<LongEvent> disruptor = new Disruptor<>(new LongEventFactory(), ringBufferSize, executor, ProducerType.SINGLE, new BusySpinWaitStrategy());
        // Connect the handler
        disruptor.handleEventsWith(new LongEventHandler());
        // Start the Disruptor, starts all threads running
        disruptor.start();
        // Get the ring buffer from the Disruptor to be used for publishing.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
        LongEventProducer producer = new LongEventProducer(ringBuffer);
        long start = System.nanoTime();
        // Widen to long before multiplying so larger settings cannot overflow int
        long totalIterations = (long) rotations * ringBufferSize;
        for (long i = 0; i < totalIterations; i++) {
            producer.onData();
        }
        double duration = (System.nanoTime() - start) / ONEBILLION;
        System.out.println(String.format("Buffersize: %s, rotations: %s, total iterations = %s, duration: %.2f seconds, rate: %.2f Mops/s",
                ringBufferSize, rotations, totalIterations, duration, totalIterations / (ONEMILLION * duration)));
    }
}
Run on a Core i5-2400 with 12 GB RAM, Windows 7.
Factory code
import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}
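Answer 0 (score: 3)
When the producer fills up the ring buffer, it has to wait until events have been consumed before it can continue.
When your buffer is exactly the size of the number of elements you are going to put into it, the producer never has to wait. It never overflows. All it does is increment a count, compute an index, and publish the data at that index in the ring buffer.
When your buffer is smaller, it is still just incrementing a count and publishing, but it does so faster than the consumer can consume. The producer therefore has to wait until elements have been consumed and space on the ring buffer has been freed.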
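The waiting condition described in this answer can be sketched as a single comparison. This is a simplified model rather than the Disruptor source: `WrapCheckSketch` and `mustWait` are hypothetical names, and the sketch assumes that sequences start at -1 (nothing consumed yet) and that the wrap point is the claimed sequence minus the buffer size.

```java
/** Simplified model of a single-producer wrap check (hypothetical, not library code). */
final class WrapCheckSketch {

    /**
     * Returns true when claiming nextSequence would reuse a slot that the
     * consumer has not processed yet, so the producer would have to wait.
     */
    static boolean mustWait(long nextSequence, long consumedSequence, int bufferSize) {
        // Sequence that previously occupied the slot about to be reused (assumption).
        long wrapPoint = nextSequence - bufferSize;
        return wrapPoint > consumedSequence;
    }

    public static void main(String[] args) {
        long nothingConsumed = -1L;
        // Buffer as large as the whole run: the last claim still fits without waiting.
        System.out.println(mustWait(1048575L, nothingConsumed, 1048576));
        // Buffer of 8: the ninth claim (sequence 8) needs slot 0 back first.
        System.out.println(mustWait(8L, nothingConsumed, 8));
    }
}
```

With a buffer as large as the run the check never fires, while with a buffer of 8 it can fire as soon as the producer gets one full lap ahead of the consumer.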
Answer 1 (score: 0)
The problem appears to lie in this block of code in lmax\disruptor\SingleProducerSequencer
if (wrapPoint > cachedGatingSequence || cachedGatingSequence > nextValue)
{
    cursor.setVolatile(nextValue); // StoreLoad fence
    long minSequence;
    while (wrapPoint > (minSequence = Util.getMinimumSequence(gatingSequences, nextValue)))
    {
        waitStrategy.signalAllWhenBlocking();
        LockSupport.parkNanos(1L); // TODO: Use waitStrategy to spin?
    }
    this.cachedValue = minSequence;
}
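In particular, the call to LockSupport.parkNanos(1L). This can take up to 15 ms on Windows. It is called when the producer reaches the end of the buffer and is waiting on the consumer.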
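One way to check how expensive that call is on a given machine is to time it directly. The following is a minimal sketch; `ParkNanosTiming` and `meanParkNanos` are hypothetical names, and the result depends heavily on the operating system and its timer resolution, so no particular figure should be expected.

```java
import java.util.concurrent.locks.LockSupport;

/** Measures the average wall-clock cost of LockSupport.parkNanos(1L) (illustrative only). */
final class ParkNanosTiming {

    /** Returns the mean duration, in nanoseconds, of one parkNanos(1L) call. */
    static double meanParkNanos(int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            LockSupport.parkNanos(1L); // requests 1 ns; the actual pause is up to the OS
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    public static void main(String[] args) {
        double meanNs = meanParkNanos(200);
        System.out.printf("mean parkNanos(1L) cost: %.1f us%n", meanNs / 1_000.0);
    }
}
```

If the reported mean is far above the requested 1 ns, every stall on a full buffer costs at least that much, and a small buffer stalls far more often.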
Secondly, when the buffer is small, false sharing of the RingBuffer may be occurring. My guess is that both effects are at work.
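Finally, I was able to speed up the code through the JIT by making a million calls to onData() before benchmarking. This brought the best case to > 80 Mops/sec, but it did not remove the degradation caused by shrinking the buffer.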
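The warm-up step can be sketched generically as follows. `WarmUpSketch` and `measureMopsPerSec` are hypothetical names, and the counter in `main` is only a stand-in for the real operation; in the benchmark above the operation would presumably be passed as `producer::onData`.

```java
/** Runs an operation untimed first, then times it (illustrative harness, not from the thread). */
final class WarmUpSketch {

    /** Returns throughput in Mops/s over measuredCalls, after warmUpCalls untimed calls. */
    static double measureMopsPerSec(Runnable op, int warmUpCalls, long measuredCalls) {
        for (int i = 0; i < warmUpCalls; i++) {
            op.run(); // untimed: gives the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (long i = 0; i < measuredCalls; i++) {
            op.run();
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        return measuredCalls / (1_000_000.0 * seconds);
    }

    public static void main(String[] args) {
        long[] counter = new long[1]; // stand-in workload
        double rate = measureMopsPerSec(() -> counter[0]++, 1_000_000, 1_048_576L);
        System.out.printf("rate: %.2f Mops/s%n", rate);
    }
}
```

Warming up this way removes compilation time from the measurement but, as the answer notes, it does not change how often the producer has to wait on a small buffer.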