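I'm trying to use the Disruptor to process messages. I need two phases of processing, i.e. two groups of handlers, each working as a worker pool, like this (I guess):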
disruptor
    .handleEventsWithWorkerPool(firstPhaseHandlers)
    .thenHandleEventsWithWorkerPool(secondPhaseHandlers);
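With the code above, performance degrades as soon as I put more than one worker in each group: far more CPU is burned for exactly the same amount of work.
I tried tuning the ring buffer size (which I have already seen affects performance), but it did not help in this case. So am I doing something wrong, or is this a real problem?
I have attached a complete demo of the problem.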
import java.util.ArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import com.lmax.disruptor.EventFactory;
import com.lmax.disruptor.EventTranslatorOneArg;
import com.lmax.disruptor.WorkHandler;
import com.lmax.disruptor.dsl.Disruptor;
final class ValueEvent {
    private long value;

    public long getValue() {
        return value;
    }

    public void setValue(long value) {
        this.value = value;
    }

    public final static EventFactory<ValueEvent> EVENT_FACTORY = new EventFactory<ValueEvent>() {
        public ValueEvent newInstance() {
            return new ValueEvent();
        }
    };
}
// First-phase handler: counts each event on a counter shared with all other handlers.
class MyWorkHandler implements WorkHandler<ValueEvent> {
    AtomicLong workDone;

    public MyWorkHandler(AtomicLong wd) {
        this.workDone = wd;
    }

    public void onEvent(final ValueEvent event) throws Exception {
        workDone.incrementAndGet();
    }
}
// Second-phase handler: does the same counting on the same shared counter.
class My2ndPahseWorkHandler implements WorkHandler<ValueEvent> {
    AtomicLong workDone;

    public My2ndPahseWorkHandler(AtomicLong wd) {
        this.workDone = wd;
    }

    public void onEvent(final ValueEvent event) throws Exception {
        workDone.incrementAndGet();
    }
}
class MyEventTranslator implements EventTranslatorOneArg<ValueEvent, Long> {
    @Override
    public void translateTo(ValueEvent event, long sequence, Long value) {
        event.setValue(value);
    }
}
public class TwoPhaseDisruptor {
    // Single counter shared by every handler in both phases.
    static AtomicLong workDone = new AtomicLong(0);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        ExecutorService exec = Executors.newCachedThreadPool();
        int numOfHandlersInEachGroup = Integer.parseInt(args[0]);
        long eventCount = Long.parseLong(args[1]);
        // args[2] is an exponent, not a size: 7 gives 2 << 7 = 256 slots.
        int ringBufferSize = 2 << (Integer.parseInt(args[2]));

        Disruptor<ValueEvent> disruptor = new Disruptor<ValueEvent>(
                ValueEvent.EVENT_FACTORY, ringBufferSize,
                exec);

        ArrayList<MyWorkHandler> handlers = new ArrayList<MyWorkHandler>();
        for (int i = 0; i < numOfHandlersInEachGroup; i++) {
            handlers.add(new MyWorkHandler(workDone));
        }
        ArrayList<My2ndPahseWorkHandler> phase2_handlers = new ArrayList<My2ndPahseWorkHandler>();
        for (int i = 0; i < numOfHandlersInEachGroup; i++) {
            phase2_handlers.add(new My2ndPahseWorkHandler(workDone));
        }

        // Phase 1 worker pool, followed by a phase 2 worker pool that sees events after phase 1.
        disruptor
                .handleEventsWithWorkerPool(
                        handlers.toArray(new WorkHandler[handlers.size()]))
                .thenHandleEventsWithWorkerPool(
                        phase2_handlers.toArray(new WorkHandler[phase2_handlers.size()]));

        long s = (System.currentTimeMillis());
        disruptor.start();

        MyEventTranslator myEventTranslator = new MyEventTranslator();
        for (long i = 0; i < eventCount; i++) {
            disruptor.publishEvent(myEventTranslator, i);
        }

        // shutdown() waits until all published events have been handled.
        disruptor.shutdown();
        exec.shutdown();
        System.out.println("time spent " + (System.currentTimeMillis() - s) + " ms");
        System.out.println("amount of work done " + workDone.get());
    }
}
Try running the example above with 1 thread in each group:
1 100000 7
On my machine this gives:
time spent 371 ms
amount of work done 200000
Then try it with 4 threads in each group:
4 100000 7
On my machine this gives:
time spent 9853 ms
amount of work done 200000
The CPU sits at 100% utilization for the whole run.
Answer 0 (score: 2)
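Sharing one AtomicLong between threads/cores looks like the culprit (false sharing). I'll try it out with the demo when I have more time, but a much better approach is to give each WorkHandler a private variable that only its own thread touches (either its own AtomicLong or, preferably, a plain long).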
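To make the suggestion concrete, here is a minimal sketch in plain Java, without the Disruptor, that contrasts the two counting styles. CounterDemo and its method names are made up for illustration; in the demo the equivalent change would be a private long field in each handler, summed after shutdown. It shows the idea only and is not a benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class; not part of the original demo or of the Disruptor API.
final class CounterDemo {

    // Variant 1: every worker increments the same AtomicLong,
    // so all threads compete for the same memory location.
    static long sharedCounter(int workers, long perWorker) {
        AtomicLong shared = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (long n = 0; n < perWorker; n++) {
                    shared.incrementAndGet();
                }
            });
            threads[i].start();
        }
        joinAll(threads);
        return shared.get();
    }

    // Variant 2: each worker counts in a private plain long and
    // writes its total once; the totals are summed after the join.
    static long privateCounters(int workers, long perWorker) {
        long[] totals = new long[workers];
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                long local = 0;
                for (long n = 0; n < perWorker; n++) {
                    local++;
                }
                totals[slot] = local; // single write, visible to the caller after join()
            });
            threads[i].start();
        }
        joinAll(threads);
        long sum = 0;
        for (long t : totals) {
            sum += t;
        }
        return sum;
    }

    private static void joinAll(Thread[] threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("shared:  " + sharedCounter(4, 1_000_000));
        System.out.println("private: " + privateCounters(4, 1_000_000));
    }
}
```

Both variants report the same total; the difference is only in how often the threads touch shared memory.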
Update
If you change the line that constructs the Disruptor to:
Disruptor<ValueEvent> disruptor = new Disruptor<ValueEvent>(
        ValueEvent.EVENT_FACTORY, ringBufferSize,
        exec,
        com.lmax.disruptor.dsl.ProducerType.SINGLE,
        new com.lmax.disruptor.BusySpinWaitStrategy());
you get better results:
jason@debian01:~/code/stackoverflow$ java -cp disruptor-3.1.1.jar:. TwoPhaseDisruptor 4 100000 1024
time spent 2728 ms
amount of work done 200000
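A side note on the command line above, assuming the demo is run unmodified: it computes the ring buffer size as 2 << args[2], so the third argument is an exponent rather than a size. For an int, Java uses only the low five bits of the shift distance, so passing 1024 shifts by 0 and yields a ring buffer of just 2 slots, while 7 yields 256. The sketch below reproduces that arithmetic; RingSize and checkedSize are hypothetical names, and checkedSize is just one possible way to accept a size directly.

```java
// Hypothetical helper for illustration; not part of the original demo.
final class RingSize {

    // Mirrors the demo's expression: the argument is used as a shift distance.
    static int demoSize(int exponent) {
        return 2 << exponent;
    }

    // Alternative: take the size itself and insist on a positive power of two.
    static int checkedSize(int size) {
        if (size < 1 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("ring buffer size must be a power of two: " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(demoSize(7));       // prints 256
        System.out.println(demoSize(1024));    // prints 2, because 1024 & 31 == 0
        System.out.println(checkedSize(1024)); // prints 1024
    }
}
```

So the timings in this run and in the question's runs (argument 7) were taken with different buffer sizes, which is worth keeping in mind when comparing them.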
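I looked through the code and tried to fix the false sharing, but saw little improvement. That was when I noticed that the CPU on my 8-core machine was nowhere near 100% (even for the four-worker test). From that I concluded that, if you are going to burn CPU anyway, a yielding or busy-spin wait strategy will reduce latency.
Make sure you have at least 8 cores (you need 8 for processing, plus one for publishing messages).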
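For anyone wondering what the wait strategies actually trade off, here is a rough plain-Java sketch of the idea. It is not the Disruptor's implementation: WaitSketch, Waiter and the three constants are invented for illustration, and they only mimic the general behaviour of spinning, yielding and briefly parking while a consumer waits for the next published sequence number.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch; these types do not exist in the Disruptor library.
final class WaitSketch {

    // What the consumer does each time the next sequence is not yet available.
    interface Waiter {
        void pause();
    }

    static final Waiter BUSY_SPIN = () -> { };                        // re-check at once: lowest latency, burns a core
    static final Waiter YIELDING = Thread::yield;                     // offer the core to other threads, then re-check
    static final Waiter PARKING = () -> LockSupport.parkNanos(1_000L); // sleep briefly: frees the core, adds latency

    // Waits for every sequence from 0 to lastSequence and returns how many it observed.
    static long consume(AtomicLong published, long lastSequence, Waiter waiter) {
        long seen = 0;
        for (long next = 0; next <= lastSequence; next++) {
            while (published.get() < next) {
                waiter.pause();
            }
            seen++;
        }
        return seen;
    }

    // Publishes `events` sequence numbers from the calling thread to one consumer thread.
    static long run(long events, Waiter waiter) {
        AtomicLong published = new AtomicLong(-1);
        long[] result = new long[1];
        Thread consumer = new Thread(() -> result[0] = consume(published, events - 1, waiter));
        consumer.start();
        for (long i = 0; i < events; i++) {
            published.set(i);
        }
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("busy spin saw " + run(100_000, BUSY_SPIN));
        System.out.println("yielding saw  " + run(100_000, YIELDING));
        System.out.println("parking saw   " + run(100_000, PARKING));
    }
}
```

A spinning consumer reacts as soon as the producer publishes but occupies a core while it waits, which is why the core count matters once every worker spins.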