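In one of our topologies, which has 1 spout and 1 bolt, I had a hunch that the bolt is completing fine (and keeps making progress) but the spout is still failing.
I tried to confirm this with a TaskHook like the following -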
public class BaseHook extends BaseTaskHook {
private Logger logger;
private String topology;
private String component;
public BaseHook(String component) {
this.component = component;
}
@Override
public void prepare(Map conf, TopologyContext context) {
logger = LoggerFactory.getLogger(this.getClass());
this.topology = (String) conf.get("topology.name");
}
@Override
public void emit(EmitInfo info) {
log("EMITTED >> Value = " + info.values);
}
@Override
public void spoutAck(SpoutAckInfo info) {
log("ACKED >> Tuple = " + info.messageId + ", Latency = " + info.completeLatencyMs);
}
@Override
public void spoutFail(SpoutFailInfo info) {
log("FAILED >> Tuple = " + info.messageId + ", Latency = " + info.failLatencyMs);
}
@Override
public void boltExecute(BoltExecuteInfo info) {
log("EXECUTED >> Tuple = " + info.tuple.getValues() + ", Latency = " + info.executeLatencyMs);
}
@Override
public void boltAck(BoltAckInfo info) {
log("ACKED >> Tuple = " + info.tuple.getValues() + ", Latency = " + info.processLatencyMs);
}
@Override
public void boltFail(BoltFailInfo info) {
log("FAILED >> Tuple = " + info.tuple.getValues() + ", Latency = " + info.failLatencyMs);
}
private void log(String msg) {
logger.info(">>>>> " + topology + " >> " + component + " >> " + msg);
}
}
It turns out my hunch was right. The logs look like this -
>>>>> TopologyX >> SpoutX >> EMITTED >> Value = [XXXXXXXXX]
>>>>> TopologyX >> BoltX >> ACKED >> Tuple = [XXXXXXXXX], Latency = 1972
>>>>> TopologyX >> BoltX >> EXECUTED >> Tuple = [XXXXXXXXX], Latency = 1973
>>>>> TopologyX >> SpoutX >> FAILED >> Tuple = XXXXXXXXX, Latency = 53913
i.e. the Bolt took almost 2s (to execute and ack), but the Spout's fail was called after about 53s (almost 2 * topology.message.timeout.secs).
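One possible reading of the "almost twice the timeout" figure, offered as a hedged sketch: as far as I know, Storm keeps pending spout tuples in a bucketed map that is rotated once per topology.message.timeout.secs, so a tuple that is never acked gets failed somewhere between one and two timeout periods after it was emitted. The class below is a toy model of that idea, not Storm's own code. TwoBucketPending, failLatencyMs, the 30 s timeout (Storm's default, not stated in this post) and the emit offsets are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy two-bucket expiry map; an illustration only, not Storm's actual class. */
class TwoBucketPending {
    private Map<String, Long> young = new HashMap<>(); // added since the last rotation
    private Map<String, Long> old = new HashMap<>();   // survived exactly one rotation

    void put(String msgId, long emittedAtMs) {
        young.put(msgId, emittedAtMs);
    }

    /** Called once per timeout period: returns the expired bucket and ages the young one. */
    Map<String, Long> rotate() {
        Map<String, Long> expired = old;
        old = young;
        young = new HashMap<>();
        return expired;
    }

    /** Fail latency of a never-acked tuple, with rotations at multiples of timeoutMs. */
    static long failLatencyMs(long emittedAtMs, long timeoutMs) {
        TwoBucketPending pending = new TwoBucketPending();
        pending.put("m", emittedAtMs);
        long now = (emittedAtMs / timeoutMs + 1) * timeoutMs; // first rotation after the emit
        while (true) {
            if (pending.rotate().containsKey("m")) {
                return now - emittedAtMs; // expired at this rotation
            }
            now += timeoutMs;
        }
    }

    public static void main(String[] args) {
        long timeoutMs = 30_000; // assumed: topology.message.timeout.secs = 30
        System.out.println(failLatencyMs(6_087, timeoutMs));  // prints 53913
        System.out.println(failLatencyMs(29_999, timeoutMs)); // prints 30001
    }
}
```

Under these assumptions the fail latency always falls between one and two timeout periods, so a value near 53 s would say that the ack never reached the spout in time, not that the bolt took 53 s.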
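I would expect the Spout's ack to be called within 2-3 seconds as well. The spout is non-blocking, and both the spout and the bolt have enough capacity.
Does anyone have a hint as to what the cause might be?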
So this is what the Storm cluster looks like -
T1 = S > B > B > B > Ack/Fail
T2 = S > B > Ack/Fail
T3 = S > B > B > Ack/Fail
T4 =
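So the problematic topology is T4, i.e. the one with 2 different spouts and 2 bolts. One of its flows usually works fine (the flows have different messageIds that uniquely identify their tuples).
Could this be the problem?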
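Anyway, sleeping in T4 did not improve anything.
T4 [...] T1, but still ran fine; with T2 (and T3 on other occasions), T4 started failing.
Now T4 works even with T1 and T3. [...] T2 or T3, otherwise T4 breaks down.
Notes -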
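T3 and T4 are both fast topologies, i.e. their flows complete in < 100ms
T3 and T4 both have two executors
T3 and T4 both have Max Tuple Pending = 1
T3 and T4 are rate-limited (but I have already tried without rate limiting)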
All Spouts extend the BaseSpout class -
public abstract class BaseSpout extends BaseRichSpout {
private SpoutOutputCollector collector;
@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
context.addTaskHook(new BaseHook(this.getClass().getSimpleName()));
try {
this.collector = collector;
open();
} catch (Exception e) {
throw new RuntimeException("Error when preparing spout", e);
}
}
@Override
public void nextTuple() {
try {
getTuple();
} catch (Throwable t) {
if (!(t instanceof FailedException)) {
t = new FailedException("nextTuple()", t);
}
collector.reportError(t);
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
String[] fields = getFields();
if (fields != null) {
declarer.declare(new Fields(fields));
}
}
protected void emit(Values values, String msgId) {
collector.emit(values, msgId);
}
protected abstract void open() throws Exception;
protected abstract void getTuple() throws Exception;
protected abstract String[] getFields();
}
And all Bolts extend the BaseBolt class -
public abstract class BaseBolt extends BaseRichBolt {
private OutputCollector collector;
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
context.addTaskHook(new BaseHook(this.getClass().getSimpleName()));
try {
this.collector = collector;
prepare();
} catch (Exception e) {
throw new RuntimeException("Error when preparing bolt", e);
}
}
@Override
public void execute(Tuple tuple) {
try {
process(tuple);
collector.ack(tuple);
} catch (Throwable t) {
if (!(t instanceof FailedException)) {
t = new FailedException("execute(" + tuple + ")", t);
}
collector.reportError(t);
collector.fail(tuple);
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
String[] fields = getFields();
if (fields != null) {
declarer.declare(new Fields(fields));
}
}
protected void emit(Tuple tuple, Values values) {
collector.emit(tuple, values);
}
protected abstract void prepare() throws Exception;
protected abstract void process(Tuple tuple) throws Exception;
protected abstract String[] getFields();
}
So no tuple is emitted without a messageId (from a spout) or unanchored (from a bolt).
Answer 0 (score: 0)
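The problem here is that the calls to Spout.nextTuple() and to Spout.ack() or Spout.fail() all happen on the same thread. If you put a large number of tuples into the topology, the ack or fail messages end up waiting for the source spout to get around to them, which stretches the time until the ack/fail is seen.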
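To make the effect concrete, here is a small virtual-clock model of a single spout thread. It is a sketch under stated assumptions, not Storm code: the class SingleThreadSpoutLoop, the method ackDelays and all the millisecond figures are made up for illustration. The loop first handles every ack that has already arrived and then spends nextTupleCostMs inside nextTuple(), during which no ack can be handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one thread that alternates between handling acks and calling nextTuple(). */
class SingleThreadSpoutLoop {

    /**
     * @param ackArrivalsMs   sorted times at which ack messages reach the spout's queue
     * @param nextTupleCostMs time each nextTuple() call keeps the thread busy (must be > 0)
     * @return for each ack, how long it sat in the queue before the thread handled it
     */
    static List<Long> ackDelays(long[] ackArrivalsMs, long nextTupleCostMs) {
        if (nextTupleCostMs <= 0) {
            throw new IllegalArgumentException("nextTupleCostMs must be positive");
        }
        List<Long> delays = new ArrayList<>();
        long now = 0;  // virtual clock in milliseconds
        int next = 0;  // index of the next ack that has not been handled yet
        while (next < ackArrivalsMs.length) {
            // Step 1: handle every ack that has already arrived.
            while (next < ackArrivalsMs.length && ackArrivalsMs[next] <= now) {
                delays.add(now - ackArrivalsMs[next]);
                next++;
            }
            // Step 2: call nextTuple() on the same thread; acks queue up meanwhile.
            now += nextTupleCostMs;
        }
        return delays;
    }

    public static void main(String[] args) {
        long[] arrivals = {5, 10, 15};
        System.out.println(ackDelays(arrivals, 1));     // prints [0, 0, 0]
        System.out.println(ackDelays(arrivals, 1000));  // prints [995, 990, 985]
    }
}
```

In this model a nextTuple() that returns within a millisecond adds no measurable delay, while one that holds the thread for a second adds close to a second to every ack that arrives in the meantime.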
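You also mentioned that "sleeping" had no effect. If you mean that Thread.sleep() was called inside the spout's nextTuple() method, that only makes things worse, because you are stalling the very thread that handles the acks/fails.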
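One way to keep nextTuple() from stalling that thread is to have it return immediately when there is nothing to emit. The following is a minimal sketch of that pattern in plain Java, with no Storm dependency; QueueBackedSource, offer and the emitted list are hypothetical stand-ins, and in a real spout the marked line would be a collector.emit(...) call with a messageId.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical stand-in for a spout whose input is filled by some other thread. */
class QueueBackedSource {
    private final Queue<String> inbox = new ConcurrentLinkedQueue<>();
    final List<String> emitted = new ArrayList<>(); // stands in for the output collector

    /** Called by a producer thread to hand a message to the source. */
    void offer(String msg) {
        inbox.add(msg);
    }

    /**
     * Stand-in for nextTuple(): emits at most one message and never sleeps or blocks.
     * @return true if something was emitted, false if the inbox was empty
     */
    boolean nextTuple() {
        String msg = inbox.poll(); // returns null right away when the queue is empty
        if (msg == null) {
            return false;          // nothing to do: give the thread back immediately
        }
        emitted.add(msg);          // real spout: collector.emit(values, msgId)
        return true;
    }

    public static void main(String[] args) {
        QueueBackedSource source = new QueueBackedSource();
        System.out.println(source.nextTuple()); // prints false
        source.offer("a");
        System.out.println(source.nextTuple()); // prints true
        System.out.println(source.emitted);     // prints [a]
    }
}
```

Any slow I/O then belongs on the producer side of the queue, so the thread that also runs ack() and fail() is never held up by it.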