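I am currently trying to implement a Storm topology that integrates with R.
As a starting point, I took the following project (https://github.com/allenday/R-Storm), which implements the R integration by extending the ShellBolt class and provides an R library to handle the communication between Java and R.
My problem is this: if I build a topology from regular (Java-only) bolts, I can chain them together without any trouble. However, when one of the bolts in the middle of the chain is an R shell bolt, things break down: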
5661 [Thread-18] ERROR backtype.storm.util - Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
Shell Process Exception:
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.daemon.executor$fn__3557$fn__3569$fn__3616.invoke(executor.clj:715) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) ~[storm-0.9.0-wip16.jar:na]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_25]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
More specifically, the following topology works as expected:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 1);
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
PermuteBolt is an R shell bolt. The log for this example shows the expected output:
6246 [Thread-18] INFO backtype.storm.daemon.task - Emitting: spout default [four score and seven years ago]
6246 [Thread-16] INFO backtype.storm.daemon.executor - Processing received message source: spout:3, stream: default, id: {}, [four score and seven years ago]
6261 [Thread-23] INFO backtype.storm.daemon.task - Emitting: permutebolt default ["PERMUTE seven years ago and four score"]
However, if I add another bolt that consumes data from the first one, for example:
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
builder.setBolt("identity", new IdentityBolt(new Fields("identity")), 1).fieldsGrouping("permutebolt", new Fields("permutation"));
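it fails with the trace printed above. Strangely, this second, failing example is the one included in the project.
Has anyone run into this problem before?
Update: I noticed that this only happens with R shell bolts. I have also tried bolts that launch Python scripts, and I was able to chain those without problems.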
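For context on the error text itself: a shell bolt talks to its subprocess over stdin/stdout, and in Storm's multilang protocol each JSON message is followed by a line containing only `end`. The message "Pipe to subprocess seems to be broken! No output read." is consistent with the Java side reaching end-of-stream on the subprocess's stdout before reading any message. The following is a simplified sketch of that kind of read loop, not Storm's actual code; the class name `MultilangFramingSketch` and the method `readMessage` are hypothetical, and the exception wording is only modeled on the trace above. It does not explain why the R process stopped writing, only what the Java side sees when it does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical illustration; not part of Storm or R-Storm.
public class MultilangFramingSketch {

    // Reads lines until a line equal to "end" and returns everything before it,
    // joined with newlines. Throws if the stream closes before "end" is seen.
    public static String readMessage(BufferedReader in) throws IOException {
        StringBuilder message = new StringBuilder();
        while (true) {
            String line = in.readLine();
            if (line == null) {
                // End of stream: the subprocess closed its stdout (or exited).
                // Assumption: wording modeled on the trace in the question.
                String detail = message.length() == 0
                        ? " No output read."
                        : " Currently read output: " + message;
                throw new RuntimeException("Pipe to subprocess seems to be broken!" + detail);
            }
            if (line.equals("end")) {
                return message.toString();
            }
            if (message.length() > 0) {
                message.append("\n");
            }
            message.append(line);
        }
    }

    public static void main(String[] args) throws IOException {
        // A well-formed message: JSON followed by the "end" marker line.
        BufferedReader healthy = new BufferedReader(
                new StringReader("{\"command\": \"ack\", \"id\": \"1\"}\nend\n"));
        System.out.println(readMessage(healthy));

        // A subprocess that produced nothing before its stdout closed.
        BufferedReader dead = new BufferedReader(new StringReader(""));
        try {
            readMessage(dead);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, the trace says the R subprocess produced no output at all before its pipe closed, which points at the R side of the bolt rather than at the downstream Java bolt.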
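Answer 0 (score: 1)
@andrei, this is fixed in 1.01, which was uploaded to GitHub today: https://github.com/allenday/R-Storm/releases/tag/v1.01
It has been submitted to CRAN and should be available there soon.
Thanks for the report.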
-Allen