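Problem statement: I want to join two streams coming from two different Kafka spouts (say S1 and S2), matching the tuples of each stream on some common fields. Suppose "S1" receives the following JSON as a tuple: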
{"l7ProtocolID":"dhcp",
"packets_out":1,
"bytes_out":400,
"start_time":1454281199898,
"flow_sample":0,
"duration":102,
"path":["base","ip","udp","dhcp"],
"bytes_in":1200,
"l4":[{"client":"68","server":"67","level":0}],
"l2":[{"client":"52:54:00:50:04:B2","server":"FF:FF:FF:FF:FF:FF","level":0}],
"l3":[{"client":"::ffff:0.0.0.0","server":"::ffff:255.255.255.255","level":0}],
"flow_id":"81454281200000731489",
"applicationID":"dhcp",
"packets_in":1}
and "S2" receives the following JSON as a tuple:
{"portGroupName":"dhcp",
"hypervisorName":1,
"bytes_out":400,
"monitoredIP":1454281199898,
"monitoredInstance":0,
"duration":102,
"bytes_in":1200,
"flow_id":"81454281200000731489",
"tenant":1}
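I want to join the two on a common field, here "flow_id". Please suggest an example or an approach. I'm also unsure whether .fieldsGrouping is the solution for my use case.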
Answer (score: 0)
You can use the Trident API to do the join:
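TridentTopology topology = new TridentTopology();
// do some stuff here: create stream1 and stream2 (e.g. with topology.newStream(...), one per spout)
// join stream1 on its "key" field with stream2 on its "x" field; the output fields name, in order,
// the join field, then stream1's remaining fields, then stream2's remaining fields
topology.join(stream1, new Fields("key"), stream2, new Fields("x"), new Fields("key", "a", "b", "c"));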
For details, see the documentation: https://storm.apache.org/releases/1.0.0/Trident-API-Overview.html
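To make the semantics of that `join` call concrete, here is a plain-Java model of what happens to a single batch. It assumes, as the Trident documentation describes, an inner join whose output lists the join field first, then stream1's non-join fields, then stream2's. It uses no Storm classes; `BatchJoinSketch` and `joinBatch` are hypothetical names, and the nested loop is chosen for readability rather than speed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Storm-free model of a per-batch inner join.
// Each tuple is a List<Object>; joinIdx1/joinIdx2 give the position of the join field.
class BatchJoinSketch {
    static List<List<Object>> joinBatch(List<List<Object>> batch1, int joinIdx1,
                                        List<List<Object>> batch2, int joinIdx2) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> t1 : batch1) {
            for (List<Object> t2 : batch2) {
                if (!t1.get(joinIdx1).equals(t2.get(joinIdx2))) {
                    continue; // keys differ: inner join drops this pair
                }
                List<Object> row = new ArrayList<>();
                row.add(t1.get(joinIdx1));              // join field first
                for (int i = 0; i < t1.size(); i++) {   // then stream1's other fields
                    if (i != joinIdx1) row.add(t1.get(i));
                }
                for (int i = 0; i < t2.size(); i++) {   // then stream2's other fields
                    if (i != joinIdx2) row.add(t2.get(i));
                }
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // stream1 tuples: [key, val1, val2]; stream2 tuples: [x, val1]
        List<List<Object>> s1 = List.of(List.<Object>of("f1", 400, 1200), List.<Object>of("f2", 10, 20));
        List<List<Object>> s2 = List.of(List.<Object>of("f1", "dhcp"));
        System.out.println(joinBatch(s1, 0, s2, 0)); // prints [[f1, 400, 1200, dhcp]]
    }
}
```

Read against the answer's call, the single joined row `[key, val1, val2, val1]` is exactly what `new Fields("key", "a", "b", "c")` names.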
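If you want to use the low-level API, fieldsGrouping is the right choice (though of course you need to handle the "windowing" yourself, i.e. decide how long to buffer a tuple from one stream while waiting for the matching tuple from the other).
Something like this: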
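TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout1", ...);
builder.setSpout("spout2", ...);
// the join component is a bolt (setBolt, not setSpout); both spouts must declare a "flow_id" output field
builder.setBolt("join", ...)
        .fieldsGrouping("spout1", new Fields("flow_id"))
        // tuples with the same flow_id, from either spout, are routed to the same "join" task
        .fieldsGrouping("spout2", new Fields("flow_id"));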
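The "windowing" left to you in the low-level approach amounts to buffering each side per flow_id until its partner arrives or a timeout passes. Below is a Storm-free sketch of the state a join bolt task might keep. `FlowJoinSketch`, `taskFor`, `WindowedJoiner` and the window length are hypothetical, and `taskFor` is only a simplified stand-in for Storm's fields-grouping hash, meant to show why equal flow_id values from either spout end up on the same task.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, Storm-free sketch of what a "join" bolt task might do.
class FlowJoinSketch {
    // Simplified stand-in for fields grouping: equal flow_id values always map to the
    // same task index, whichever spout emitted them (Storm's actual hash differs).
    static int taskFor(String flowId, int numTasks) {
        return Math.floorMod(flowId.hashCode(), numTasks);
    }

    // Per-task buffer holding unmatched records from each side for up to windowMs.
    static class WindowedJoiner {
        private static final class Entry {
            Map<String, Object> fromS1; // record from "spout1", null until it arrives
            Map<String, Object> fromS2; // record from "spout2", null until it arrives
            final long firstSeen;
            Entry(long firstSeen) { this.firstSeen = firstSeen; }
        }

        private final long windowMs;
        private final Map<String, Entry> pending = new HashMap<>();

        WindowedJoiner(long windowMs) { this.windowMs = windowMs; }

        // Returns the merged record once both sides of flowId have arrived, otherwise null.
        Map<String, Object> accept(String source, String flowId,
                                   Map<String, Object> record, long nowMs) {
            evictExpired(nowMs);
            Entry e = pending.computeIfAbsent(flowId, k -> new Entry(nowMs));
            if ("spout1".equals(source)) {
                e.fromS1 = record;
            } else {
                e.fromS2 = record;
            }
            if (e.fromS1 == null || e.fromS2 == null) {
                return null; // still waiting for the other stream
            }
            pending.remove(flowId);
            Map<String, Object> joined = new HashMap<>(e.fromS1);
            joined.putAll(e.fromS2); // on duplicate field names, S2's value wins
            return joined;
        }

        // Drops half-joined entries older than the window (a real bolt might ack or fail them here).
        void evictExpired(long nowMs) {
            Iterator<Entry> it = pending.values().iterator();
            while (it.hasNext()) {
                if (nowMs - it.next().firstSeen > windowMs) {
                    it.remove();
                }
            }
        }

        int pendingCount() { return pending.size(); }
    }
}
```

Inside a real bolt's `execute(Tuple)`, one might call `accept(tuple.getSourceComponent(), tuple.getStringByField("flow_id"), record, System.currentTimeMillis())` and emit the merged record whenever it is non-null. This buffer lives only in memory, so pending entries are lost if the worker restarts.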