Apache Storm Trident and Kafka Spout Integration

Time: 2018-04-09 15:22:29

Tags: java parallel-processing apache-kafka apache-storm trident

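I cannot find good documentation on how to integrate Kafka with Apache Storm Trident correctly. I looked through the related questions previously posted here, but they do not contain enough information.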

I want to connect Trident to Kafka using an OpaqueTridentKafkaSpout. Here is the sample code that is currently running:

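import org.apache.storm.kafka.Broker;
import org.apache.storm.kafka.StaticHosts;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.trident.GlobalPartitionInformation;
import org.apache.storm.kafka.trident.OpaqueTridentKafkaSpout;
import org.apache.storm.kafka.trident.TridentKafkaConfig;
import org.apache.storm.spout.SchemeAsMultiScheme;

GlobalPartitionInformation globalPartitionInformation = new GlobalPartitionInformation(properties.getProperty("topic", "mytopic"));
Broker brokerForPartition0 = new Broker("IP1", 9092);
Broker brokerForPartition1 = new Broker("IP2", 9092);
Broker brokerForPartition2 = new Broker("IP3", 9092); // host and port passed separately, as for the other brokers

globalPartitionInformation.addPartition(0, brokerForPartition0); // mapping from partition 0 to brokerForPartition0
globalPartitionInformation.addPartition(1, brokerForPartition1); // mapping from partition 1 to brokerForPartition1
globalPartitionInformation.addPartition(2, brokerForPartition2); // mapping from partition 2 to brokerForPartition2
StaticHosts staticHosts = new StaticHosts(globalPartitionInformation);
// Pass the StaticHosts instance built above (the original referenced an undefined variable named hosts)
TridentKafkaConfig tridentKafkaConfig = new TridentKafkaConfig(staticHosts, properties.getProperty("topic", "mytopic"));
tridentKafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // decode each message as a String
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(tridentKafkaConfig);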

With this, I can create the stream for my topology, as shown in the code below:

TridentTopology topology = new TridentTopology();
// Spout parallelism is read from the "spout" property and defaults to 6
Stream analyticsStream = topology.newStream("spout", kafkaSpout).parallelismHint(Integer.parseInt(properties.getProperty("spout", "6")));

Although I set the parallelism hint and mapped my partitions, only one executor of the Kafka spout is running, so I cannot scale it properly.

Can anyone point me to a better way of integrating Apache Storm Trident (2.0.0) with Apache Kafka (1.0), each running as a 3-node cluster?

Also, once it has finished reading from Kafka, I keep getting these logs:

2018-04-09 14:17:34.119 o.a.s.k.KafkaUtils Thread-15-spout-spout-executor[79 79] [INFO] Metrics Tick: Not enough data to calculate spout lag.
2018-04-09 14:17:34.129 o.a.s.k.KafkaUtils Thread-21-spout-spout-executor[88 88] [INFO] Metrics Tick: Not enough data to calculate spout lag.

I can see the messages above in the Storm UI. Any suggestions on how to suppress the Metrics Tick messages?

1 Answer:

Answer 0: (score: 1)

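In any case, if you are on Storm 2.0.0, I think you should switch to the storm-kafka-client Trident spout. The storm-kafka module exists only to support older Kafka versions, because the underlying Kafka API (SimpleConsumer) is being removed. The new module supports Kafka from 0.10.0.0 onward.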

You can find an example Trident topology for the new spout here: https://github.com/apache/storm/blob/master/examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/trident/TridentKafkaClientTopologyNamedTopics.java
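One practical difference when moving to the new module is that it is built on the Kafka consumer client, which locates partitions itself from a bootstrap server list, so the static partition-to-broker map from the question should no longer be needed. The sketch below uses only the JDK and shows one way to derive the consumer settings from the three hosts in the question. The class name KafkaClientSettings, the helper names, and the group id "trident-analytics" are hypothetical. "bootstrap.servers" and "group.id" are standard Kafka consumer property keys. How these values are handed to the spout configuration depends on the storm-kafka-client version, so check the linked example for the exact builder calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Storm or Kafka: builds consumer settings
// that a storm-kafka-client spout configuration could be given.
public class KafkaClientSettings {

    // Joins hosts into the comma-separated host:port list expected by "bootstrap.servers".
    static String bootstrapServers(List<String> hosts, int port) {
        StringBuilder joined = new StringBuilder();
        for (String host : hosts) {
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(host).append(':').append(port);
        }
        return joined.toString();
    }

    // Collects the standard Kafka consumer keys; the group id value is an assumption.
    static Properties consumerProps(List<String> hosts, int port, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers(hosts, port));
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        // IP1..IP3 are the placeholder hosts from the question.
        Properties props = consumerProps(Arrays.asList("IP1", "IP2", "IP3"), 9092, "trident-analytics");
        System.out.println(props.getProperty("bootstrap.servers"));
        System.out.println(props.getProperty("group.id"));
    }
}
```

Unlike the StaticHosts setup, nothing here names a partition, so adding partitions to the topic later would not require a code change on this side.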