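I'm using Storm + Kafka + protobuf to build my stream-processing system.
The problem is that KafkaTridentSpoutOpaque repeatedly consumes the last message. I want each message in Kafka to be consumed only once.
Here are some details: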
Java dependencies
storm-kafka-client 1.2.2
storm-core 1.2.2
kafka_2.10 0.10.2.0
Components
kafka_2.12-2.0.0
apache-storm-1.2.2
Code that builds the KafkaTridentSpoutOpaque instance:
protected static KafkaSpoutConfig<String, byte[]> newKafkaSpoutConfig(String bootstrapServers, String topic) {
    KafkaSpoutConfig.Builder<String, byte[]> builder = new KafkaSpoutConfig.Builder<>(bootstrapServers, topic);
    return builder.setProp(ConsumerConfig.GROUP_ID_CONFIG, "stormKafkaSpoutGroup")
            .setProp(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
            .setProp(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
            .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer")
            .setRecordTranslator(new JustValueFunc(), new Fields("str"))
            .setFirstPollOffsetStrategy(UNCOMMITTED_EARLIEST)
            .setProcessingGuarantee(AT_MOST_ONCE)
            .build();
}

private static KafkaTridentSpoutOpaque<String, byte[]> newKafkaTridentSpoutOpaque(KafkaSpoutConfig<String, byte[]> spoutConfig) {
    return new KafkaTridentSpoutOpaque<>(spoutConfig);
}

// Translates a Kafka record into a single-field tuple holding the parsed protobuf message.
private static class JustValueFunc implements Func<ConsumerRecord<String, byte[]>, List<Object>>, Serializable {
    @Override
    public List<Object> apply(ConsumerRecord<String, byte[]> record) {
        Values res = null;
        try {
            res = new Values(PbMiddlewareTransfer.Record.parseFrom(record.value()));
        } catch (InvalidProtocolBufferException e) {
            // Parsing failed: the stack trace is printed and null is returned.
            e.printStackTrace();
        }
        return res;
    }
}
Here is my topology code:
public static void main(String[] args) throws Exception {
    StormTopology topology = getTridentTopology();
    Config conf = new Config();
    conf.setNumWorkers(20);
    conf.setMaxSpoutPending(5000);
    StormSubmitter.submitTopology("storm-kafka-client-spout-test", conf, topology);
}

public static StormTopology getTridentTopology() {
    final TridentTopology tridentTopology = new TridentTopology();
    KafkaSpoutConfig<String, byte[]> spoutConfig = newKafkaSpoutConfig("192.168.0.202:9092", "test-2");
    ITridentDataSource spout = newKafkaTridentSpoutOpaque(spoutConfig);
    final Stream spoutStream = tridentTopology.newStream("spout", spout).parallelismHint(1);
    spoutStream.each(spoutStream.getOutputFields(), new Debug("##### fastest driver"));
    return tridentTopology.build();
}
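I produced only one message in Kafka, so there should be only one output, but there are many. The output repeats roughly every 45 minutes.
Any help is appreciated.
Thanks.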
Answer 0 (score: 0)
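Setting max spout pending to a very high value can cause this. Try a low value, say 1.
setMaxSpoutPending: sets the maximum number of tuples that a spout task can have emitted and still pending (not yet completed) at any one time. There is some advice about how to set this option.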
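To make the suggestion concrete, here is a minimal sketch that runs without Storm on the classpath. It relies on the fact that Storm's Config is a HashMap<String, Object> and that setMaxSpoutPending(n) stores n under the key topology.max.spout.pending. The class name MaxSpoutPendingSketch and the helper lowPendingConf are hypothetical names used only for illustration; they are not part of Storm or of the question's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of Storm.
public class MaxSpoutPendingSketch {

    // Configuration key that Storm's Config.setMaxSpoutPending(n) writes to.
    static final String TOPOLOGY_MAX_SPOUT_PENDING = "topology.max.spout.pending";

    // Builds a topology configuration map with a deliberately low pending limit.
    static Map<String, Object> lowPendingConf(int maxPending) {
        if (maxPending < 1) {
            throw new IllegalArgumentException("maxPending must be at least 1");
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put(TOPOLOGY_MAX_SPOUT_PENDING, maxPending);
        return conf;
    }

    public static void main(String[] args) {
        // Prints {topology.max.spout.pending=1}
        System.out.println(lowPendingConf(1));
    }
}
```

In the question's main method, the equivalent change would be to replace conf.setMaxSpoutPending(5000) with conf.setMaxSpoutPending(1). Whether that alone stops the repeated emission is the answer's hypothesis and has not been verified here.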