KafkaTridentSpoutOpaque repeatedly consumes the last message

Time: 2018-11-20 03:56:31

Tags: java apache-kafka apache-storm trident

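I am building a stream processing system with Storm, Kafka, and protobuf.

The problem is that KafkaTridentSpoutOpaque consumes the last message over and over. I expect each message in Kafka to be consumed only once.

Here are some details: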

Java dependencies:

  storm-kafka-client 1.2.2
  storm-core 1.2.2
  kafka_2.10 0.10.2.0

Components:

  kafka_2.12-2.0.0
  apache-storm-1.2.2

Code that builds the KafkaTridentSpoutOpaque instance:

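// Imports used by the members below (they live inside the topology class).
import java.io.Serializable;
import java.util.List;

import com.google.protobuf.InvalidProtocolBufferException;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.storm.kafka.spout.Func;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutOpaque;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

import static org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST;
import static org.apache.storm.kafka.spout.KafkaSpoutConfig.ProcessingGuarantee.AT_MOST_ONCE;

// Consumer settings: String keys, raw byte[] values decoded by JustValueFunc.
protected static KafkaSpoutConfig<String, byte[]> newKafkaSpoutConfig(String bootstrapServers, String topic) {
    KafkaSpoutConfig.Builder<String, byte[]> builder = new KafkaSpoutConfig.Builder<>(bootstrapServers, topic);
    return builder.setProp(ConsumerConfig.GROUP_ID_CONFIG, "stormKafkaSpoutGroup")
            .setProp(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000")
            .setProp(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
            .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer")
            .setRecordTranslator(new JustValueFunc(), new Fields("str"))
            .setFirstPollOffsetStrategy(UNCOMMITTED_EARLIEST)
            .setProcessingGuarantee(AT_MOST_ONCE)
            .build();
}

private static KafkaTridentSpoutOpaque<String, byte[]> newKafkaTridentSpoutOpaque(KafkaSpoutConfig<String, byte[]> spoutConfig) {
    return new KafkaTridentSpoutOpaque<>(spoutConfig);
}

// Translates a Kafka record into a one-field tuple holding the parsed protobuf message.
private static class JustValueFunc implements Func<ConsumerRecord<String, byte[]>, List<Object>>, Serializable {
    @Override
    public List<Object> apply(ConsumerRecord<String, byte[]> record) {
        Values res = null;
        try {
            res = new Values(PbMiddlewareTransfer.Record.parseFrom(record.value()));
        } catch (InvalidProtocolBufferException e) {
            // The payload is not a valid protobuf message; null is returned below.
            e.printStackTrace();
        }
        return res;
    }
}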

This is my topology code:

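import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.trident.Stream;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Debug;
import org.apache.storm.trident.spout.ITridentDataSource;

public static void main(String[] args) throws Exception {
    StormTopology topology = getTridentTopology();
    Config conf = new Config();
    conf.setNumWorkers(20);
    conf.setMaxSpoutPending(5000);
    StormSubmitter.submitTopology("storm-kafka-client-spout-test", conf, topology);
}

public static StormTopology getTridentTopology() {
    final TridentTopology tridentTopology = new TridentTopology();

    KafkaSpoutConfig<String, byte[]> spoutConfig = newKafkaSpoutConfig("192.168.0.202:9092", "test-2");
    ITridentDataSource spout = newKafkaTridentSpoutOpaque(spoutConfig);

    // Single spout instance reading from the "test-2" topic.
    final Stream spoutStream = tridentTopology.newStream("spout", spout).parallelismHint(1);

    // Log every tuple the spout emits.
    spoutStream.each(spoutStream.getOutputFields(), new Debug("##### fastest driver"));

    return tridentTopology.build();
}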

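I produced only one message to Kafka, so there should be exactly one output, but there are many, and the output repeats about every 45 minutes.

Thanks for your help.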

1 answer:

Answer 0 (score: 0)

Setting max spout pending to a very high value can cause this. Try a low value, say 1.
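As a sketch of that suggestion, the Config built in the question's main method could be changed as follows. The class and method names here (LowPendingConf, build) are made up for illustration, the snippet needs storm-core on the classpath, and whether a value of 1 actually stops the repeats in this setup has not been confirmed.

```java
import org.apache.storm.Config;

public class LowPendingConf {
    // Hypothetical helper: same settings as in the question, with a low pending cap.
    public static Config build() {
        Config conf = new Config();
        conf.setNumWorkers(20);      // unchanged from the question
        conf.setMaxSpoutPending(1);  // was 5000 in the question
        return conf;
    }
}
```

The returned Config would then be passed to StormSubmitter.submitTopology in place of the original one.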

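setMaxSpoutPending: sets the maximum number of tuples (batches, in a Trident topology) that may be pending on a single spout task at any one time, that is, emitted but not yet acked or failed. There is some advice on how to set this option.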
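To make the meaning of the cap concrete, here is a toy model in plain Java. It is not Storm's implementation; the class MaxPendingModel and its methods are invented for illustration and only mimic the rule that nothing new is emitted while the number of in-flight batches has reached the limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MaxPendingModel {
    private final int maxPending;                       // the cap, like max spout pending
    private final Deque<Integer> pending = new ArrayDeque<>();
    private int nextBatchId = 0;

    public MaxPendingModel(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Emits a new batch only if the cap allows it; returns whether it was emitted. */
    public boolean tryEmit() {
        if (pending.size() >= maxPending) {
            return false;                               // at the cap: wait for an ack
        }
        pending.addLast(nextBatchId++);
        return true;
    }

    /** Acks the oldest in-flight batch, freeing one slot (no-op if none is pending). */
    public void ackOldest() {
        pending.pollFirst();
    }

    public int inFlight() {
        return pending.size();
    }

    public static void main(String[] args) {
        MaxPendingModel spout = new MaxPendingModel(1);
        System.out.println(spout.tryEmit()); // true: first batch goes out
        System.out.println(spout.tryEmit()); // false: one batch is still pending
        spout.ackOldest();
        System.out.println(spout.tryEmit()); // true: the slot was freed by the ack
    }
}
```

With a cap of 1, a second batch can only be emitted after the first has been acked, which is the behavior the low setting is meant to enforce.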
