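We have been running Kafka for about a year. About a month ago we started hitting a problem where one of our producers, which uses the Java client, gets an InvalidTxnStateException when it tries to commit. From that point on, the broker ignores that producer until the broker is restarted. By "ignores" I mean that when the producer rolls back and calls initTransactions(), the call times out every time (until the broker is restarted). After the broker is restarted, we sometimes see duplicate messages. Any help or insight would be greatly appreciated, even if it is only guidance on how to debug this kind of problem. We are looking at upgrading to Kafka 2.0 on Monday, but I am concerned the problem will persist.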
Our current system:
Producer config:
props.put("bootstrap.servers", bootstrapServers);
props.put("transactional.id", transactionId); // hostname of instance
props.put("compression.type", "gzip");
props.put("max.block.ms", 10000);
props.put("enable.idempotence", "true");
props.put("linger.ms", 500);
props.put("batch.size",1048576);
Producer code highlights:
// This gets called once at start up
KafkaProducer kafkaProducer = new KafkaProducer<>(this.properties, new StringSerializer(), new StringSerializer());
kafkaProducer.initTransactions();
// This, gets called repeatedly as files get written into a directory...
FileReader fr = null;
BufferedReader br = null;
try {
    fr = new FileReader(logFile.getAbsolutePath());
    br = new BufferedReader(fr);
    kafkaProducer.beginTransaction();
    String line;
    while ((line = br.readLine()) != null) {
        kafkaProducer.send(new ProducerRecord<String, String>(this.topic, null, epoch, key, line)); // key is host name + rand int
    }
    this.kafkaProducer.commitTransaction();
} catch (IOException e) {
    System.out.println("Could not open file!");
    e.printStackTrace();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Guides say to catch exception but in testing exception came about by "caused by"
    System.out.println("Unrecoverable exception!");
    e.printStackTrace();
    kafkaProducer.close();
    kafkaProducer = null;
    kafkaProducer = createKafkaProducer();
} catch (KafkaException e) {
    // For all other exceptions, just abort the transaction and try again.
    e.printStackTrace();
    System.out.println("Kafka exception: aborting!");
    if (e.getCause() != null
            && (e.getCause() instanceof ProducerFencedException
                || e.getCause() instanceof OutOfOrderSequenceException
                || e.getCause() instanceof AuthorizationException
                || e.getCause() instanceof InvalidTxnStateException)) {
        System.out.println("Unrecoverable exception!");
        kafkaProducer.close();
        kafkaProducer = null;
        kafkaProducer = createKafkaProducer();
    } else {
        kafkaProducer.abortTransaction();
    }
}
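The handler above inspects only the immediate cause of the KafkaException. If a fatal error can sit more than one level down the "caused by" chain, a check along the following lines may be more robust. This is a minimal sketch using only the JDK; the class name CauseChain and the depth limit of 32 are assumptions made for illustration and are not part of the Kafka client API.

```java
/** Hypothetical helper: searches an exception and its cause chain for given types. */
final class CauseChain {
    private CauseChain() {}

    /**
     * Returns true if t, or any exception in its cause chain,
     * is an instance of one of the given types.
     */
    static boolean containsAny(Throwable t, Class<?>... fatalTypes) {
        int depth = 0;
        // The depth limit (an arbitrary choice) guards against a cyclic cause chain.
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            for (Class<?> type : fatalTypes) {
                if (type.isInstance(cur)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In the catch (KafkaException e) branch, the instanceof chain could then be replaced by a call such as CauseChain.containsAny(e, ProducerFencedException.class, OutOfOrderSequenceException.class, AuthorizationException.class, InvalidTxnStateException.class). Because the helper tests the exception itself before its causes, it covers both the directly thrown and the wrapped case.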
Producer error:
org.apache.kafka.common.errors.InvalidTxnStateException: The producer attempted a transactional operation in an invalid state
Broker logs:
[2018-09-21 08:19:05,037] INFO Deleting index /tmp/kafka-logs/logs/prod_tx0-63/00000000000870963261.timeindex.deleted (kafka.log.TimeIndex)
[2018-09-21 08:19:16,071] ERROR [ReplicaManager broker=3] Error processing append operation on partition prod_tx0-58 (kafka.server.ReplicaManager) org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 94001: 233824 (incoming seq. number), 228787 (current end sequence number)
[2018-09-21 08:19:20,808] ERROR [ReplicaManager broker=3] Error processing append operation on partition prod_tx0-11 (kafka.server.ReplicaManager) org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 90031: 448132 (incoming seq. number), 443075 (current end sequence number)
[2018-09-21 08:21:14,656] ERROR [ReplicaManager broker=3] Error processing append operation on partition prod_tx0-61 (kafka.server.ReplicaManager) org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 94006: 599366 (incoming seq. number), 594332 (current end sequence number)
[2018-09-21 08:22:00,222] INFO Rolled new log segment for 'prod_tx0-44' in 0 ms. (kafka.log.Log)
[2018-09-21 08:22:07,055] ERROR [ReplicaManager broker=3] Error processing append operation on partition prod_tx0-49 (kafka.server.ReplicaManager) org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 93006: 213516 (incoming seq. number), 208514 (current end sequence number)
[2018-09-21 08:22:18,365] INFO Incrementing log start offset of partition prod_tx0-28 to 870795767 in dir /tmp/kafka-logs/logs (kafka.log.Log)
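As a debugging aid, it may help to extract the gap between the incoming and the current end sequence number from each OutOfOrderSequenceException line and compare it across partitions. The sketch below assumes the broker writes the message in its English form, "Out of order sequence number for producerId N: A (incoming seq. number), B (current end sequence number)"; the class name SequenceGap is hypothetical and the code uses only the JDK.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: computes the sequence gap reported in a broker log line. */
final class SequenceGap {
    // Captures the producer id, the incoming sequence number, and the current end sequence number.
    private static final Pattern LINE = Pattern.compile(
        "producerId (\\d+): (\\d+) \\(incoming seq\\. number\\), (\\d+) \\(current end sequence number\\)");

    private SequenceGap() {}

    /** Returns incoming minus current end sequence number, or empty if the line does not match. */
    static OptionalLong gap(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        long incoming = Long.parseLong(m.group(2));
        long currentEnd = Long.parseLong(m.group(3));
        return OptionalLong.of(incoming - currentEnd);
    }
}
```

Applied to the four error lines above, this yields gaps of 5037, 5057, 5034, and 5002, so each producer appears to be roughly 5,000 sequence numbers ahead of what the broker last recorded for that partition.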
Note:
Thanks again for taking the time to read this.