Question

I am trying to do the following:

public List<SourceRecord> poll() throws InterruptedException {
  List<SourceRecord> records = new ArrayList<>();

  // Note: this requests blocks 0..3 on every call to poll()
  JSONArray jsonRecords = getRecords(0, 3);

  for (Object jsonRecord : jsonRecords) {
    JSONObject j = new JSONObject(jsonRecord.toString());

    Map<String, String> sourceOffset = Collections.singletonMap("block", j.get("block").toString());
    Object value = j.get("data").toString();

    records.add(new SourceRecord(
      Collections.singletonMap("samesourcepartition", "samesourcepartition"), // sourcePartition
      sourceOffset, // sourceOffset
      "mytopic", // topic
      Schema.STRING_SCHEMA, // keySchema
      j.get("block").toString(), // key: "0", "1", "2", "3"
      Schema.STRING_SCHEMA, // valueSchema
      value // value
    ));

    log.info("added record for block: {}", j.get("block"));
  }

  log.info("Returning {} records", records.size());

  return records;
}

I am confused about how to use sourceOffset (https://docs.confluent.io/current/connect/devguide.html#task-example-source-task).

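An example value of block would be "3". My expectation was that if Kafka has already read a given sourceOffset, it should not read it again. That seems to be ignored entirely: the topic offset keeps growing past 3, and the same 0-3 data is repeated in an infinite loop. For example, when I look at Confluent dashboard > Topics > Inspect, I expect the highest offset and key to be "3", but there are more than 100 records, with repeated keys and values.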

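Does my poll() need to step through 0 -> 3 itself so that it knows when to "stop"? The current behaviour keeps repeating 0 -> 3, 0 -> 3, ... and adds a new SourceRecord() each time, but I expected that using sourceOffset together with a unique key would make this idempotent.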
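To make the question concrete, here is a minimal sketch of the bookkeeping I suspect poll() would need, written without any Kafka classes so that it runs on its own. BlockCursor and its methods are hypothetical names, not part of the Kafka Connect API. The assumption behind it is that the task itself has to remember the last block it emitted, and that after a restart the stored offset has to be read back by the task (in a real task presumably via context.offsetStorageReader().offset(sourcePartition)), rather than the framework filtering records by sourceOffset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Kafka Connect API): remembers the highest block
// already handed out so repeated polls do not emit it again.
class BlockCursor {
    private long lastBlock;

    // storedOffset stands in for the offset map a task would read back on
    // start; null means nothing has been committed yet.
    BlockCursor(Map<String, ?> storedOffset) {
        Object block = storedOffset == null ? null : storedOffset.get("block");
        this.lastBlock = block == null ? -1L : Long.parseLong(block.toString());
    }

    // Returns only the blocks newer than the cursor, then advances the cursor
    // to the highest block returned.
    List<Long> nextBatch(List<Long> availableBlocks) {
        List<Long> fresh = new ArrayList<>();
        long highest = lastBlock;
        for (long block : availableBlocks) {
            if (block > lastBlock) {
                fresh.add(block);
                highest = Math.max(highest, block);
            }
        }
        lastBlock = highest;
        return fresh;
    }

    // The map that would be passed as sourceOffset on each SourceRecord.
    Map<String, String> sourceOffset() {
        return Collections.singletonMap("block", Long.toString(lastBlock));
    }

    public static void main(String[] args) {
        BlockCursor cursor = new BlockCursor(null);
        List<Long> upstream = Arrays.asList(0L, 1L, 2L, 3L);

        System.out.println(cursor.nextBatch(upstream)); // prints [0, 1, 2, 3]
        System.out.println(cursor.nextBatch(upstream)); // prints []

        // Simulated restart: rebuild the cursor from the last stored offset.
        BlockCursor restarted = new BlockCursor(cursor.sourceOffset());
        System.out.println(restarted.nextBatch(Arrays.asList(0L, 1L, 2L, 3L, 4L))); // prints [4]
    }
}
```

If this is the right mental model, the second call returning an empty list is the "stop" I am looking for, and the sourceOffset on each record only matters for resuming after a restart.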

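I am sure I am misunderstanding something. I also tried turning on log compaction, but I still get duplicates even with identical keys. Can someone show the proper way to send messages by sourceOffset / key?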

Kafka Connect duplicates messages in the topic after setting sourceOffset, unique keys, and log compaction

0 answers: