Question

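I am trying to capture MySQL data changes with Debezium's MySQL connector for Kafka, and then write those changes to Hive on Hadoop through the HDFS Sink connector. The pipeline is: MySQL -> Kafka -> Hive.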

The sink connector configuration is as follows:

{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "customers",
    "hdfs.url": "hdfs://192.168.10.15:8020",
    "flush.size": "3",
    "hive.integration": "true",
    "hive.database":"inventory",
    "hive.metastore.uris":"thrift://192.168.10.14:9083",
    "schema.compatibility":"BACKWARD",
    "transforms": "unwrap,key",
    "transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.key.field": "id"
  }
}

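This seems to work, but when I query the Hive table, the change data is still wrapped: the data under the after key shows up in a single after column instead of being split into the original table's columns.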
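To make the symptom concrete, here is a simplified Python sketch of a Debezium-style change-event envelope and of what an "unwrap" step is expected to produce. The helper `unwrap_envelope` is hypothetical and only models the idea (return the `after` row image); it is not Debezium's implementation, and the field values are made up, loosely modeled on the sample `inventory.customers` table.

```python
# Simplified model of a Debezium change-event value and of unwrapping it.
# This is an illustration, not Debezium's ExtractNewRecordState code.

def unwrap_envelope(value):
    """Return the row state ("after" image) from a Debezium-style envelope.

    Returns None for tombstones (value is None) and for events that have
    no "after" image, such as deletes.
    """
    if value is None:  # tombstone record
        return None
    return value.get("after")


# Hypothetical change event for a customers row (values are illustrative).
event = {
    "before": None,
    "after": {"id": 1001, "first_name": "Sally", "last_name": "Thomas",
              "email": "sally.thomas@acme.com"},
    "source": {"db": "inventory", "table": "customers"},
    "op": "c",
    "ts_ms": 1580000000000,
}

# Without unwrapping, a sink that maps top-level fields to columns sees
# before/after/source/op/ts_ms, which matches the Hive symptom above.
print(sorted(event.keys()))
# After unwrapping, the top-level fields are the original table columns.
print(sorted(unwrap_envelope(event).keys()))
```

If the Hive table still shows an `after` column, the records reaching the sink (or the schema the Hive table was created from) evidently still have the envelope shape on the first line rather than the flattened shape on the second.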

Here is a screenshot of the query result.

As you can see in the sink configuration, I already tried to unwrap the event messages with Debezium's "io.debezium.transforms.UnwrapFromEnvelope" transform, but apparently it does not work.

What is the minimal setup that lets me write database change events from Kafka to Hive? Is the HDFS Sink connector the right choice for this job?

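Update: I tested this with the sample "inventory" database from Debezium. I got the test environment from the Debezium images, so they should be up to date. Some version information: Debezium 1.0, Kafka 2.0, confluent-kafka-connect-hdfs sink connector: 5.4.1.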
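One difference between the two configurations in this question is the `topics` value. Debezium's MySQL connector names its change-event topics `<database.server.name>.<databaseName>.<tableName>`. The small Python sketch below, assuming the Debezium connector was registered with `database.server.name` set to `dbserver1` (the prefix that appears in the second configuration), checks whether a sink's `topics` list includes the expected topic. Both helper functions are hypothetical.

```python
# Debezium's MySQL connector writes each table's change events to a topic
# named <database.server.name>.<databaseName>.<tableName>.

def debezium_topic(server_name, database, table):
    """Build the default Debezium MySQL topic name for a table."""
    return f"{server_name}.{database}.{table}"


def sink_subscribes_to(sink_config, topic):
    """Check whether a connector config's comma-separated "topics" includes topic."""
    topics = [t.strip() for t in sink_config["config"]["topics"].split(",")]
    return topic in topics


# Assumption: database.server.name=dbserver1 on the Debezium source connector.
expected = debezium_topic("dbserver1", "inventory", "customers")
first_attempt = {"config": {"topics": "customers"}}
second_attempt = {"config": {"topics": "dbserver1.inventory.customers"}}

print(expected)
print(sink_subscribes_to(first_attempt, expected))
print(sink_subscribes_to(second_attempt, expected))
```

This only checks naming under the default convention; it says nothing about whether the records in that topic are unwrapped.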

Update 2: I moved on to the following sink configuration, but still no luck:

{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.customers",
    "hdfs.url": "hdfs://172.17.0.8:8020",
    "flush.size": "3",
    "hive.integration": "true",
    "hive.database":"inventory",
    "hive.metastore.uris":"thrift://172.17.0.8:9083",
    "schema.compatibility":"BACKWARD",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false"
  }
}
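For reference, a configuration like the one above can be (re)applied to a running worker through the Kafka Connect REST API: `PUT /connectors/<name>/config` creates the connector or updates its configuration, and its body is the bare config map (without the outer `name`/`config` wrapper used by `POST /connectors`). The Python sketch below only builds the request; the worker URL `http://localhost:8083` is an assumption (8083 is Connect's default REST port), and `build_config_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed Kafka Connect worker REST endpoint.
CONNECT_URL = "http://localhost:8083"

# The "config" map from Update 2, without the outer name/config wrapper.
sink_config = {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "dbserver1.inventory.customers",
    "hdfs.url": "hdfs://172.17.0.8:8020",
    "flush.size": "3",
    "hive.integration": "true",
    "hive.database": "inventory",
    "hive.metastore.uris": "thrift://172.17.0.8:9083",
    "schema.compatibility": "BACKWARD",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
}


def build_config_request(base_url, name, config):
    """Build a PUT /connectors/<name>/config request carrying the config map."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/connectors/{name}/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_config_request(CONNECT_URL, "hdfs-sink", sink_config)
    # Sending needs a running Connect worker; uncomment to apply:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
    print(req.get_method(), req.full_url)
```

Afterwards, `GET /connectors/hdfs-sink/config` on the same worker returns the configuration the connector is actually running with, which is a quick way to confirm the `transforms.*` settings took effect.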

Writing Debezium Kafka topic data to Hive via the HDFS Sink connector

0 answers