How do I configure the HdfsSinkConnector with Kafka Connect?

Date: 2019-11-19 17:15:00

Tags: hadoop hive apache-kafka hdfs apache-kafka-connect

I am trying to set up the HdfsSinkConnector. This is my worker.properties configuration:

bootstrap.servers=kafkacluster01.corp:9092
group.id=nycd-og-kafkacluster

config.storage.topic=hive_conn_conf
offset.storage.topic=hive_conn_offs
status.storage.topic=hive_conn_stat

key.converter=org.apache.kafka.connect.storage.StringConverter

value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://my-schemaregistry.co:8081

schema.registry.url=http://my-schemaregistry.co:8081

hive.integration=true
hive.metastore.uris=dev-hive-metastore
schema.compatibility=BACKWARD

value.converter.schemas.enable=true
logs.dir=/logs
topics.dir=/topics

plugin.path=/usr/share/java
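
A note on placement: per the Confluent documentation, hive.integration, hive.metastore.uris, schema.compatibility, logs.dir, and topics.dir are HDFS Sink Connector properties rather than worker properties, so I suspect they belong in the connector JSON below instead. A sketch of how that fragment might look, where thrift://dev-hive-metastore:9083 assumes the default metastore port (an assumption on my part):

"hive.integration": "true",
"hive.metastore.uris": "thrift://dev-hive-metastore:9083",
"schema.compatibility": "BACKWARD",
"logs.dir": "/logs",
"topics.dir": "/topics"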

And this is the POST request I make to create the connector:

curl -X POST localhost:9092/connectors -H "Content-Type: application/json" -d '{
  "name":"hdfs-hive_sink_con_dom16",
  "config":{
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "dom_topic",
    "hdfs.url": "hdfs://hadoop-sql-dev:10000",
    "flush.size": "3",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://my-schemaregistry.co:8081"
    }
}'    
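
For reference, the Connect worker's REST interface listens on rest.port, which defaults to 8083 (9092 is normally the broker port), so assuming the default I can check whether the connector and its task actually started with:

curl localhost:8083/connectors/hdfs-hive_sink_con_dom16/status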

The topic dom_topic already exists (it is Avro), but my worker logs the following error:

INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask:72)
org.apache.kafka.connect.errors.ConnectException: java.io.IOException: 
Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: 
Protocol message end-group tag did not match expected tag.; 
Host Details : local host is: "319dc5d70884/172.17.0.2"; destination host is: "hadoop-sql-dev":10000;
        at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:202)
        at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:64)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:207)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:139)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

The hdfs.url value is the one I took from Hive: jdbc:hive2://hadoop-sql-dev:10000

If I change the port to 9092, I get:

INFO Retrying connect to server: hadoop-sql-dev/xxx.xx.x.xx:9092. Already tried 0 time(s); maxRetries=45 (org.apache.hadoop.ipc.Client:837)
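
Both failures look consistent with the HDFS client reaching something that does not speak the NameNode's protobuf RPC protocol: 10000 is HiveServer2's default port and 9092 is the Kafka broker port. Should hdfs.url instead carry the NameNode RPC address, i.e. the fs.defaultFS value from the cluster's core-site.xml? Something like the line below, where port 8020 is only the common NameNode default and an assumption on my part:

"hdfs.url": "hdfs://hadoop-sql-dev:8020"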

I am running everything on Docker, and my Dockerfile is very simple:

#FROM coinsmith/cp-kafka-connect-hdfs
FROM confluentinc/cp-kafka-connect:5.3.1

COPY confluentinc-kafka-connect-hdfs-5.3.1 /usr/share/java/kafka-connect-hdfs
COPY worker.properties worker.properties

# start 
ENTRYPOINT ["connect-distributed", "worker.properties"]
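
For completeness, I build and run the image roughly like this (the image tag is arbitrary, and the container of course needs network access to kafkacluster01.corp and the Hadoop hosts):

docker build -t cp-kafka-connect-hdfs .
docker run --rm cp-kafka-connect-hdfs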

Any help would be greatly appreciated.

0 Answers:

There are no answers yet.