我正在尝试设置HdfsSinkConnector。这是我的worker.properties配置:
bootstrap.servers=kafkacluster01.corp:9092
group.id=nycd-og-kafkacluster
config.storage.topic=hive_conn_conf
offset.storage.topic=hive_conn_offs
status.storage.topic=hive_conn_stat
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://my-schemaregistry.co:8081
schema.registry.url=http://my-schemaregistry.co:8081
hive.integration=true
hive.metastore.uris=dev-hive-metastore
schema.compatibility=BACKWARD
value.converter.schemas.enable=true
logs.dir = /logs
topics.dir = /topics
plugin.path=/usr/share/java
这是我打电话来设置连接器的帖子要求
curl -X POST localhost:9092/connectors -H "Content-Type: application/json" -d '{
"name":"hdfs-hive_sink_con_dom16",
"config":{
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"topics": "dom_topic",
"hdfs.url": "hdfs://hadoop-sql-dev:10000",
"flush.size": "3",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://my-schemaregistry.co:8081"
}
}'
主题dom_topic
已经存在(是Avro),但我的工作人员收到以下错误消息:
INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask:72)
org.apache.kafka.connect.errors.ConnectException: java.io.IOException:
Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:
Protocol message end-group tag did not match expected tag.;
Host Details : local host is: "319dc5d70884/172.17.0.2"; destination host is: "hadoop-sql-dev":10000;
at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:202)
at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:64)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:207)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:139)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
我从配置单元中获取的hdfs.url:jdbc:hive2://hadoop-sql-dev:10000
如果我将端口更改为9092,我会得到
INFO Retrying connect to server: hadoop-sql-dev/xxx.xx.x.xx:9092. Already tried 0 time(s); maxRetries=45 (org.apache.hadoop.ipc.Client:837)
我全部在Docker上运行,我的Dockerfile非常简单
#FROM coinsmith/cp-kafka-connect-hdfs
FROM confluentinc/cp-kafka-connect:5.3.1
COPY confluentinc-kafka-connect-hdfs-5.3.1 /usr/share/java/kafka-connect-hdfs
COPY worker.properties worker.properties
# start
ENTRYPOINT ["connect-distributed", "worker.properties"]
任何帮助将不胜感激。