Deserialization error on Kafka topic data when pushing from the python kafka library

Posted: 2018-12-19 10:16:31

Tags: java apache apache-kafka kafka-producer-api

I have set up a sink connector to Postgres on one of the nodes of my Kafka cluster. The setup is:

  • 3 ZooKeepers
  • 3 Kafka brokers
  • 3 Schema Registries
  • 1 Kafka Connect

I created the sink with:

curl -X POST -H "Content-Type: application/json" \
  --data '{
    "name": "nextiot-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://db_host:5432/nextiot",
        "connection.user": "db_user",
        "connection.password": "db_pass",
        "auto.create": true,
        "auto.evolve": true,
        "topics": "nextiot"
        }
    }' http://10.0.1.70:8083/connectors
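
For reference, the JSON error output shown further down appears to be the response of the Kafka Connect REST status endpoint for this connector. A minimal Python check against the same worker address used in the curl call above:

import requests

# Ask the Connect worker for the current state of the sink connector and its tasks
status = requests.get("http://10.0.1.70:8083/connectors/nextiot-sink/status")
print(status.json())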

From inside the schema-registry container I am able to produce and consume data with the kafka-avro-console-producer command.

But when I try to send data from my Python client, I get this:

  

{"name":"nextiot-sink","connector":{"state":"RUNNING","worker_id":"0.0.0.0:8083"},"tasks":[{"state":"FAILED","trace":"org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
    at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
    at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:513)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:490)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: nextiot
    at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:98)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:513)
    at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
    at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
    ... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
","id":0,"worker_id":"0.0.0.0:8083"}],"type":"sink"}

Here is my Avro schema:

{ "namespace": "example.avro", "type": "record", "name": "NEXTIOT", "fields": [ {"name": "deviceid", "type": "string"}, {"name": "longitude", "type": "float"}, {"name": "latitude", "type": "float"} ] }

The Python code I use to publish the data is:

import io
import random
import avro.schema
from avro.io import DatumWriter
from kafka import SimpleProducer
from kafka import KafkaClient

# To send messages synchronously
# KAFKA = KafkaClient('Broker URL:9092')
KAFKA = KafkaClient('BROKER URL')

PRODUCER = SimpleProducer(KAFKA)

# Kafka topic
TOPIC = "nextiot"

# Path to user.avsc avro schema
SCHEMA_PATH = "user.avsc"
SCHEMA = avro.schema.parse(open(SCHEMA_PATH).read())

for i in xrange(10):
    writer = DatumWriter(SCHEMA)
    bytes_writer = io.BytesIO()
    encoder = avro.io.BinaryEncoder(bytes_writer)
    writer.write({"deviceid": "9098", "latitude": 90.34, "longitude": 334.4}, encoder)
    raw_bytes = bytes_writer.getvalue()
    PRODUCER.send_messages(TOPIC, raw_bytes)

The Kafka Connect connector then reports the same error shown above (the FAILED task with the "Unknown magic byte!" SerializationException).

1 Answer:

Answer 0 (score: 1)

I haven't worked much with the various Python clients, but the magic-byte error is almost certainly because what you are sending may be valid Avro, yet to integrate with the Schema Registry the payload has to be in a different format (it carries extra header information; this is documented at https://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html, search for "wire format" or "magic byte"). Personally, I would try Confluent's Python Kafka client (https://github.com/confluentinc/confluent-kafka-python), which ships with examples of using Avro together with the Schema Registry.
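
As an illustration of that suggestion, here is a minimal sketch using confluent-kafka-python's AvroProducer with the NEXTIOT schema from the question; the broker and schema-registry addresses are placeholders you would replace with your own:

from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# The same record schema as in the question
value_schema = avro.loads('''
{ "namespace": "example.avro", "type": "record", "name": "NEXTIOT",
  "fields": [ {"name": "deviceid", "type": "string"},
              {"name": "longitude", "type": "float"},
              {"name": "latitude",  "type": "float"} ] }
''')

producer = AvroProducer(
    {
        'bootstrap.servers': 'BROKER_URL:9092',               # placeholder broker address
        'schema.registry.url': 'http://SCHEMA_REGISTRY:8081'  # placeholder registry address
    },
    default_value_schema=value_schema
)

producer.produce(
    topic='nextiot',
    value={"deviceid": "9098", "latitude": 90.34, "longitude": 334.4}
)
producer.flush()

The AvroProducer registers the schema with the registry and frames every message in the Confluent wire format (a 0x00 magic byte, a 4-byte schema ID, then the Avro-encoded body), which is exactly the header that the plain kafka-python producer in the question never writes, hence the "Unknown magic byte!" error.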