Kafka Connect deserializing byte arrays

Date: 2018-03-29 11:14:41

Tags: apache-kafka apache-kafka-connect confluent-kafka confluent

I am trying to consume byte-array-serialized Avro messages with Kafka Connect. The producer configuration used to serialize the Avro data:

key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
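
(For context, with the ByteArraySerializer the producer has to Avro-encode each record into a raw byte[] itself, roughly as below. This is a minimal sketch assuming a generated Avro class named User and hand-rolled binary encoding, pieced together from the discussion in Answer 1; the actual producer code is not shown in the question.)

import java.io.ByteArrayOutputStream;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hand-rolled Avro encoding: only the record bytes are sent; no schema
// (or Schema Registry id) accompanies the message on the wire.
ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
new SpecificDatumWriter<User>(User.getClassSchema()).write(user, encoder);
encoder.flush();
producer.send(new ProducerRecord<>("csvtopic", out.toByteArray()));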

The HDFS sink configuration:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=csvtopic
hdfs.url=hdfs://10.15.167.119:8020
flush.size=3
locale=en-us
timezone=UTC
partitioner.class=io.confluent.connect.hdfs.partitioner.HourlyPartitioner
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter.schema.registry.url=http://localhost:8081
hive.metastore.uris=thrift://10.15.167.119:9083
hive.integration=true
schema.compatibility=BACKWARD

If I remove the Hive integration and format.class settings from quickstart-hdfs.properties, I can save the data to HDFS. After enabling the Hive integration, I get the following exception stack trace:

java.lang.RuntimeException: org.apache.kafka.connect.errors.SchemaProjectorException: Schema version required for BACKWARD compatibility
        at io.confluent.connect.hdfs.TopicPartitionWriter.write(TopicPartitionWriter.java:401)
        at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:374)
        at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:101)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:495)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:288)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

How do I deserialize the byte stream received from the Kafka topic and store it in Hive?

2 Answers:

Answer 0 (score: 1):

If you are using Avro with the Schema Registry for your messages, then you should use the AvroConverter, not the ByteArrayConverter, i.e.:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
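
The reason this fixes the error above: the ByteArrayConverter hands the connector opaque, schemaless bytes, whereas the AvroConverter retrieves the writer's schema (and its version) from the Schema Registry, which is what the HDFS connector's BACKWARD compatibility check and Hive table creation require.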

Answer 1 (score: 1):

I looked at your comments and code. You are encoding with a ByteArrayOutputStream, and Kafka Connect cannot make sense of data serialized that way. Send the data as follows instead:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

// Serialize keys and values as Avro via the Confluent Schema Registry.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://localhost:8081");
KafkaProducer<Object, Object> producer = new KafkaProducer<>(props);
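
On each send, the KafkaAvroSerializer registers the record's schema with the Schema Registry and prefixes the payload with the returned schema id; that id is exactly what the AvroConverter on the Connect side uses to look the schema up again.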

When sending the data, use:

// Build a GenericRecord against the schema of the generated Avro User class.
GenericData.Record record = new GenericData.Record(User.getClassSchema());
record.put("favorite_color", user.getFavoriteColor());
record.put("favorite_number", user.getFavoriteNumber());
record.put("name", user.getName());

ProducerRecord<Object, Object> precord = new ProducerRecord<>("topic1", record);
producer.send(precord);

And in your Kafka Connect configuration, use:

key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
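
Since the AvroConverter has to reach the Schema Registry, also point both converters at it (the URL below is the one already used in your question):

key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081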