Kafka Connect HDFS Connector with Schema Registry

Time: 2018-03-12 05:55:03

Tags: java hadoop apache-kafka apache-kafka-connect confluent-schema-registry

I referred to the following link to understand the HDFS connector for Kafka: https://docs.confluent.io/2.0.0/connect/connect-hdfs/docs/index.html
I am able to export data from Kafka to HDFS with Hive integration. Now I am trying to write Avro records to Kafka from a Java program.


The schema is registered in the Schema Registry under the name StreamExample_1.
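
For reference, a rough sketch of how that subject could be registered programmatically with the Confluent Schema Registry client is shown below. The registry URL is the one used later in the question, and the class name RegisterSchema is only illustrative.

import java.io.IOException;

import org.apache.avro.Schema;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;

public class RegisterSchema {
  public static void main(String[] args) throws IOException, RestClientException {
    // Registry URL taken from the producer properties in the question.
    SchemaRegistryClient client = new CachedSchemaRegistryClient("http://10.15.167.109:8084", 100);

    // Same two-int record schema that is shown further down in the question.
    String schemaJson = "{\"type\":\"record\",\"name\":\"StreamExample_1\","
        + "\"fields\":[{\"name\":\"str1\",\"type\":\"int\"},{\"name\":\"str2\",\"type\":\"int\"}]}";
    Schema schema = new Schema.Parser().parse(schemaJson);

    // Register the schema under the subject name used in the question.
    int id = client.register("StreamExample_1", schema);
    System.out.println("Registered schema id: " + id);
  }
}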

import java.io.IOException;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;

public class StreamExampleProducer {

  public static void main(String[] args) throws InterruptedException, IOException, RestClientException {

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9094");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://10.15.167.109:8084");

    Producer<String, GenericRecord> producer = new KafkaProducer<String, GenericRecord>(props);

    // Fetch the latest version of the registered schema from the Schema Registry.
    SchemaRegistryClient registryClient = new CachedSchemaRegistryClient("http://10.15.167.109:8084", 100);
    Schema schema = new Schema.Parser().parse(
        registryClient.getLatestSchemaMetadata("StreamExample_1").getSchema());

    for (int i = 0; i < 1000; i++) {
      GenericRecord avroRecord = new GenericData.Record(schema);
      avroRecord.put("str1", i);
      avroRecord.put("str2", i + 1);

      ProducerRecord<String, GenericRecord> data = new ProducerRecord<String, GenericRecord>(
          "StreamExample_1", Integer.toString(i), avroRecord);
      producer.send(data);
      Thread.sleep(250);
    }

    producer.close();
  }
}

The Avro schema registered as StreamExample_1 is:

{
    "type": "record",
    "name": "StreamExample_1",
    "fields": [
        { "name": "str1", "type": "int" },
        { "name": "str2", "type": "int" }
    ]
}

Below is my HDFS connector properties file. When I write Avro records to the Kafka topic, I get an error in Connect.
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=StreamExample_1
hdfs.url=hdfs://localhost:9000
flush.size=3
hive.metastore.uris=thrift://10.15.167.109:9083
hive.integration=true
schema.compatibility=BACKWARD
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
partitioner.class=io.confluent.connect.hdfs.partitioner.HourlyPartitioner
locale=en-us
timezone=UTC
key.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schema.registry.url=http://localhost:8084
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8084
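
For context, with flush.size=3 and the HourlyPartitioner the connector would normally roll a Parquet file for every three records into hourly partition directories under topics.dir (which defaults to /topics), so the output paths should look roughly like the following; the offsets and date are only illustrative:

    /topics/StreamExample_1/year=2018/month=03/day=12/hour=05/StreamExample_1+0+0000000000+0000000002.parquet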

1 Answer:

Answer 0 (score: 2)

Not sure why you are still using byte[] in your Producer when you can use the Avro object directly.

Also, you are not sending any key, so it is not clear why the key serializer is set to Avro. I would suggest using the integer from your loop as the key.

props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
Producer<Integer, GenericRecord> producer = new KafkaProducer<Integer, GenericRecord>(props);

for (int i = 0; i < 1000; i++) {
    GenericData.Record avroRecord = new GenericData.Record(schema);
    avroRecord.put("str1", "Str 1-" + i);
    avroRecord.put("str2", "Str 2-" + i);
    avroRecord.put("int1", i);

    ProducerRecord<String, GenericRecord> data = new ProducerRecord<String, GenericRecord>("StreamExample_1", new Integer(i), avroRecord);
    producer.send(data);
}

producer.close();

Refer to the Confluent example code.
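
To double-check what actually landed on the topic, a minimal consumer sketch along these lines can help. It assumes the same broker and Schema Registry addresses as in the question; the class name and group id are made up.

import java.util.Collections;
import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class VerifyTopic {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9094");
    props.put("group.id", "stream-example-verify");   // throwaway group id, pick anything unused
    props.put("auto.offset.reset", "earliest");
    props.put("key.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("schema.registry.url", "http://10.15.167.109:8084");

    try (KafkaConsumer<Object, GenericRecord> consumer = new KafkaConsumer<Object, GenericRecord>(props)) {
      consumer.subscribe(Collections.singletonList("StreamExample_1"));
      for (int attempt = 0; attempt < 10; attempt++) {
        ConsumerRecords<Object, GenericRecord> records = consumer.poll(1000);
        for (ConsumerRecord<Object, GenericRecord> record : records) {
          // Prints the decoded key and the GenericRecord value.
          System.out.println(record.key() + " -> " + record.value());
        }
      }
    }
  }
}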

If you want to use Kafka Connect with Avro data, you need to update the value converter to

value.converter=io.confluent.connect.avro.AvroConverter
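
Note that the key converter also has to line up with however the keys were serialized. If the keys are written with KafkaAvroSerializer, as in the question's producer, then a sketch of the converter section would be:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8084
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8084

If you switch to plain integer or string keys instead, key.converter needs to be changed to a converter that matches that serializer.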