How to serialize with the Schema Registry from a Kafka Connect SourceTask

Date: 2016-05-19 08:15:45

Tags: serialization schema apache-kafka avro

I have set up the Confluent data platform and started developing a SourceConnector. In the corresponding SourceTask.poll() method I do the following (pseudo-Java code below):

    public List<SourceRecord> poll() throws InterruptedException {

    ....

        // Serialize the envelope with Avro reflection (IOException handling omitted; pseudo-code)
        Envelope envelope = new Envelope();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder enc = EncoderFactory.get().binaryEncoder(out, null);
        DatumWriter<Envelope> dw = new ReflectDatumWriter<Envelope>(Envelope.class);
        dw.write(envelope, enc);
        enc.flush();
        out.close();

        // Source partition/offset let Connect track how far this stream has been read
        Map<String, String> sourcePartition = new HashMap<String, String>();
        sourcePartition.put("stream", streamName);
        Map<String, Integer> sourceOffset = new HashMap<String, Integer>();
        sourceOffset.put("position", Integer.parseInt(envelope.getTimestamp()));

        // With BYTES_SCHEMA the record value should be the serialized bytes, not the POJO
        records.add(new SourceRecord(sourcePartition, sourceOffset, topic,
                org.apache.kafka.connect.data.Schema.BYTES_SCHEMA, out.toByteArray()));

    ....

I want to use the Schema Registry so that each object is tagged with a schema id from the registry, serialized, and then published to the Kafka topic via poll(). If an object's schema does not yet reside in the registry, I want it to be registered there and the newly generated id returned to the serializing process, so that the id becomes part of the serialized object and the object can be deserialized later.

What do I need to do in the code above to achieve this?

3 Answers:

Answer 0 (score: 3):

To use the Schema Registry, you have to serialize/deserialize your data with the classes provided by Confluent:

  • io.confluent.kafka.serializers.KafkaAvroSerializer
  • io.confluent.kafka.serializers.KafkaAvroDeserializer

These classes contain all the logic for registering schemas with the registry and requesting them back from it.
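
As a minimal round-trip sketch of how these two classes behave: the registry URL (http://localhost:8081), the topic name "mytopic", and the toy Envelope schema below are all assumptions for illustration, not part of the original answer.

    import java.util.Collections;
    import java.util.Map;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    import io.confluent.kafka.serializers.KafkaAvroDeserializer;
    import io.confluent.kafka.serializers.KafkaAvroSerializer;

    public class SchemaRegistryRoundTrip {
        public static void main(String[] args) {
            // Registry URL is an assumption for this sketch
            Map<String, Object> config =
                    Collections.singletonMap("schema.registry.url", "http://localhost:8081");

            // A toy schema standing in for the Envelope type from the question
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Envelope\","
                    + "\"fields\":[{\"name\":\"timestamp\",\"type\":\"string\"}]}");
            GenericRecord envelope = new GenericData.Record(schema);
            envelope.put("timestamp", "1463616945");

            // serialize() registers the schema if it is not in the registry yet and
            // embeds the returned id in the payload (magic byte + 4-byte schema id + Avro binary)
            KafkaAvroSerializer serializer = new KafkaAvroSerializer();
            serializer.configure(config, false); // false = configure as a value (de)serializer
            byte[] payload = serializer.serialize("mytopic", envelope);

            // deserialize() reads the embedded id and fetches the schema back from the registry
            KafkaAvroDeserializer deserializer = new KafkaAvroDeserializer();
            deserializer.configure(config, false);
            GenericRecord decoded = (GenericRecord) deserializer.deserialize("mytopic", payload);
            System.out.println(decoded.get("timestamp"));
        }
    }

Because the schema id travels inside every message, a consumer only needs the registry URL to deserialize, which is exactly the behavior asked about in the question.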

If you use Maven, you can add this dependency (note that Confluent artifacts are served from the Confluent Maven repository, http://packages.confluent.io/maven/, rather than Maven Central):

    <dependency>
      <groupId>io.confluent</groupId>
      <artifactId>kafka-avro-serializer</artifactId>
      <version>2.0.1</version>
    </dependency>

Answer 1 (score: 2):

Check out https://gist.github.com/avpatel257/0a88d20200661b31ab5f5df7adc42e6f for an example implementation.

You will need the following dependencies from Confluent to make it work (see the sketch after this list for how the registry client can be used directly).

    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>common-config</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>common-utils</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-schema-registry-client</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-avro-serializer</artifactId>
        <version>3.0.0</version>
    </dependency>
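
If you want the register-if-absent-and-return-the-id behavior from the question explicitly, the kafka-schema-registry-client dependency above also lets you talk to the registry directly. A minimal sketch, assuming a registry at http://localhost:8081; the subject name and the inline schema are made up for illustration:

    import java.io.IOException;

    import org.apache.avro.Schema;

    import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
    import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
    import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;

    public class RegisterSchemaSketch {
        public static void main(String[] args) throws IOException, RestClientException {
            // 100 = capacity of the client's local schema cache
            SchemaRegistryClient client =
                    new CachedSchemaRegistryClient("http://localhost:8081", 100);

            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Envelope\","
                    + "\"fields\":[{\"name\":\"timestamp\",\"type\":\"string\"}]}");

            // register() stores the schema under the subject if it is not there yet and
            // returns its id either way; by convention subjects are named <topic>-value
            int id = client.register("mytopic-value", schema);
            System.out.println("Schema id: " + id);
        }
    }

Normally you would not do this by hand, though: KafkaAvroSerializer performs the same register-and-cache step for you on every send.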

Answer 2 (score: 0):

According to the documentation:

In the POM:

    <dependency>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-avro-serializer</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>0.11.0.1-cp1</version>
        <scope>provided</scope>
    </dependency>

In the application, create the producer:

    Properties props = new Properties();
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
              io.confluent.kafka.serializers.KafkaAvroSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
              io.confluent.kafka.serializers.KafkaAvroSerializer.class);
    props.put("schema.registry.url", "http://localhost:8081");
    // Set any other properties
    KafkaProducer producer = new KafkaProducer(props);

Use the producer:

    User user1 = new User();
    user1.setName("Alyssa");
    user1.setFavoriteNumber(256);
    // send() takes a ProducerRecord and returns Future<RecordMetadata>;
    // the topic name "users" is an example
    Future<RecordMetadata> resultFuture = producer.send(new ProducerRecord("users", user1));

In your registry, for this example, you will need the schema for "User" (with the default subject naming strategy it is registered under the subject "<topic>-value").

Confluent also has a nice example in GitHub:

    package io.confluent.examples.producer;

    import JavaSessionize.avro.LogLine;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;
    import java.util.Random;

    public class AvroClicksProducer {

        public static void main(String[] args) throws InterruptedException {
            if (args.length != 1) {
                System.out.println("Please provide command line arguments: schemaRegistryUrl");
                System.exit(-1);
            }

            String schemaUrl = args[0];

            Properties props = new Properties();
            // hardcoding the Kafka server URI for this example
            props.put("bootstrap.servers", "localhost:9092");
            props.put("acks", "all");
            props.put("retries", 0);
            props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", schemaUrl);

            // Hard coding topic too.
            String topic = "clicks";

            // Hard coding wait between events so demo experience will be uniformly nice
            int wait = 500;

            Producer<String, LogLine> producer = new KafkaProducer<String, LogLine>(props);

            // We keep producing new events and waiting between them until someone ctrl-c
            while (true) {
                LogLine event = EventGenerator.getNext();
                System.out.println("Generated event " + event.toString());

                // Using IP as key, so events from same IP will go to same partition
                ProducerRecord<String, LogLine> record =
                        new ProducerRecord<String, LogLine>(topic, event.getIp().toString(), event);
                producer.send(record);
                Thread.sleep(wait);
            }
        }
    }