Question

我正在使用kafka和apache flink。我试图从apache flink中的kafka主题消费记录（以avro格式）。下面是我正在尝试的一段代码。

使用自定义反序列化器从主题中反序列化avro记录。

我发送给主题＆＃34; test-topic＆＃34;的数据的Avro架构如下所示。

{
  "namespace": "com.example.flink.avro",
  "type": "record",
  "name": "UserInfo",
  "fields": [
    {"name": "name", "type": "string"}
  ]
}

我使用的自定义反序列化器如下所示。

public class AvroDeserializationSchema<T> implements DeserializationSchema<T> {

    private static final long serialVersionUID = 1L;

    private final Class<T> avroType;

    private transient DatumReader<T> reader;
    private transient BinaryDecoder decoder;

    public AvroDeserializationSchema(Class<T> avroType) {
        this.avroType = avroType;
    }


    public T deserialize(byte[] message) {
        ensureInitialized();
        try {
            decoder = DecoderFactory.get().binaryDecoder(message, decoder);
            T t = reader.read(null, decoder);
            return t;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    private void ensureInitialized() {
        if (reader == null) {
            if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(avroType)) {
                reader = new SpecificDatumReader<T>(avroType);
            } else {
                reader = new ReflectDatumReader<T>(avroType);
            }
        }
    }


    public boolean isEndOfStream(T nextElement) {
        return false;
    }


    public TypeInformation<T> getProducedType() {
        return TypeExtractor.getForClass(avroType);
    }
}

这就是我的flink应用程序的编写方式。

public class FlinkKafkaApp {


    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties kafkaProperties = new Properties();
        kafkaProperties.put("bootstrap.servers", "localhost:9092");
        kafkaProperties.put("group.id", "test");

        AvroDeserializationSchema<UserInfo> schema = new AvroDeserializationSchema<UserInfo>(UserInfo.class);

        FlinkKafkaConsumer011<UserInfo> consumer = new FlinkKafkaConsumer011<UserInfo>("test-topic", schema, kafkaProperties);

        DataStreamSource<UserInfo> userStream = env.addSource(consumer);

        userStream.map(new MapFunction<UserInfo, UserInfo>() {

            @Override
            public UserInfo map(UserInfo userInfo) {
                return userInfo;
            }
        }).print();

        env.execute("Test Kafka");

    }

我正在尝试打印发送到主题的记录，如下所示。 {＆＃34;名称＆＃34; ：＆＃34; SUMIT＆＃34;}

输出：

我得到的输出是 {＆＃34;名称＆＃34;：＆＃34;＆＃34;}

任何人都可以帮忙弄清楚这里的问题是什么，为什么我没有得到{＆＃34; name＆＃34; ：＆＃34; sumit＆＃34;}作为输出。

Answer 1

Flink文档说： Flink的Kafka使用者称为FlinkKafkaConsumer08（对于Kafka 0.9.0.x版本，则称为09，等等；对于Kafka> = 1.0.0版本，则仅称为FlinkKafkaConsumer）。它提供对一个或多个Kafka主题的访问。

我们不必编写自定义反序列化器即可使用来自Kafka的Avro消息。

-要读取SpecificRecords：

DataStreamSource<UserInfo> stream = streamExecutionEnvironment.addSource(new FlinkKafkaConsumer<>("test_topic", AvroDeserializationSchema.forSpecific(UserInfo.class), properties).setStartFromEarliest());

要读取GenericRecords：

Schema schema = Schema.parse("{"namespace": "com.example.flink.avro","type": "record","name": "UserInfo","fields": [{"name": "name", "type": "string"}]}");
DataStreamSource<GenericRecord> stream = streamExecutionEnvironment.addSource(new FlinkKafkaConsumer<>("test_topic", AvroDeserializationSchema.forGeneric(schema), properties).setStartFromEarliest());

有关更多详细信息：https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumer

鞭打中的卡夫卡消费者

1 个答案: