Unable to read records from Avro data created by a Flume HDFS sink

Date: 2018-07-10 15:50:36

Tags: apache-kafka avro flume

flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.kafka.bootstrap.servers= kafka bootstrap servers
flume1.sources.kafka-source-1.kafka.topics = topic
flume1.sources.kafka-source-1.batchSize = 1000
flume1.sources.kafka-source-1.channels = hdfs-channel-1
flume1.sources.kafka-source-1.kafka.consumer.group.id=group_id

flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = file_prefix
flume1.sinks.hdfs-sink-1.hdfs.fileSuffix = .avro
flume1.sinks.hdfs-sink-1.hdfs.inUsePrefix = tmp/
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /user/directory/ingest_date=%y-%m-%d
flume1.sinks.hdfs-sink-1.hdfs.rollCount=0
flume1.sinks.hdfs-sink-1.hdfs.rollSize=1000000
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 10000

I am using Flume to consume Avro data from Kafka and store it in HDFS with the HDFS sink.

I am trying to create a Hive table on top of the Avro data in HDFS.

I can run MSCK REPAIR TABLE and see the partitions being added to the Hive metastore.
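The two steps above (an Avro-backed external table plus an MSCK repair) might be sketched roughly as follows; the table name and schema URL are assumptions for illustration, not taken from the post:

```sql
-- Hedged sketch: external Avro table over the sink's output directory.
-- Table name and schema URL are placeholders.
CREATE EXTERNAL TABLE events
PARTITIONED BY (ingest_date STRING)
STORED AS AVRO
LOCATION '/user/directory'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/events.avsc');

-- Pick up the ingest_date=... partitions written by the HDFS sink
MSCK REPAIR TABLE events;
```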

So when I run select * from table_name limit 1; it fetches my record.

But when I try to fetch anything more than that, it fails with the exception:

java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
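A block-size error like this usually means the Avro reader hit bytes that are not a valid Avro object container stream. One way to sanity-check a file written by the sink (a minimal sketch; the paths are hypothetical, and the HDFS file would first need to be copied locally, e.g. with `hdfs dfs -get`) is to look for the Avro container magic header:

```python
# Hedged sketch: check whether a file is a valid Avro object container file
# by inspecting its 4-byte magic header. Paths are assumptions.
def is_avro_container(path):
    """Return True if the file begins with the Avro container magic b'Obj\\x01'."""
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"
```

Note that messages serialized by Kafka Connect's Confluent Avro converter use the schema-registry wire format (a 0x00 magic byte followed by a 4-byte schema ID) rather than the container format, so raw Kafka event bodies written straight to HDFS would fail this check.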

I have also tried setting the following serializer properties:

flume1.sinks.hdfs-sink-1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
flume1.sinks.hdfs-sink-1.deserializer.schemaType = LITERAL
flume1.sinks.hdfs-sink-1.serializer.schemaURL = file:///schemadirectory
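For comparison, the Flume user guide documents the HDFS sink's Avro serializer options under the `serializer.` prefix; a configuration of roughly this shape (the schema URL here is a placeholder, not taken from the post) matches the documented interface:

```
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
# Placeholder URL; point this at a concrete .avsc schema file
flume1.sinks.hdfs-sink-1.serializer.schemaURL = hdfs://namenode/schemas/record.avsc
```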

P.S. I am using Kafka Connect to push the data into the Kafka topic.

0 Answers