flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.kafka.bootstrap.servers = kafka bootstrap servers
flume1.sources.kafka-source-1.kafka.topics = topic
flume1.sources.kafka-source-1.batchSize = 1000
flume1.sources.kafka-source-1.channels = hdfs-channel-1
flume1.sources.kafka-source-1.kafka.consumer.group.id = group_id
flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = file_prefix
flume1.sinks.hdfs-sink-1.hdfs.fileSuffix = .avro
flume1.sinks.hdfs-sink-1.hdfs.inUsePrefix = tmp/
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /user/directory/ingest_date=%y-%m-%d
flume1.sinks.hdfs-sink-1.hdfs.rollCount = 0
flume1.sinks.hdfs-sink-1.hdfs.rollSize = 1000000
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 10000
I am using Flume to consume Avro data from Kafka and store it in HDFS via the HDFS sink.
I am trying to create a Hive table on top of that Avro data in HDFS.
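The DDL I am using is along these lines (the table name, schema path, and location are placeholders; the partition column matches the `ingest_date=%y-%m-%d` pattern in `hdfs.path` above):

```sql
-- Sketch of my DDL; my_avro_table and the schema URL are placeholders
CREATE EXTERNAL TABLE my_avro_table
PARTITIONED BY (ingest_date STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/directory'
TBLPROPERTIES ('avro.schema.url' = 'hdfs:///path/to/schema.avsc');
```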
I can run `MSCK REPAIR TABLE` and I can see the partitions being added to the Hive metastore,
so when I run `select * from table_name limit 1;` it fetches my record.
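For reference, the repair and probe queries I run look like this (the table name is a placeholder):

```sql
-- Partition discovery succeeds; the partitions show up in the metastore
MSCK REPAIR TABLE my_avro_table;

-- This works and returns one record
SELECT * FROM my_avro_table LIMIT 1;

-- Reading past the first record fails with the exception below
SELECT * FROM my_avro_table LIMIT 10;
```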
But when I try to fetch more than one record, the query
fails with the exception: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid block size or block too large for this implementation: -40
I have also tried setting the following properties:
flume1.sinks.hdfs-sink-1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
flume1.sinks.hdfs-sink-1.deserializer.schemaType = LITERAL
flume1.sinks.hdfs-sink-1.serializer.schemaURL = file:///schemadirectory
P.S. I am using Kafka Connect to push the data into the Kafka topic.