Opening a gzip-compressed file from Hive returns "not a data file"

Date: 2017-08-29 15:06:27

Tags: amazon-s3 hive gzip flume flume-ng

Below is my configuration, which works fine for uncompressed data:

agent.sinks.test.type = hdfs
agent.sinks.test.hdfs.useLocalTimeStamp = true
agent.sinks.test.hdfs.path = s3n://AccessKeys@test/%{topic}/utc=%s
agent.sinks.test.hdfs.roundUnit = minute
agent.sinks.test.hdfs.round = true
agent.sinks.test.hdfs.roundValue = 10
agent.sinks.test.hdfs.fileSuffix = .avro
agent.sinks.test.serializer = com.test.flume.sink.serializer.GenericRecordAvroEventSerializer$Builder
agent.sinks.test.hdfs.fileType = DataStream
agent.sinks.test.hdfs.maxOpenFiles=100
agent.sinks.test.hdfs.appendTimeout = 5000
agent.sinks.test.hdfs.callTimeout = 4000
agent.sinks.test.hdfs.rollInterval = 60
agent.sinks.test.hdfs.rollSize = 0 
agent.sinks.test.hdfs.rollCount = 1000
agent.sinks.test.hdfs.batchSize = 1000
agent.sinks.test.hdfs.threadsPoolSize=100

I am trying to add gzip compression, as shown below:

agent.sinks.test.type = hdfs
agent.sinks.test.hdfs.useLocalTimeStamp = true
agent.sinks.test.hdfs.path = s3n://AccessKeys@test/%{topic}/utc=%s
agent.sinks.test.hdfs.roundUnit = minute
agent.sinks.test.hdfs.round = true
agent.sinks.test.hdfs.roundValue = 10
agent.sinks.test.hdfs.fileSuffix = .avro
agent.sinks.test.serializer = com.test.flume.sink.serializer.GenericRecordAvroEventSerializer$Builder
agent.sinks.test.hdfs.fileType = CompressedStream
agent.sinks.test.hdfs.codeC = gzip
agent.sinks.test.hdfs.maxOpenFiles=100
agent.sinks.test.hdfs.appendTimeout = 10000
agent.sinks.test.hdfs.callTimeout = 4000
agent.sinks.test.hdfs.rollInterval = 60
agent.sinks.test.hdfs.rollSize = 0
agent.sinks.test.hdfs.rollCount = 1000
agent.sinks.test.hdfs.batchSize = 1000
agent.sinks.test.hdfs.threadsPoolSize=100

All of the above data is stored in S3, but when I try to retrieve it from Hive, I get the following error:

Exception in thread "main" java.io.IOException: Not an Avro data file
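For reference, the first few bytes of a file show what was actually written. A minimal diagnostic sketch in Python (part.avro is a hypothetical local copy of one of the sink's output files, not a name from my config): a gzip stream starts with the bytes 0x1f 0x8b, while an Avro container file starts with "Obj" followed by the version byte 1.

path = "part.avro"  # hypothetical local copy of a file written by the sink

with open(path, "rb") as f:
    magic = f.read(4)

if magic[:2] == b"\x1f\x8b":
    # gzip magic bytes: fileType = CompressedStream gzips the whole file,
    # so the output is a gzip stream rather than an Avro container
    print("gzip stream")
elif magic == b"Obj\x01":
    # Avro object container files begin with "Obj" + version byte 1
    print("Avro container file")
else:
    print("unknown format:", magic)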

Can you tell me why my configuration does not work?

0 Answers:

No answers yet.