Mapper cannot read gz.parquet files

Time: 2017-04-21 01:56:35

Tags: hadoop mapreduce parquet

org.apache.hadoop.mapred.MapTask: Starting flush of map output

  

2017-04-20 20:53:20,101 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:294)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:204)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:198)
    at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:105)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:174)
    at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:192)
    at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
    at org.apache.hadoop.mapreduce.lib.input.DelegatingRecordReader.initialize(DelegatingRecordReader.java:84)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I am trying to read filename.gz.parquet in a mapper and get the exception above. With the same MapReduce job, I am able to read filename.snappy.parquet files.

1 answer:

Answer 0: (score: 0)

I was able to resolve this issue by upgrading to newer jars: avroVersion 1.8.1, parquetVersion 1.9.0, parquetFormatVersion 2.3.1, and hiveVersion 1.2.2.
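After an upgrade like this, it can help to confirm which jar a class is actually loaded from at runtime, since an older copy elsewhere on the classpath may still be picked up. The following is only a minimal sketch of such a check using the standard Java API; the class name JarLocator and the helper locate are hypothetical names chosen for illustration, and the two class names in main are taken from the stack trace and the Avro library.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Hadoop, Parquet, or Avro).
public class JarLocator {

    // Returns the location a class was loaded from, or a short marker
    // string when the class is missing or has no code source.
    static String locate(String className) {
        try {
            // Load without running static initializers.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK typically report no code source.
            return src == null ? "(no code source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.parquet.avro.AvroSchemaConverter", // class from the stack trace
            "org.apache.avro.Schema"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling locate from inside the mapper (for example, logging the result in setup) would show the jar path seen by the task itself, which may differ from the one on the machine that submits the job.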