I am facing a very strange problem. I am processing multi-column data with Pig; the data is loaded in the Pig script using the HCatalog loader (HCatLoader). The columns include several integer columns, string columns, and double columns. One of the integer columns (call it C1) cannot be stored with ParquetStorer. The other integer columns store fine; only column C1 fails.
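For reference, this is a minimal sketch of the kind of script involved; the table name, alias names, and output path are hypothetical, and the fully qualified HCatLoader class name can differ between HCatalog versions:

-- load the table registered in HCatalog (hypothetical table name)
raw = LOAD 'my_db.my_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- raw contains int, chararray and double columns, including C1:int
-- write it out as Parquet (hypothetical output path)
STORE raw INTO '/user/hadoop/out_parquet' USING parquet.pig.ParquetStorer();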
Here is the error:
Backend error message
---------------------
AttemptID:attempt_1413268228935_0073_m_000002_0 Info:Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Backend error message
---------------------
AttemptID:attempt_1413268228935_0073_m_000001_0 Info:Error: parquet.io.ParquetEncodingException: can not write value at 2 in tuple (,2003-11-22,840,00007,ABC,DEF,FFGG,10,0.0,0,0.0,11.11,0,7.122112,0.0,0,0.0) from type 'C1: int' to type 'optional int32 C1'
at parquet.pig.TupleWriteSupport.writeValue(TupleWriteSupport.java:199)
at parquet.pig.TupleWriteSupport.writeTuple(TupleWriteSupport.java:151)
at parquet.pig.TupleWriteSupport.write(TupleWriteSupport.java:90)
at parquet.pig.TupleWriteSupport.write(TupleWriteSupport.java:46)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:111)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:78)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:35)
at parquet.pig.ParquetStorer.putNext(ParquetStorer.java:121)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at parquet.pig.TupleWriteSupport.writeValue(TupleWriteSupport.java:178)
... 24 more
I have run DESCRIBE on the alias being stored with ParquetStorer, and column C1 is of type int. Still, ParquetStorer complains that the data is a String and cannot be cast to a Number.
Any help is appreciated.
Answer 0 (score: 0)
I had a similar problem; my workaround was simply to cast the field to chararray, after which I was able to save the output in Parquet format.
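A rough sketch of that workaround in Pig Latin, assuming the alias from the sketch above (alias and column names other than C1 are hypothetical; only the problematic field is cast):

-- cast the problematic column to chararray before storing
fixed = FOREACH raw GENERATE (chararray) C1 AS C1, col2, col3;
STORE fixed INTO '/user/hadoop/out_parquet' USING parquet.pig.ParquetStorer();

Note that C1 then presumably ends up as a string column in the Parquet schema rather than an optional int32, which may or may not matter downstream.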
By the way, here is the source code of that function: http://grepcode.com/file/repo1.maven.org/maven2/com.twitter/parquet-pig/1.2.0/parquet/pig/TupleWriteSupport.java
It looks fine to me, but it sounds like this is the line that breaks in your case:
case INT32:
  recordConsumer.addInteger(((Number)t.get(i)).intValue());
  break;
t.get(i) is returning a String, hence the:
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at parquet.pig.TupleWriteSupport.writeValue(TupleWriteSupport.java:178)
... 24 more