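I am receiving 152³305746 as bad data in my input file. I am trying to filter it out, but I have not managed to tell Hive how to detect and skip it. I am not even sure what data type it is. I was expecting bigint values, not what I see here.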
I have tried the following, alone and in various combinations, and none of it helped skip the bad records in my input:
1)
CAST(mycol AS string) RLIKE "^[0-9]+$"
2)
mycol < 2147483647
3)
CREATE TEMPORARY MACRO isNumber(s string) CAST(s as BIGINT) IS NOT NULL;
4)
isNumber(mycol) != false
5)
SET mapreduce.map.skip.maxrecords = 100000000;
None of the above works. Hive fails with the following error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NumberFormatException: For input string: "152³305746"
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:126)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
... 9 more
Caused by: java.lang.NumberFormatException: For input string: "152³305746"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.openx.data.jsonserde.objectinspector.primitive.ParsePrimitiveUtils.parseLong(ParsePrimitiveUtils.java:49)
at org.openx.data.jsonserde.objectinspector.primitive.JavaStringLongObjectInspector.get(JavaStringLongObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:400)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:279)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:239)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:201)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:395)
... 17 more
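
For reference, the innermost frames of the trace (`Long.parseLong`) can be reproduced outside Hive. The following is a minimal sketch in plain Java, assuming the raw field reaches the parser as exactly the string shown in the exception, with `³` being U+00B3 SUPERSCRIPT THREE. The class and method names (`BadLongDemo`, `isPlainDigits`, `tryParseLong`) are hypothetical and not part of Hive or the JSON SerDe.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; none of these names come from Hive or the JSON SerDe.
final class BadLongDemo {
    // Same pattern as attempt 1): one or more ASCII digits, nothing else.
    private static final Pattern ASCII_DIGITS = Pattern.compile("^[0-9]+$");

    // True only when the whole string consists of ASCII digits 0-9.
    static boolean isPlainDigits(String s) {
        return s != null && ASCII_DIGITS.matcher(s).matches();
    }

    // Returns the parsed value, or null where Long.parseLong throws.
    static Long tryParseLong(String s) {
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumption: the bad value is "152" + U+00B3 + "305746", as printed in the error.
        String bad = "152\u00B3305746";
        System.out.println(isPlainDigits(bad));          // false
        System.out.println(tryParseLong(bad));           // null
        System.out.println(tryParseLong("152305746"));   // 152305746
    }
}
```

This only shows that the value is not parseable as a long and that an ASCII-digit pattern rejects it in plain Java. It does not explain why the filters above fail to stop the exception inside Hive.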