多个Hive连接失败并执行错误,返回代码2

时间:2015-12-01 15:48:38

标签: hadoop mapreduce hive

我正在尝试执行一个查询,其中一个表外部连接在另外两个表上。查询如下:

SELECT T.Rdate, c.Specialty_Cruises, b.Specialty_Cruises from arunf.PASSENGER_HISTORY_FACT T 
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C on (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(c.rdate,'yyyy-MM-dd') AND T.book_num = c.Courtesy_Hold_Booking_Num) 
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP b ON (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(b.rdate,'yyyy-MM-dd') AND T.book_num = B.Online_Booking_Number);

此查询失败并显示通知:

: exec.Task (SessionState.java:printError(922)) - /tmp/arunf/hive.log
: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(308)) - Execution failed with exit status: 2
: ql.Driver (SessionState.java:printError(922)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask 

错误日志包含以下内容:

2015-12-01 10:25:16,077 INFO  [main]: mr.ExecDriver (SessionState.java:printInfo(913)) - Execution log at: /tmp/arunf/arunf_20151201102525_914a2eab-652b-440c-9fdc-a473b4caa026.log
2015-12-01 10:25:16,278 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2015-12-01 10:25:16,278 INFO  [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo
2015-12-01 10:25:16,421 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1448983516278 end=1448983516421 duration=143 from=org.apache.hadoop.hive.ql.exec.Utilities>
2015-12-01 10:25:16,429 INFO  [main]: mr.MapredLocalTask (SessionState.java:printInfo(913)) - 2015-12-01 10:25:16   Starting to launch local task to process map join;  maximum memory = 1029701632
2015-12-01 10:25:16,498 INFO  [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for c created
2015-12-01 10:25:16,500 INFO  [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for b created
2015-12-01 10:25:16,500 INFO  [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[2]
2015-12-01 10:25:16,500 INFO  [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 2 TS initialized
2015-12-01 10:25:16,500 INFO  [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 2 TS
2015-12-01 10:25:16,500 INFO  [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2015-12-01 10:25:16,500 INFO  [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 2 TS
2015-12-01 10:25:16,500 INFO  [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for b initialized
2015-12-01 10:25:16,500 INFO  [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0]
2015-12-01 10:25:16,501 INFO  [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized
2015-12-01 10:25:16,501 INFO  [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS
2015-12-01 10:25:16,502 INFO  [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2015-12-01 10:25:16,503 INFO  [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1]
2015-12-01 10:25:16,503 INFO  [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1029701632
2015-12-01 10:25:16,533 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(357)) - Hive Runtime Error: Map local work failed
java.lang.RuntimeException: cannot find field courtesy_hold_booking_num from [0:rdate, 1:online_booking_number, 2:pages, 3:mobile_device_type, 4:specialty_cruises]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
    at org.apache.hadoop.hive.serde2.BaseStructObjectInspector.getStructFieldRef(BaseStructObjectInspector.java:133)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
    at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
    at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:138)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:460)
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:366)
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

请注意,当主表左外连接表时,它们会成功。 例如,以下查询成功:

SELECT T.Rdate from arunf.PASSENGER_HISTORY_FACT T 
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C on (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(c.rdate,'yyyy-MM-dd') AND T.book_num = c.Courtesy_Hold_Booking_Num);

SELECT T.Rdate from arunf.PASSENGER_HISTORY_FACT T 
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP b ON (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(b.rdate,'yyyy-MM-dd') AND T.book_num = B.Online_Booking_Number);

我还能够以相同的组合方式与另外两个表进行此主表的左外连接。只有当我尝试使用这两个辅助表继续加入主表时,我才面临这个问题。

请提供您对此问题的见解。

1 个答案:

答案 0 :(得分:1)

Hive bug来来去去。它可能取决于Hive版本(?)和表格式(文本?AVRO?Sequence?ORC?Parquet?)。

现在,如果每个查询都有效,那么为什么不根据分而治之的方法 尝试解决方法(或者:如果Hive不够智能设计执行计划,然后让我们自己设计)例如

branch=`git rev-parse HEAD | git branch -a --contains | grep remotes | sed s/.*remotes.origin.//`
branch=`echo $branch | awk '{print $NF}'`