I am trying to run the following Hive SQL statement (execution engine: Tez):
insert into table bi.tenant_day_20160422
select
tmp.user_id,class.tenant_id,tmp.study_date
from
bi.tmp_date tmp
join
gxb.gxb_class class
on tmp.class_id=class.class_id
group by tmp.user_id,class.tenant_id,tmp.study_date;
But it always fails during the reduce phase. The execution plan is:
Stage-3
Stats-Aggr Operator
Stage-0
Move Operator
table:{"name:":"bi.tenant_day_20160422"}
Stage-2
Dependency Collection{}
Stage-1
Reducer 3
File Output Operator [FS_14]
table:{"name:":"bi.tenant_day_20160422"}
Select Operator [SEL_13] (rows=756048 width=72)
Output:["_col0","_col1","_col2"]
Group By Operator [GBY_12] (rows=756048 width=72)
Output:["_col0","_col1","_col2"],keys:KEY._col0, KEY._col1, KEY._col2
<-Reducer 2 [SIMPLE_EDGE]
SHUFFLE [RS_11]
PartitionCols:_col0, _col1, _col2
Group By Operator [GBY_10] (rows=1512097 width=72)
Output:["_col0","_col1","_col2"],keys:_col0, _col2, _col4
Merge Join Operator [MERGEJOIN_19] (rows=1512097 width=72)
Conds:RS_6._col1=RS_7._col0(Inner),Output:["_col0","_col2","_col4"]
<-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_6]
PartitionCols:_col1
Select Operator [SEL_2] (rows=1374634 width=72)
Output:["_col0","_col1","_col2"]
Filter Operator [FIL_17] (rows=1374634 width=72)
predicate:class_id is not null
TableScan [TS_0] (rows=1374634 width=72)
bi@tmp_date,tmp,Tbl:COMPLETE,Col:NONE,Output:["user_id","class_id","study_date"]
<-Map 4 [SIMPLE_EDGE]
SHUFFLE [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=19671 width=16)
Output:["_col0","_col1"]
Filter Operator [FIL_18] (rows=19671 width=16)
predicate:class_id is not null
TableScan [TS_3] (rows=19671 width=16)
gxb@gxb_class,class,Tbl:COMPLETE,Col:NONE,Output:["class_id","tenant_id"]
Error log:
17:47:49.164 [TezChild] ERROR ExecReducer - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:250)
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:78)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:653)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:759)
at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:265)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:196)
at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.runOldReducer(ReduceProcessor.java:190)
at org.apache.tez.mapreduce.processor.reduce.ReduceProcessor.run(ReduceProcessor.java:152)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:235)
... 20 more
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.failExpecting(URI.java:2835)
at java.net.URI$Parser.parse(URI.java:3038)
at java.net.URI.<init>(URI.java:753)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 29 more
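For reference, the innermost exception can be reproduced outside Hive. The sketch below is only an illustration in plain Java (the class name `UriRepro` and the helper `describeFailure` are made up for this post, and it calls `java.net.URI` directly rather than going through Hadoop's `Path`). It shows that a string consisting of the bare scheme `file:` with nothing after it is rejected with the same message as in the log, which suggests that some path handed to `RowContainer` resolved to `file:` plus an empty path. The trace itself does not show which setting produces that value.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, not part of Hive, Tez, or Hadoop.
class UriRepro {
    // Returns the parser's error message for the given string,
    // or null if the string parses as a URI.
    static String describeFailure(String candidate) {
        try {
            new URI(candidate);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A scheme with no path after it, as seen in the error log.
        System.out.println(describeFailure("file:"));
        // The same scheme with an (arbitrary, assumed) path for comparison.
        System.out.println(describeFailure("file:/tmp/hive"));
    }
}
```

The first call reports a syntax error at index 5, the position right after `file:`, while the second string parses without error.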
However, if I limit the input and run the statement like this instead:
insert into table bi.tenant_day_20160422
select tmp.user_id,class.tenant_id,tmp.study_date
from (select * from bi.tmp_date limit 200000) tmp
join gxb.gxb_class class
on tmp.class_id=class.class_id group by tmp.user_id,class.tenant_id,tmp.study_date;
it works!
This really confuses me. What is the problem? Can anyone help?