Here is my workflow. It works fine when the script contains simple SQL such as SHOW TABLES or DROP PARTITION (I also tried Tez; it fails again with the same error message)..
<workflow-app xmlns="uri:oozie:workflow:0.4" name="UDEX-OOZIE POC">
    <credentials>
        <credential name="HiveCred" type="hcat">
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://xxxx.local:9083</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/_HOST@xxxx.LOCAL</value>
            </property>
        </credential>
    </credentials>
    <start to="IdEduTranCell-pm"/>
    <action name="IdEduTranCell-pm" cred="HiveCred">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${HiveConfigFile}</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
            </configuration>
            <script>${IdEduTranCell_path}</script>
            <param>SOURCE_DB_NAME=${SOURCE_DB_NAME}</param>
            <param>STRG_DB_NAME=${STRG_DB_NAME}</param>
            <param>TABLE_NAME=${TABLE_NAME}</param>
            <file>${IdEduTranCell_path}#${IdEduTranCell_path}</file>
            <file>${HiveConfigFile}#${HiveConfigFile}</file>
        </hive>
        <ok to="sub-workflow-end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sub-workflow failed while loading data into hive tables, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="sub-workflow-end"/>
</workflow-app>
But this SQL fails. The data is not big (it fails even with a single record), so I cannot pinpoint the error from the logs.. please help
INSERT OVERWRITE TABLE xxx1.fact_tranCell
PARTITION (Timestamp)
SELECT
  `(Timestamp)?+.+`, Timestamp AS Timestamp
FROM xxx2.fact_tranCell
ORDER BY tranCell_ID, ADMIN_CELL_ID, SITE_ID;
The SQL itself is also fine; it runs correctly from the command line..
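One likely source of a CLI/Oozie discrepancy is session configuration: the interactive CLI picks up settings from .hiverc and the client-side hive-site.xml, while the Oozie action only sees ${HiveConfigFile}. As a minimal sketch (assuming these flags are not already in ${HiveConfigFile}), the settings this statement depends on could be put at the top of the script itself, so both environments behave identically:

-- Sketch: the placement is the point here; the flags themselves are standard Hive settings.
-- The regex column spec `(Timestamp)?+.+` only parses with quoted identifiers off:
set hive.support.quoted.identifiers=none;
-- PARTITION (Timestamp) with no static value needs nonstrict dynamic partitioning:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

The successful command-line run (on Tez), for comparison: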
Status: Running (Executing on YARN cluster with App id application_1459756606292_15271)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 7 7 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 3 ...... SUCCEEDED 10 10 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 19.72 s
--------------------------------------------------------------------------------
Loading data to table xxx1.fact_trancell partition (timestamp=null)
Time taken for load dynamic partitions : 496
Loading partition {timestamp=1464012900}
Time taken for adding to write entity : 8
Partition 4g_oss.fact_trancell{timestamp=1464012900} stats: [numFiles=10, numRows=4352, totalSize=9660382, rawDataSize=207776027]
OK
Time taken: 34.595 seconds
--------------------------- LOG (failing Oozie run) ---------------------------
Starting Job = job_1459756606292_15285, Tracking URL = hxxp://xxxx.local:8088/proxy/application_1459756606292_15285/
Kill Command = /usr/bin/hadoop job -kill job_1459756606292_15285
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-05-27 17:32:35,792 Stage-1 map = 0%, reduce = 0%
2016-05-27 17:32:51,692 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.97 sec
2016-05-27 17:33:02,263 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.97 sec
MapReduce Total cumulative CPU time: 14 seconds 970 msec
Ended Job = job_1459756606292_15285
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1459756606292_15286, Tracking URL = hxxp://xxxx.local:8088/proxy/application_1459756606292_15286/
Kill Command = /usr/bin/hadoop job -kill job_1459756606292_15286
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2016-05-27 17:33:16,583 Stage-2 map = 0%, reduce = 0%
2016-05-27 17:33:29,814 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:33:45,587 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 38.74 sec
2016-05-27 17:33:53,990 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:34:08,662 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 39.27 sec
2016-05-27 17:34:17,061 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:34:28,576 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 38.28 sec
2016-05-27 17:34:36,940 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:34:48,435 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 38.09 sec
MapReduce Total cumulative CPU time: 38 seconds 90 msec
Ended Job = job_1459756606292_15286 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://xxxx.local:8088/proxy/application_1459756606292_15286/
Examining task ID: task_1459756606292_15286_m_000000 (and more) from job job_1459756606292_15286
Task with the most failures(4):
-----
Task ID:
task_1459756606292_15286_r_000000
URL:
hxxp://xxxx.local:8088/taskdetails.jsp?jobid=job_1459756606292_15286&tipid=task_1459756606292_15286_r_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 14.97 sec HDFS Read: 87739161 HDFS Write: 16056577 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 38.09 sec HDFS Read: 16056995 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 53 seconds 60 msec
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [2]
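Note the difference between the two runs: on the CLI, Tez spread the final write across 10 Reducer 3 tasks, while the Oozie-launched MapReduce job funnels everything through a single reducer (Stage-2: 1 reducer), and that reducer is the task dying with "Error: Java heap space". A first thing to try is giving that reducer more memory. A sketch with assumed sizes (tune them to the cluster; the same properties could equally go into the <configuration> block of the Hive action):

-- Assumed values, not taken from the actual cluster config.
-- Bigger container for the reduce task...
set mapreduce.reduce.memory.mb=4096;
-- ...and a JVM heap sized to fit inside it (commonly ~80% of the container):
set mapreduce.reduce.java.opts=-Xmx3276m;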
I can see the error in the logs while the ORC file is being written; the strange thing is that exactly the same ORC file is written without problems from the command line!!!!
2016-05-30 11:11:20,377 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 1
2016-05-30 11:11:21,307 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 10
2016-05-30 11:11:21,917 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 100
2016-05-30 11:11:22,420 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 1000
2016-05-30 11:11:24,181 INFO [main] org.apache.hadoop.hive.ql.exec.ExtractOperator: 0 finished. closing...
2016-05-30 11:11:24,181 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing...
2016-05-30 11:11:24,181 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 4352
2016-05-30 11:11:33,028 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewInputBuffer(OutStream.java:107)
at org.apache.hadoop.hive.ql.io.orc.OutStream.write(OutStream.java:128)
at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.writeDirectValues(RunLengthIntegerWriterV2.java:374)
at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.writeValues(RunLengthIntegerWriterV2.java:182)
at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:762)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.flushDictionary(WriterImpl.java:1211)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1132)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1616)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:186)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
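The stack trace places the OutOfMemoryError inside the ORC writer while it flushes a stripe (flushDictionary on a string column): the ORC stripe buffers and string dictionaries do not fit in whatever heap the Oozie-launched reducer gets. Besides raising the heap as above, the ORC writer's footprint can be shrunk. The following is a sketch with assumed values, not something verified on this cluster:

-- Assumed values; adjust after testing.
-- Smaller stripes mean smaller in-memory buffers per open writer:
set hive.exec.orc.default.stripe.size=67108864;   -- 64 MB
-- Cap the fraction of the heap the ORC memory manager may hand out (default 0.5):
set hive.exec.orc.memory.pool=0.25;
-- Dictionary encoding of string columns is what is overflowing here;
-- a threshold of 0 disables it:
set hive.exec.orc.dictionary.key.size.threshold=0;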