I get an intermittent FileNotFoundException when I run a query in Hive using the Tez engine.
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1508808910527_45616_1_00, diagnostics=[Task failed, taskId=task_1508808910527_45616_1_00_000066, diagnostics=[TaskAttempt 0 failed, info=[Container container_e09_1508808910527_45616_01_000033 finished with diagnostics set to [Container failed, exitCode=-1000. File does not exist: hdfs://server02.corp.company.com:8020/tmp/hive/username/_tez_session_dir/b65ddde9-110e-47fc-ae1c-33a1f754f839/nzcodec.jar
java.io.FileNotFoundException: File does not exist: hdfs://server02.corp.company.com:8020/tmp/hive/username/_tez_session_dir/b65ddde9-110e-47fc-ae1c-33a1f754f839/nzcodec.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The query selects data from a staging table, repartitions it, and writes it to a reporting table.
INSERT OVERWRITE TABLE ${reporting_table} PARTITION (day, app_name) select <all the fields> from ${staging_table} where day = '${day}'
The staged data is stored in Avro files and totals about 350 GB:
hadoop fs -du -h -s /staged-data/2017-11-02
350.7 G /staged-data/2017-11-02
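I have run the same query against the same set of data multiple times, and the failure is intermittent.
My YARN settings are as follows: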
yarn.nodemanager.resource.memory-mb 83968
yarn.scheduler.minimum-allocation-mb 2048
The Tez settings in my query are as follows:
SET hive.execution.engine=tez;
SET tez.am.resource.memory.mb=2048;
SET hive.tez.container.size=2048;
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=128000000;
SET hive.merge.size.per.task=128000000;
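For reference, here is a back-of-the-envelope estimate of how many Tez containers fit on one NodeManager with the YARN and Tez values above. This is only my own sketch, not anything reported by YARN: the helper names are hypothetical, it assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, and it ignores the application master container and any vcore limits.

```python
# Values copied from the YARN and Tez settings above (all in MB).
NODE_MEMORY_MB = 83968      # yarn.nodemanager.resource.memory-mb
MIN_ALLOCATION_MB = 2048    # yarn.scheduler.minimum-allocation-mb
CONTAINER_SIZE_MB = 2048    # hive.tez.container.size


def granted_mb(requested_mb: int, min_allocation_mb: int) -> int:
    """Round a request up to a multiple of the minimum allocation.

    Assumption: this mirrors how the scheduler normalizes requests;
    the actual rounding depends on the scheduler in use.
    """
    multiples = -(-requested_mb // min_allocation_mb)  # ceiling division
    return multiples * min_allocation_mb


def containers_per_node(node_mb: int, container_mb: int, min_allocation_mb: int) -> int:
    """Estimate how many containers of the given size fit in one node's memory."""
    return node_mb // granted_mb(container_mb, min_allocation_mb)


if __name__ == "__main__":
    # Memory-only estimate for a single NodeManager.
    print(containers_per_node(NODE_MEMORY_MB, CONTAINER_SIZE_MB, MIN_ALLOCATION_MB))
```

With these numbers the estimate is 41 containers per node (83968 / 2048), so each node may be localizing the session jars for many containers at once.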
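I have worked through the recommendations at https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html, but I still see the problem. Adjusting the container size does not seem to help.
Is there another set of settings I can change to prevent this?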