Cannot find the JVM fatal error log file (hs_err_pid.log) after a Dataproc Spark job crashes

Time: 2019-06-04 11:03:27

Tags: apache-spark google-cloud-platform google-cloud-dataproc

After an Apache Spark executor JVM crashed inside a C++ library, I could not find the hs_err_pid.log file that the executor JVM's output log points to. Here is a sample of the executor output log:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6326dce8b0, pid=28580, tid=0x00007f630ea57700
#
# JRE version: OpenJDK Runtime Environment (8.0_212-b01) (build 1.8.0_212-8u212-b01-1~deb9u1-b01)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libessence-jni.so+0x18b0]  Java_com_evernote_service_nts_indexer_lib_Essence_EssProcess+0x0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1559573462307_0002/container_1559573462307_0002_01_000005/hs_err_pid28580.log
10:50:00:562 [Executor task launch worker for task 41] INFO  .....NtsLibInternalIndexerProcessor(NtsLibInternalIndexerProcessor.java:50) process                  Process for user: 18432
[thread 140063422109440 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

However, when I SSH into the target worker node to look for hs_err_pid28580.log, I cannot find any trace of the file. Here is what I tried:

vglazkov@reindex-cluster-vg-w-0:~$ sudo find / -name hs_err_pid28580.log
vglazkov@reindex-cluster-vg-w-0:~$ 
vglazkov@reindex-cluster-vg-w-0:~$ sudo ls -la /hadoop/yarn/nm-local-dir/usercache/root/appcache/
total 12
drwx--x--- 3 yarn yarn 4096 Jun  4 10:46 .
drwxr-x--- 4 yarn yarn 4096 May 15 15:47 ..
drwx--x--- 3 yarn yarn 4096 Jun  4 10:48 application_1557935076075_0097

In that last listing, however, the only directory present, application_1557935076075_0097, does not match my application ID, application_1559573462307_0002, and it does not contain any hs_err_pid.log file.
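
The manual check above can be repeated for any executor log with a small script. The following is only a minimal sketch, assuming the executor output has been saved as plain text; the function name find_error_reports and the dictionary keys are hypothetical, chosen for illustration. It extracts the reported hs_err path, derives the application and container IDs from it, and reports whether the file exists on the machine where the script runs. It does not explain why a file is missing.

```python
import os
import re

# Path line that follows "An error report file with more information is saved as:".
# Assumption: the path is absolute and printed on its own "# ..." line, as in the log above.
HS_ERR_RE = re.compile(r"^#[ \t]+(/\S*hs_err_pid(\d+)\.log)[ \t]*$", re.MULTILINE)
APP_RE = re.compile(r"application_\d+_\d+")
CONTAINER_RE = re.compile(r"container_\d+_\d+_\d+_\d+")


def find_error_reports(log_text):
    """Return one dict per hs_err path mentioned in the executor log text."""
    reports = []
    for match in HS_ERR_RE.finditer(log_text):
        path = match.group(1)
        app = APP_RE.search(path)
        container = CONTAINER_RE.search(path)
        reports.append({
            "path": path,
            "pid": int(match.group(2)),
            "application": app.group(0) if app else None,
            "container": container.group(0) if container else None,
            # Checked on the local file system only, so run this on the worker node.
            "exists": os.path.exists(path),
        })
    return reports
```

Run on the worker node against the log shown earlier, this would be expected to report application_1559573462307_0002 with "exists" set to False, which matches the result of the find command.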

0 answers:

No answers