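We are seeing tasks get killed because of RPC problems.
Here is the error, extracted from the JSON event records:
"Executor heartbeat timed out after 122312 ms"
In addition, there is a reference to the "driver logs" for more information: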
"Removed Reason": "Remote RPC client disassociated. Likely due to
containers exceeding thresholds, or network issues.
Check driver logs for WARN messages."
Note the last line in particular: check the driver logs for WARN messages.
Here are the actual JSON records:
{
"Event": "SparkListenerTaskEnd",
"Stage ID": 0,
"Stage Attempt ID": 0,
"Task Type": "ShuffleMapTask",
"Task End Reason": {
"Reason": "ExecutorLostFailure",
"Executor ID": "0",
"Exit Caused By App": true,
"Loss Reason": "Executor heartbeat timed out after 122312 ms"
},
"Task Info": {
"Task ID": 1,
"Index": 1,
"Attempt": 0,
"Launch Time": 1481563127396,
"Executor ID": "0",
"Host": "192.168.0.11",
"Locality": "PROCESS_LOCAL",
"Speculative": false,
"Getting Result Time": 0,
"Finish Time": 1481563369233,
"Failed": true,
"Accumulables": []
}
}
{
"Event": "SparkListenerBlockManagerRemoved",
"Block Manager ID": {
"Executor ID": "0",
"Host": "192.168.0.11",
"Port": 39215
},
"Timestamp": 1481563369238
}
{
"Event": "SparkListenerExecutorRemoved",
"Timestamp": 1481563370607,
"Executor ID": "0",
"Removed Reason": "Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages."
}
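For anyone who wants to pull the same kind of records out of their own event log, here is a minimal sketch. It assumes the event log is a text file with one JSON object per line; the function names and the path argument are hypothetical, and only the field names shown in the records above are used.

```python
import json


def extract_loss_reasons(lines):
    """Yield (event, executor_id, reason) for executor-loss records.

    `lines` is any iterable of strings, each holding one JSON event.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        event = record.get("Event")
        if event == "SparkListenerTaskEnd":
            end_reason = record.get("Task End Reason", {})
            # Only keep tasks that failed because their executor was lost.
            if end_reason.get("Reason") == "ExecutorLostFailure":
                yield (event,
                       end_reason.get("Executor ID"),
                       end_reason.get("Loss Reason"))
        elif event == "SparkListenerExecutorRemoved":
            yield (event,
                   record.get("Executor ID"),
                   record.get("Removed Reason"))


def print_loss_reasons(path):
    """Print every executor-loss reason found in the log at `path` (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        for event, executor_id, reason in extract_loss_reasons(handle):
            print(f"{event} executor={executor_id}: {reason}")
```

Calling `print_loss_reasons` with the path to the application's event log would list each lost executor next to the reason Spark recorded for it, which makes it easier to line the failures up against timestamps in the driver output.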
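However, no warnings or errors were found in the driver or worker logs.
Possibly related: stdout and stderr are always empty. I do see messages on the driver console, but nothing in the logs.
Any pointers appreciated.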