Warnings/errors referenced by EventLogs do not appear in the Driver / Worker logs

Time: 2016-12-12 17:47:14

Tags: logging apache-spark

We are seeing tasks get killed because of RPC issues.

Here is the error, extracted from the JSON records:

"Executor heartbeat timed out after 122312 ms"

In addition, there is a reference to the "driver logs" for more information:

"Removed Reason": "Remote RPC client disassociated. Likely due to
 containers exceeding thresholds, or network issues.
 Check driver logs for WARN messages."

Note the last line in particular: Check driver logs for WARN messages

Here are the actual JSON records:

{
    "Event": "SparkListenerTaskEnd",
    "Stage ID": 0,
    "Stage Attempt ID": 0,
    "Task Type": "ShuffleMapTask",
    "Task End Reason": {
        "Reason": "ExecutorLostFailure",
        "Executor ID": "0",
        "Exit Caused By App": true,
        "Loss Reason": "Executor heartbeat timed out after 122312 ms"
    },
    "Task Info": {
        "Task ID": 1,
        "Index": 1,
        "Attempt": 0,
        "Launch Time": 1481563127396,
        "Executor ID": "0",
        "Host": "192.168.0.11",
        "Locality": "PROCESS_LOCAL",
        "Speculative": false,
        "Getting Result Time": 0,
        "Finish Time": 1481563369233,
        "Failed": true,
        "Accumulables": []
    }
}
{
    "Event": "SparkListenerBlockManagerRemoved",
    "Block Manager ID": {
        "Executor ID": "0",
        "Host": "192.168.0.11",
        "Port": 39215
    },
    "Timestamp": 1481563369238
}
{
    "Event": "SparkListenerExecutorRemoved",
    "Timestamp": 1481563370607,
    "Executor ID": "0",
    "Removed Reason": "Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages."
}
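
For anyone wanting to pull these failure reasons out of an event log programmatically, here is a minimal sketch in Python. It assumes the event log file holds one JSON object per line (the records above are pretty-printed for readability), and the function name `find_executor_losses` is made up for illustration. The field names are the ones visible in the records above; the sample lines are abbreviated copies of those records.

```python
import json


def find_executor_losses(lines):
    """Return (event, executor_id, reason) tuples for executor-loss events.

    Hypothetical helper: assumes each element of `lines` is one JSON event.
    """
    findings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a complete JSON record.
            continue
        kind = event.get("Event")
        if kind == "SparkListenerTaskEnd":
            reason = event.get("Task End Reason", {})
            if reason.get("Reason") == "ExecutorLostFailure":
                findings.append(
                    (kind, reason.get("Executor ID"), reason.get("Loss Reason"))
                )
        elif kind == "SparkListenerExecutorRemoved":
            findings.append(
                (kind, event.get("Executor ID"), event.get("Removed Reason"))
            )
    return findings


# Abbreviated versions of the three records shown above, one per line.
sample = [
    '{"Event": "SparkListenerTaskEnd", "Task End Reason": {"Reason": '
    '"ExecutorLostFailure", "Executor ID": "0", "Loss Reason": '
    '"Executor heartbeat timed out after 122312 ms"}}',
    '{"Event": "SparkListenerBlockManagerRemoved", "Timestamp": 1481563369238}',
    '{"Event": "SparkListenerExecutorRemoved", "Timestamp": 1481563370607, '
    '"Executor ID": "0", "Removed Reason": "Remote RPC client disassociated."}',
]

losses = find_executor_losses(sample)
for kind, executor_id, reason in losses:
    print(f"{kind} (executor {executor_id}): {reason}")
```

To run it against a real file, something like `find_executor_losses(open(path))` should work, where `path` points at the application's event log.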

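No warnings or errors were found in the driver or worker logs.

Possibly related: stdout and stderr are always empty. I do see messages on the driver console, but nothing in the logs.

Any pointers appreciated.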

0 answers:

No answers