I am using AWS emr-5.0.0 to run a small cluster that consists of the following nodes:
All of them are x3.xlarge machines.
I run a Python Spark application with two stages.
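My real job is more involved, but this is a minimal sketch of its shape (the input/output paths and names are placeholders, not my actual application); the shuffle for reduceByKey is what splits it into two stages:

```python
from pyspark import SparkContext

sc = SparkContext(appName="two-stage-example")

# Stage 1: read input and map each word to a (word, 1) pair.
# Stage 2: reduceByKey forces a shuffle, which starts the second stage.
counts = (sc.textFile("s3://my-bucket/input/")       # placeholder path
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

counts.saveAsTextFile("s3://my-bucket/output/")      # placeholder path
```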
The problem is that when I manually terminate one of the TASK instances (or it is terminated due to a spot price change), the entire Spark job fails.
I would expect Spark to simply rerun the lost tasks on the remaining nodes. Why does this not happen?
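For context, my understanding is that spark.task.maxFailures (default 4) controls how many times a single task may fail before the whole job is aborted. I have not overridden it, so I would not expect losing one node to exhaust four attempts. This snippet just makes that assumption explicit; it is not something I set in my actual job:

```python
from pyspark import SparkConf, SparkContext

# spark.task.maxFailures: attempts per task before the job is aborted.
# "4" is the documented default, shown here only for clarity.
conf = SparkConf().set("spark.task.maxFailures", "4")
sc = SparkContext(conf=conf, appName="retry-config-sketch")
```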
Below is the log (master IP is 172-31-1-0, core instance IP is 172-31-1-173, and the lost node's IP is 172-31-3-81).