I am using Spark to read from a DB2 database and load the data into another target. The problem I am running into is that if any task fails for any reason, Spark automatically retries and re-runs the task; this behavior causes some data discrepancies in the target during the write.
Can this behavior be turned off?
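For context, the pipeline being described is roughly of the shape sketched below. This is only an illustration, not the asker's code; the JDBC URLs, driver class, credentials, and table names are placeholders.

import org.apache.spark.sql.{SaveMode, SparkSession}

object Db2ToTarget {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("db2-to-target")
      .getOrCreate()

    // Read the source table from DB2 over JDBC (all connection details are placeholders).
    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:db2://db2-host:50000/SAMPLE")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "SCHEMA.SOURCE_TABLE")
      .option("user", "db2user")
      .option("password", "db2pass")
      .load()

    // Write to the target. A retried task re-executes its write, which is
    // where duplicate or partial rows can appear if the sink is not idempotent.
    source.write
      .format("jdbc")
      .option("url", "jdbc:db2://target-host:50000/TARGET")
      .option("dbtable", "SCHEMA.TARGET_TABLE")
      .option("user", "tgtuser")
      .option("password", "tgtpass")
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}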
Answer 0 (score: 0)
You can set spark.task.maxFailures to 1 to avoid retrying tasks (the default value is 4).
From https://spark.apache.org/docs/latest/configuration.html:
Number of failures of any particular task before giving up on the job. The total number
of failures spread across different tasks will not cause the job to fail; a particular
task has to fail this number of attempts. Should be greater than or equal to 1. Number
of allowed retries = this value - 1.
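A minimal sketch of applying this setting when building the session (the application name is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("db2-to-target")
  // Fail the job on the first task failure instead of retrying.
  .config("spark.task.maxFailures", "1")
  .getOrCreate()

The same property can also be passed on the command line, e.g. spark-submit --conf spark.task.maxFailures=1 ..., so the application code does not have to change. Keep in mind that with a value of 1, any single task failure fails the whole job.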