I am using Spark to read from a DB2 database and load the data into another target. The problem I am running into is that if any task fails for any reason, Spark automatically retries and re-runs the task; this behavior causes some data discrepancies in the target during the write.
Can this behavior be turned off?
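For context, the pipeline being described is roughly of the shape sketched below. This is only an illustration, not the asker's code; the JDBC URLs, driver class, credentials, and table names are placeholders.

import org.apache.spark.sql.{SaveMode, SparkSession}

object Db2ToTarget {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("db2-to-target")
      .getOrCreate()

    // Read the source table from DB2 over JDBC (all connection details are placeholders).
    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:db2://db2-host:50000/SAMPLE")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "SCHEMA.SOURCE_TABLE")
      .option("user", "db2user")
      .option("password", "db2pass")
      .load()

    // Write to the target. A retried task re-executes its write, which is
    // where duplicate or partial rows can appear if the sink is not idempotent.
    source.write
      .format("jdbc")
      .option("url", "jdbc:db2://target-host:50000/TARGET")
      .option("dbtable", "SCHEMA.TARGET_TABLE")
      .option("user", "tgtuser")
      .option("password", "tgtpass")
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}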
Answer 0 (score: 0)
You can set spark.task.maxFailures to 1 to avoid retrying tasks (the default value is 4).
From https://spark.apache.org/docs/latest/configuration.html:
Number of failures of any particular task before giving up on the job. The total number
of failures spread across different tasks will not cause the job to fail; a particular
task has to fail this number of attempts. Should be greater than or equal to 1. Number
of allowed retries = this value - 1.
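A minimal sketch of applying this setting when building the session (the application name is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("db2-to-target")
  // Fail the job on the first task failure instead of retrying.
  .config("spark.task.maxFailures", "1")
  .getOrCreate()

The same property can also be passed on the command line, e.g. spark-submit --conf spark.task.maxFailures=1 ..., so the application code does not have to change. Keep in mind that with a value of 1, any single task failure fails the whole job.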