Spark read.jdbc saveAsTable

Time: 2017-04-12 08:53:40

Tags: apache-spark apache-spark-sql

I hit an error when calling saveAsTable. Here is my code:

val df = spark.read.jdbc(url,table,"id",0,100000000,4,properties)
df.write.saveAsTable("custom_order_1kw")
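
(For context, the question does not show how properties is built; a minimal sketch, assuming MySQL Connector/J and placeholder credentials:)

import java.util.Properties

val properties = new Properties()
properties.setProperty("user", "db_user")         // placeholder user name
properties.setProperty("password", "db_password") // placeholder password
properties.setProperty("driver", "com.mysql.jdbc.Driver") // assumes MySQL Connector/J is on the classpath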

"custom_order_1kw" is a table in MySQL, about 700+ MB in size.

The error log:

WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 166323 ms exceeds timeout 120000 ms
17/04/12 15:55:15 ERROR scheduler.TaskSchedulerImpl: Lost executor 10 on 172.21.102.93: Executor heartbeat timed out after 166323 ms
17/04/12 15:55:15 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.21.102.93): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 166323 ms
17/04/12 15:55:25 ERROR scheduler.TaskSchedulerImpl: Lost executor 10 on 172.21.102.93: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

After the same error occurred four times, the job was aborted.

I am testing the code in spark-shell:

spark-shell --master spark://172.21.102.93:7077 --executor-memory 4g --driver-cores 1 --executor-cores 1 --driver-memory 8g

If I pull a smaller table instead (200+ MB), everything works fine!

Any idea what is going wrong?

1 Answer:

Answer 0 (score: 0)


In spark-defaults.conf, set spark.network.timeout to a higher value; the default is 120s.
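
For example, the lines in conf/spark-defaults.conf might look like this (600s and 60s are illustrative values, not ones given in the answer; spark.executor.heartbeatInterval should stay well below spark.network.timeout):

spark.network.timeout            600s
spark.executor.heartbeatInterval 60s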

You can also pass the same settings on the command line:

spark-submit --conf spark.network.timeout=10000000 --class myclass.neuralnet.TrainNetSpark --master spark://master.cluster:7077 --driver-memory 30G --executor-memory 14G --num-executors 7 --executor-cores 8 --conf spark.driver.maxResultSize=4g --conf spark.executor.heartbeatInterval=10000000 path/to/my.jar
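
Since the question runs through spark-shell rather than spark-submit, the same --conf flags can simply be appended to the original spark-shell command; a sketch using the same illustrative values as above:

spark-shell --master spark://172.21.102.93:7077 --executor-memory 4g --driver-cores 1 --executor-cores 1 --driver-memory 8g --conf spark.network.timeout=600s --conf spark.executor.heartbeatInterval=60s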