Spark read.jdbc saveAsTable

Time: 2017-04-12 08:53:40

Tags: apache-spark apache-spark-sql

I hit an error when calling saveAsTable. Here is my code:

val df = spark.read.jdbc(url,table,"id",0,100000000,4,properties)
df.write.saveAsTable("custom_order_1kw")
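
(For context, the question does not show how properties is built; a minimal sketch, assuming MySQL Connector/J and placeholder credentials:)

import java.util.Properties

val properties = new Properties()
properties.setProperty("user", "db_user")         // placeholder user name
properties.setProperty("password", "db_password") // placeholder password
properties.setProperty("driver", "com.mysql.jdbc.Driver") // assumes MySQL Connector/J is on the classpath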

"custom_order_1kw" is a table in MySQL, about 700+ MB in size.

The error log:

WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 166323 ms exceeds timeout 120000 ms
17/04/12 15:55:15 ERROR scheduler.TaskSchedulerImpl: Lost executor 10 on 172.21.102.93: Executor heartbeat timed out after 166323 ms
17/04/12 15:55:15 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.21.102.93): ExecutorLostFailure (executor 10 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 166323 ms
17/04/12 15:55:25 ERROR scheduler.TaskSchedulerImpl: Lost executor 10 on 172.21.102.93: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

After the same error occurred four times, the job was aborted.

I am testing the code in spark-shell:

spark-shell --master spark://172.21.102.93:7077 --executor-memory 4g --driver-cores 1 --executor-cores 1 --driver-memory 8g

If I pull a smaller table instead (200+ MB), everything works fine!

Any idea what is going wrong?

1 Answer:

Answer 0 (score: 0)


In spark-defaults.conf, set spark.network.timeout to a higher value; the default is 120s.
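
For example, the lines in conf/spark-defaults.conf might look like this (600s and 60s are illustrative values, not ones given in the answer; spark.executor.heartbeatInterval should stay well below spark.network.timeout):

spark.network.timeout            600s
spark.executor.heartbeatInterval 60s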

You can also pass the same settings on the command line:

spark-submit --conf spark.network.timeout=10000000 --class myclass.neuralnet.TrainNetSpark --master spark://master.cluster:7077 --driver-memory 30G --executor-memory 14G --num-executors 7 --executor-cores 8 --conf spark.driver.maxResultSize=4g --conf spark.executor.heartbeatInterval=10000000 path/to/my.jar
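
Since the question runs through spark-shell rather than spark-submit, the same --conf flags can simply be appended to the original spark-shell command; a sketch using the same illustrative values as above:

spark-shell --master spark://172.21.102.93:7077 --executor-memory 4g --driver-cores 1 --executor-cores 1 --driver-memory 8g --conf spark.network.timeout=600s --conf spark.executor.heartbeatInterval=60s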