I keep running into container sizing issues on Spark. I don't see anything obviously wrong, and I'm struggling to get the job to complete. Here is my setup:
Spark setup: 5 machines, one master and 4 worker nodes. Each node has 61 GB of memory and 8 cores in total.
My job keeps failing, and I don't know why. Here is the relevant log from when it starts to show signs of failure:
spark.driver.memory=5g
spark.history.ui.port=18081
spark.executor.memory=39936mb
spark.executor.cores=7
spark.executor.instances=4
spark.master=yarn-client
spark.yarn.executor.memoryOverhead=7168
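
For what it's worth, here is the back-of-the-envelope arithmetic for how that configuration maps onto the nodes (my own numbers, not from the logs; the actual headroom depends on yarn.nodemanager.resource.memory-mb, which I haven't listed):

# Rough memory math for the settings above:
executor_memory_mb = 39936          # spark.executor.memory
overhead_mb = 7168                  # spark.yarn.executor.memoryOverhead
container_mb = executor_memory_mb + overhead_mb
print(container_mb)                 # 47104 MB, i.e. 46 GB requested from YARN per executor

node_mb = 61 * 1024                 # 61 GB per node
print(node_mb - container_mb)       # 15360 MB (~15 GB) left per node for the OS, NodeManager, etc.

So with one executor per node (4 instances, using 7 of the 8 cores each), each 46 GB container should fit in a 61 GB node, assuming YARN is allowed to allocate that much.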
... and then later in the log:
17/05/16 19:57:51 INFO python.PythonRunner: Times: total = 3571812, boot = 17, init = 481752, finish = 3090043
17/05/16 19:57:52 INFO storage.ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
17/05/16 19:57:52 INFO storage.ShuffleBlockFetcherIterator: Started 5 remote fetches in 100 ms
17/05/16 19:57:59 INFO python.PythonRunner: Times: total = 6373, boot = -60752, init = 60758, finish = 6367
17/05/16 19:59:38 INFO executor.Executor: Executor is trying to kill task 5.0 in stage 10.0 (TID 978)
17/05/16 19:59:38 INFO executor.Executor: Executor is trying to kill task 21.0 in stage 10.0 (TID 994)
17/05/16 19:59:38 WARN python.PythonRunner: Incomplete task interrupted: Attempting to kill Python Worker
17/05/16 19:59:38 WARN python.PythonRunner: Incomplete task interrupted: Attempting to kill Python Worker
17/05/16 19:59:38 WARN python.PythonRunner: Incomplete task interrupted: Attempting to kill Python Worker
*17/05/16 19:59:38 ERROR datasources.DefaultWriterContainer: Aborting task.*
... and then at the end:
17/05/16 19:59:38 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 40,062,362
17/05/16 19:59:38 ERROR datasources.DefaultWriterContainer: Aborting task.
17/05/16 19:59:40 WARN python.PythonRunner: Incomplete task interrupted: Attempting to kill Python Worker
17/05/16 19:59:40 WARN python.PythonRunner: Incomplete task interrupted: Attempting to kill Python Worker
17/05/16 19:59:40 WARN python.PythonRunner: Incomplete task interrupted: Attempting to kill Python Worker
17/05/16 19:59:40 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
It doesn't look like a YARN exception; it's coming from Spark itself, and it seems the job fails while writing its results out to a file. So I'm wondering what the problem might be.
If this is a problem with the driver, I wonder whether I should increase spark.driver.memory, but then again, it's already at 5g, which is very high.
I also don't think I need to increase spark.driver.maxResultSize, which is 1g by default, because I've looked at the files output by past runs and they are about 0.4 GB of compressed Parquet.
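
For reference, this is how I would double-check which values are actually in effect at runtime, in case spark-defaults.conf and the submit command disagree (just a sketch; the keys are standard Spark configuration names):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # or the existing `sc` in the pyspark shell
conf = sc.getConf()
for key in ("spark.driver.memory",
            "spark.driver.maxResultSize",
            "spark.executor.memory",
            "spark.yarn.executor.memoryOverhead"):
    # Keys missing here fall back to Spark's documented defaults.
    print("%s = %s" % (key, conf.get(key, "unset")))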
Which settings should I change?