I am running a Spark application on a 7-node cluster of Amazon EC2 machines: 1 driver and 6 executors. I use 6 m4.2xlarge instances with one executor each; each instance has 8 cores. The driver sits on an m4.xlarge VM with 4 cores. The Spark version is 2.1.1.
I launch the SparkPageRank application with the following command:
spark-submit \
--name "ABC" \
--master spark://xxx:7077 \
--conf spark.driver.memory=10g \
--conf "spark.app.name=ABC" \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ConcGCThreads=5" \
--class org.apache.spark.examples.SparkPageRank \
--executor-memory 22g \
/home/ubuntu/spark-2.1.1/examples/target/scala-2.11/jars/spark-examples_2.11-2.1.1.jar /hdfscheck/pagerank_data_11G_repl1.txt 4
GC time is very high with this configuration.
Here is a small portion of the GC log from one of the executors:
1810.053: [GC pause (GCLocker Initiated GC) (young), 0.1694102 secs]
   [Parallel Time: 167.8 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 1810053.2, Avg: 1810053.3, Max: 1810053.4, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.7, Diff: 0.5, Sum: 2.9]
      [Update RS (ms): Min: 12.4, Avg: 12.7, Max: 13.2, Diff: 0.7, Sum: 101.4]
         [Processed Buffers: Min: 11, Avg: 12.9, Max: 16, Diff: 5, Sum: 103]
      [Scan RS (ms): Min: 29.4, Avg: 29.8, Max: 30.1, Diff: 0.7, Sum: 238.7]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 124.5, Avg: 124.6, Max: 124.7, Diff: 0.1, Sum: 996.9]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 2.2, Max: 5, Diff: 4, Sum: 18]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 167.5, Avg: 167.5, Max: 167.6, Diff: 0.1, Sum: 1340.2]
      [GC Worker End (ms): Min: 1810220.8, Avg: 1810220.8, Max: 1810220.8, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 1.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.1 ms]
   [Eden: 992.0M(960.0M)->0.0B(960.0M) Survivors: 160.0M->160.0M Heap: 14.6G(22.0G)->13.8G(22.0G)]
 [Times: user=1.34 sys=0.00, real=0.17 secs]
(more at https://pastebin.com/E5bbQZgD)
The only thing I can spot is that concurrent marking takes a long time to finish.
I would be grateful if someone could tell me how to tune garbage collection for this particular case. The VM on which the driver node runs has 16 GB of memory, while the executor VMs have 32 GB each.
Answer 0 (score: 0)
(Not really an answer, just some hints to help out.)
"My driver node has 16GB of memory"
I don't think that is the case if you spark-submit with spark.driver.memory=10g. You should use --driver-memory instead (it is just a shortcut, but it makes things a little easier to remember):

--driver-memory MEM          Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
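For example, the launch command from the question would become something like this (a sketch only, reusing the placeholder master URL and paths from above):

spark-submit \
--name "ABC" \
--master spark://xxx:7077 \
--driver-memory 10g \
--executor-memory 22g \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ConcGCThreads=5" \
--class org.apache.spark.examples.SparkPageRank \
/home/ubuntu/spark-2.1.1/examples/target/scala-2.11/jars/spark-examples_2.11-2.1.1.jar /hdfscheck/pagerank_data_11G_repl1.txt 4

Everything else stays as in the original command; only --driver-memory replaces the --conf spark.driver.memory=10g setting.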
Regarding your main question, the heavy GC usage appears to be simply how the PageRank algorithm works. Note the heavy Shuffle Write, while the Input is not that big.
I also think that the GC Time is not that long compared to the Task Time.
I am concerned that RDD Blocks is only 2, since that seems to indicate very low parallelism, but that may be how it is supposed to work.
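If low parallelism really is the issue, one knob to experiment with (an assumption on my part, not something I have verified for this job) is spark.default.parallelism, which sets the default number of partitions for shuffle operations such as reduceByKey when no partitioner is given. A sketch, picking 48 simply to match the 6 executors x 8 cores of this cluster:

spark-submit \
--conf spark.default.parallelism=48 \
... (rest of the command unchanged)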
The following part of the command:

spark-submit ... \
--class org.apache.spark.examples.SparkPageRank ... \
/home/ubuntu/spark-2.1.1/examples/target/scala-2.11/jars/spark-examples_2.11-2.1.1.jar

can be replaced with a simple run-example SparkPageRank (as described in Spark's "Where to Go from Here").
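Assuming the examples were built as part of the Spark distribution, the whole launch could then look something like this sketch (the MASTER environment variable is the way the Spark docs show for pointing run-example at a cluster):

MASTER=spark://xxx:7077 ./bin/run-example SparkPageRank /hdfscheck/pagerank_data_11G_repl1.txt 4

run-example figures out the examples jar and the fully qualified class name (org.apache.spark.examples.SparkPageRank) and hands everything over to spark-submit.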