I get an OutOfMemoryError when I try to run a random forest on a Spark on YARN cluster (3 data nodes). Below is the error stack from the container logs on the NodeManager:
16/05/30 13:41:17 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
Exception in thread "dispatcher-event-loop-4" java.lang.OutOfMemoryError: PermGen space
at sun.misc.Unsafe.defineClass(Native Method)
at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
at java.security.AccessController.doPrivileged(Native Method)
at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
at java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:251)
at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:228)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:509)
at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$org$apache$spark$storage$BlockManagerMasterEndpoint$$removeRdd$2.apply(BlockManagerMasterEndpoint.scala:147)
at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$org$apache$spark$storage$BlockManagerMasterEndpoint$$removeRdd$2.apply(BlockManagerMasterEndpoint.scala:146)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
16/05/30 13:41:21 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH" {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.authenticate=false' '-Dspark.shuffle.service.port=7337' '-Dspark.driver.port=34896' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=<LOG_DIR> -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@172.26.34.93:34896 --executor-id 2 --hostname datalake01 --cores 1 --app-id application_1464237978069_0248 --user-class-path file:$PWD/__app__.jar --user-class-path file:$PWD/com.databricks_spark-csv_2.11-1.4.0.jar --user-class-path file:$PWD/org.apache.commons_commons-csv-1.1.jar --user-class-path file:$PWD/com.univocity_univocity-parsers-1.5.1.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
The -XX:MaxPermSize=256m in the executor launch command above is the parameter I want to change. But how do I adjust it in Cloudera Manager?
Answer 0 (score: 3)
You can add the JVM arguments to the spark-submit command. Note that the driver and the executors are configured separately:
spark/bin/spark-submit ... --conf spark.driver.extraJavaOptions=" -XX:MaxPermSize=256M " --conf spark.executor.extraJavaOptions=" -XX:MaxPermSize=256M " ...
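If you would rather not pass these flags on every spark-submit, the same settings can go into spark-defaults.conf so they apply to all jobs. In Cloudera Manager this is usually done through the Spark service's advanced configuration snippet (safety valve) for spark-conf/spark-defaults.conf, followed by redeploying the client configuration; the exact field name varies between CM/CDH versions, so treat the following as a sketch (the 512M value is just an illustrative increase over the current 256m):

spark.driver.extraJavaOptions -XX:MaxPermSize=512M
spark.executor.extraJavaOptions -XX:MaxPermSize=512M

After rerunning the job you can check that the new value took effect by looking at the executor launch command in the NodeManager container logs, just like the command shown in the question. Also note that -XX:MaxPermSize only has an effect on Java 7 and earlier; on Java 8 PermGen was replaced by Metaspace and the flag is ignored.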