Spark fails when running the pi.py example in yarn-client mode

Time: 2015-01-06 05:53:50

Tags: apache-spark

I can run the Java version of the Pi example successfully, as shown below.

./bin/spark-submit --class org.apache.spark.examples.SparkPi \ 
    --master yarn-client \ 
    --num-executors 3 \ 
    --driver-memory 4g \ 
    --executor-memory 2g \ 
    --executor-cores 1 \ 
    --queue thequeue \ 
    lib/spark-examples*.jar \ 
    10 

However, the Python version fails with the error output below. I used yarn-client mode; launching the pyspark shell in yarn-client mode produces the same messages. Can anyone help me solve this?

nlp@yyy2:~/spark$ ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py 
15/01/05 17:22:26 INFO spark.SecurityManager: Changing view acls to: nlp 
15/01/05 17:22:26 INFO spark.SecurityManager: Changing modify acls to: nlp 
15/01/05 17:22:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp) 
15/01/05 17:22:26 INFO slf4j.Slf4jLogger: Slf4jLogger started 
15/01/05 17:22:26 INFO Remoting: Starting remoting 
15/01/05 17:22:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@yyy2:42747] 
15/01/05 17:22:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 42747. 
15/01/05 17:22:26 INFO spark.SparkEnv: Registering MapOutputTracker 
15/01/05 17:22:26 INFO spark.SparkEnv: Registering BlockManagerMaster 
15/01/05 17:22:26 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150105172226-aeae 
15/01/05 17:22:26 INFO storage.MemoryStore: MemoryStore started with capacity 265.1 MB 
15/01/05 17:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/01/05 17:22:27 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-cbe0079b-79c5-426b-b67e-548805423b11 
15/01/05 17:22:27 INFO spark.HttpServer: Starting HTTP Server 
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/01/05 17:22:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:57169 
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'HTTP file server' on port 57169. 
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/01/05 17:22:27 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 
15/01/05 17:22:27 INFO ui.SparkUI: Started SparkUI at http://yyy2:4040
15/01/05 17:22:27 INFO client.RMProxy: Connecting to ResourceManager at yyy14/10.112.168.195:8032 
15/01/05 17:22:27 INFO yarn.Client: Requesting a new application from cluster with 6 NodeManagers 
15/01/05 17:22:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 
15/01/05 17:22:27 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 
15/01/05 17:22:27 INFO yarn.Client: Setting up container launch context for our AM 
15/01/05 17:22:27 INFO yarn.Client: Preparing resources for our AM container 
15/01/05 17:22:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for xxx on ha-hdfs:hzdm-cluster1 
15/01/05 17:22:28 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/lib/spark-assembly-1.2.0-hadoop2.5.2.jar -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/spark-assembly-1.2.0-hadoop2.5.2.jar 
15/01/05 17:22:29 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/pi.py 
15/01/05 17:22:29 INFO yarn.Client: Setting up the launch environment for our AM container 
15/01/05 17:22:29 INFO spark.SecurityManager: Changing view acls to: nlp 
15/01/05 17:22:29 INFO spark.SecurityManager: Changing modify acls to: nlp 
15/01/05 17:22:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp) 
15/01/05 17:22:29 INFO yarn.Client: Submitting application 23 to ResourceManager 
15/01/05 17:22:30 INFO impl.YarnClientImpl: Submitted application application_1420444011562_0023 
15/01/05 17:22:31 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:31 INFO yarn.Client: 
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  } 
         diagnostics: N/A 
         ApplicationMaster host: N/A 
         ApplicationMaster RPC port: -1 
         queue: root.default 
         start time: 1420449749969 
         final status: UNDEFINED 
         tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
         user: nlp 
15/01/05 17:22:32 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:33 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:34 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:35 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:36 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@yyy16:52855/user/YarnAM#435880073] 
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> yyy14, PROXY_URI_BASES -> http://yyy14:8070/proxy/application_1420444011562_0023), /proxy/application_1420444011562_0023 
15/01/05 17:22:36 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 
15/01/05 17:22:37 INFO yarn.Client: Application report for application_1420444011562_0023 (state: RUNNING) 
15/01/05 17:22:37 INFO yarn.Client: 
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  } 
         diagnostics: N/A 
         ApplicationMaster host: yyy16 
         ApplicationMaster RPC port: 0 
         queue: root.default 
         start time: 1420449749969 
         final status: UNDEFINED 
         tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
         user: nlp 
15/01/05 17:22:37 INFO cluster.YarnClientSchedulerBackend: Application application_1420444011562_0023 has started running. 
15/01/05 17:22:37 INFO netty.NettyBlockTransferService: Server created on 35648 
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Trying to register BlockManager 
15/01/05 17:22:37 INFO storage.BlockManagerMasterActor: Registering block manager yyy2:35648 with 265.1 MB RAM, BlockManagerId(<driver>, yyy2, 35648) 
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Registered BlockManager 
15/01/05 17:22:37 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@yyy16:52855] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
15/01/05 17:22:38 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED! 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null} 
15/01/05 17:22:38 INFO ui.SparkUI: Stopped Spark web UI at http://yyy2:4040
15/01/05 17:22:38 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors 
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down 
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Stopped 
15/01/05 17:22:39 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped! 
15/01/05 17:22:39 INFO storage.MemoryStore: MemoryStore cleared 
15/01/05 17:22:39 INFO storage.BlockManager: BlockManager stopped 
15/01/05 17:22:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 
15/01/05 17:22:39 INFO spark.SparkContext: Successfully stopped SparkContext 
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 
15/01/05 17:22:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms) 
Traceback (most recent call last): 
  File "/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi") 
  File "/home/nlp/spark/python/pyspark/context.py", line 105, in __init__ 
    conf, jsc) 
  File "/home/nlp/spark/python/pyspark/context.py", line 153, in _do_init 
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
  File "/home/nlp/spark/python/pyspark/context.py", line 201, in _initialize_context 
    return self._jvm.JavaSparkContext(jconf) 
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__ 
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.lang.NullPointerException 
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) 
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408) 
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
        at py4j.Gateway.invoke(Gateway.java:214) 
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
        at py4j.GatewayConnection.run(GatewayConnection.java:207) 
        at java.lang.Thread.run(Thread.java:745)

5 Answers:

Answer 0 (score: 6):

If you are running this example on Java 8, the failure may be caused by Java 8 allocating far more virtual memory than requested, which trips YARN's container memory checks: https://issues.apache.org/jira/browse/YARN-4714

You can force YARN to ignore this by setting the following properties in yarn-site.xml:
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
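If disabling the checks cluster-wide is not an option, a gentler alternative (my addition, not part of the original answer) is to request extra container headroom at submit time through Spark's YARN overhead settings, roughly like this:

# give the driver/AM and executors extra off-heap headroom (values in MB, illustrative)
./bin/spark-submit --master yarn-client \
    --conf spark.yarn.driver.memoryOverhead=512 \
    --conf spark.yarn.executor.memoryOverhead=512 \
    examples/src/main/python/pi.py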

Answer 1 (score: 2):

Try using the deploy-mode parameter, as shown below:

--deploy-mode cluster

I had a problem like yours, and this parameter did the trick.
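For concreteness, a sketch of this suggestion applied to the question's command (my assembly, not part of the original answer) could look like the following; note that whether cluster deploy mode accepts Python applications depends on the Spark version in use:

# submit the Python example with the driver running inside the YARN cluster
./bin/spark-submit --master yarn \
    --deploy-mode cluster \
    examples/src/main/python/pi.py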

Answer 2 (score: 1):

I hit a similar problem using spark-submit with yarn-client (I got the same NPE / stack trace). Turning my memory settings down fixed it; it seems to fail when you try to allocate more memory than the cluster will grant. I would start by removing the --executor-memory and --driver-memory switches, or lowering their values as in the sketch below.
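If you do keep explicit resource switches, a sketch with deliberately small values (the numbers are illustrative, not from the answer) would be:

# request modest resources so the containers stay well under the cluster limits
./bin/spark-submit --master yarn-client \
    --driver-memory 1g \
    --executor-memory 1g \
    --executor-cores 1 \
    examples/src/main/python/pi.py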

Answer 3 (score: 1):

I reduced the number of cores in Advanced spark-env and it started working.
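On an Ambari-managed cluster, "Advanced spark-env" edits conf/spark-env.sh. A minimal sketch of lowering the core count there (the variable names are the standard Spark-on-YARN ones; the values are illustrative, not from the answer):

# conf/spark-env.sh -- illustrative values
export SPARK_EXECUTOR_CORES=1       # cores per executor
export SPARK_EXECUTOR_INSTANCES=2   # number of executors to request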

Answer 4 (score: 0):

I ran into this problem as well (HDP 2.3, Spark 1.3.1).


My solution was to set Spark configuration values.

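The answer does not spell out which configuration values, so purely as an assumption on my part: the setting most often reported for this symptom on HDP clusters is passing the hdp.version system property to the driver and application master JVMs, along these lines (the version string is a placeholder, not a value from the answer):

# hypothetical sketch -- replace the hdp.version placeholder with your cluster's HDP build
./bin/spark-submit --master yarn-client \
    --conf "spark.driver.extraJavaOptions=-Dhdp.version=<your-hdp-version>" \
    --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=<your-hdp-version>" \
    examples/src/main/python/pi.py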