SparkR on a YARN cluster

Date: 2017-09-24 23:40:49

Tags: apache-spark rstudio yarn sparkr

At http://ec2-54-186-47-36.us-west-2.compute.amazonaws.com:8080/ I can see my Spark cluster, which shows two worker nodes and one master node. Running the jps command on the 2 workers and the 1 master confirms that all services are up. I use the following script to initialize the SparkR session.

    if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
      Sys.setenv(SPARK_HOME = "/home/ubuntu/spark")
    }
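For completeness, after setting SPARK_HOME the SparkR package is typically loaded from the Spark installation itself rather than from CRAN (a standard pattern from the SparkR documentation; the path assumes the same /home/ubuntu/spark install as above):

```r
# load the SparkR package shipped inside the Spark distribution
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
```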

But whenever I try to initialize the session from RStudio, it fails with the error below. Please advise; as it stands I cannot get the real benefit of the cluster.

    sparkR.session(master = "yarn", deployMode = "cluster",
                   sparkConfig = list(spark.driver.memory = "2g"),
                   sparkPackages = "com.databricks:spark-csv_2.11:1.1.0")

    Launching java with spark-submit command /home/ubuntu/spark/bin/spark-submit  --packages com.databricks:spark-csv_2.11:1.1.0 --driver-memory "2g" "--packages" "com.databricks:spark-csv_2.11:1.1.0" "sparkr-shell" /tmp/RtmpkSWHWX/backend_port29310cbc7c6
    Ivy Default Cache set to: /home/rstudio/.ivy2/cache
    The jars for the packages stored in: /home/rstudio/.ivy2/jars
    :: loading settings :: url = jar:file:/home/ubuntu/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    com.databricks#spark-csv_2.11 added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found com.databricks#spark-csv_2.11;1.1.0 in central
        found org.apache.commons#commons-csv;1.1 in central
        found com.univocity#univocity-parsers;1.5.1 in central
    :: resolution report :: resolve 441ms :: artifacts dl 24ms
        :: modules in use:
        com.databricks#spark-csv_2.11;1.1.0 from central in [default]
        com.univocity#univocity-parsers;1.5.1 from central in [default]
        org.apache.commons#commons-csv;1.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
    :: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/18ms)


    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel).
    17/09/24 23:15:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/09/24 23:15:42 ERROR SparkContext: Error initializing SparkContext.
    org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.apache.spark.api.r.RRDD$.createSparkContext(RRDD.scala:129)
    at org.apache.spark.api.r.RRDD.createSparkContext(RRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
    at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:748)
    17/09/24 23:15:42 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
    17/09/24 23:15:42 WARN MetricsSystem: Stopping a MetricsSystem that is not running
    17/09/24 23:15:42 ERROR RBackendHandler: createSparkContext on org.apache.spark.api.r.RRDD failed
    Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
      org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.apache.spark.api.r.RRDD$.createSparkContext(RRDD.scala:129)
    at org.apache.spark.api.r.RRDD.createSparkContext(RRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Metho

1 Answer:

Answer 0 (score: 0):

Interactive Spark shells and sessions, whether from RStudio (for R) or from Jupyter notebooks, cannot run in cluster mode; you should change to deployMode="client".
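A minimal corrected session call might look like the following (a sketch keeping the same driver memory and CSV package as in the question; the paths and package versions come from the question and may need adjusting):

```r
library(SparkR)

# client deploy mode: the driver runs inside the RStudio/R process,
# while the executors still run on the YARN cluster
sparkR.session(master = "yarn",
               deployMode = "client",
               sparkConfig = list(spark.driver.memory = "2g"),
               sparkPackages = "com.databricks:spark-csv_2.11:1.1.0")
```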

This is what happens when you try to launch the SparkR shell with --deploy-mode cluster (essentially the same situation as with RStudio):

    $ ./sparkR --master yarn --deploy-mode cluster
    R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
    [...]
    Error: Cluster deploy mode is not applicable to Spark shells.

See also this answer for the PySpark case.

This does not mean that you lose Spark's distributed benefits (i.e. cluster computing) in such sessions; only where the driver runs changes. From the docs:

    There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
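As a quick sanity check after switching to client mode, a short SparkR snippet such as the following (a sketch; it assumes a YARN client-mode session has already been created) still executes its aggregation on the cluster's executors, even though the driver lives in the local R process:

```r
# create a distributed SparkDataFrame from a built-in R dataset
df <- as.DataFrame(faithful)

# the grouping and counting below run on YARN-managed executors,
# not in the local RStudio process
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))
```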