At the URL http://ec2-54-186-47-36.us-west-2.compute.amazonaws.com:8080/ I can see that I have two worker nodes and one master node, and it shows the Spark cluster. By running the jps command on my 2 worker nodes and 1 master I can see that all services are up. The following is the script I use to initialize the SparkR session.
# Point SPARK_HOME at the local Spark installation if it is not already set
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/ubuntu/spark")
}
But whenever I try to initialize the session from RStudio, it fails with the error below. Please advise, as I cannot make real use of the cluster.
sparkR.session(master = "yarn", deployMode = "cluster",
               sparkConfig = list(spark.driver.memory = "2g"),
               sparkPackages = "com.databricks:spark-csv_2.11:1.1.0")
Launching java with spark-submit command /home/ubuntu/spark/bin/spark-submit --packages com.databricks:spark-csv_2.11:1.1.0 --driver-memory "2g" "--packages" "com.databricks:spark-csv_2.11:1.1.0" "sparkr-shell" /tmp/RtmpkSWHWX/backend_port29310cbc7c6
Ivy Default Cache set to: /home/rstudio/.ivy2/cache
The jars for the packages stored in: /home/rstudio/.ivy2/jars
:: loading settings :: url = jar:file:/home/ubuntu/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.11;1.1.0 in central
found org.apache.commons#commons-csv;1.1 in central
found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 441ms :: artifacts dl 24ms
:: modules in use:
com.databricks#spark-csv_2.11;1.1.0 from central in [default]
com.univocity#univocity-parsers;1.5.1 from central in [default]
org.apache.commons#commons-csv;1.1 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/18ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/09/24 23:15:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/24 23:15:42 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at org.apache.spark.api.r.RRDD$.createSparkContext(RRDD.scala:129)
at org.apache.spark.api.r.RRDD.createSparkContext(RRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:748)
17/09/24 23:15:42 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
17/09/24 23:15:42 WARN MetricsSystem: Stopping a MetricsSystem that is not running
17/09/24 23:15:42 ERROR RBackendHandler: createSparkContext on org.apache.spark.api.r.RRDD failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at org.apache.spark.api.r.RRDD$.createSparkContext(RRDD.scala:129)
at org.apache.spark.api.r.RRDD.createSparkContext(RRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Metho
Answer 0 (score: 0)
交互式 Spark shell&amp;来自RStudio(对于R)或来自Jupyter笔记本的会话无法以cluster
模式运行 - 您应该更改为deployMode=client
。
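A minimal sketch of the corrected call, keeping all the settings from the question and changing only the deploy mode:

sparkR.session(master = "yarn", deployMode = "client",  # client mode is required for interactive sessions
               sparkConfig = list(spark.driver.memory = "2g"),
               sparkPackages = "com.databricks:spark-csv_2.11:1.1.0")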
Here is what happens when you try to launch the SparkR shell with --deploy-mode cluster (essentially the same situation as with RStudio):
$ ./sparkR --master yarn --deploy-mode cluster
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
[...]
Error: Cluster deploy mode is not applicable to Spark shells.
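By contrast, a client-mode launch (which is also the default when --deploy-mode is omitted) should bring the shell up normally; a sketch, assuming the same working directory as above:

$ ./sparkR --master yarn --deploy-mode client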
See also this answer for the PySpark case.
This does not mean that you give up Spark's distributed benefits (i.e. cluster computing) in such a session; from the docs:

There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.