I installed Spark 1.4.1 on a Hadoop 2.7 cluster.
I can start the SparkR shell without errors:
bin/sparkR --master yarn-client
This R command (the introductory example from spark.apache.org) also runs without errors:
df <- createDataFrame(sqlContext, faithful)
But when I run:
head(select(df, df$eruptions))
I get the following error on the executor node (at 15/09/02 10:08:29 in the log below):
"Rscript execution error: No such file or directory"
Any hints would be appreciated. Spark jobs other than SparkR run fine on my YARN cluster. R 3.2.1 is installed and works fine on the driver node.
Full executor log:
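What I have checked so far: my working assumption is that the executor fails because the `Rscript` binary is not on the PATH of the YARN worker nodes (R is only confirmed on the driver). A minimal probe script I ran on each node, where `check_bin` is just a helper I wrote for this:

```shell
#!/bin/sh
# Sketch, assuming the failure means `Rscript` is absent from the worker's PATH.
# check_bin is a local helper, not part of Spark or Hadoop.
check_bin() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}

check_bin sh       # sanity check; always present on a POSIX node
check_bin Rscript  # what the SparkR executor tries to launch
```

Running this via ssh on the executor host (datanode1.hp.com in the log) would confirm or rule out a missing `Rscript` there.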
15/09/02 10:04:06 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/09/02 10:04:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/02 10:04:10 INFO spark.SecurityManager: Changing view acls to: yarn,root
15/09/02 10:04:10 INFO spark.SecurityManager: Changing modify acls to: yarn,root
15/09/02 10:04:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
15/09/02 10:04:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/09/02 10:04:12 INFO Remoting: Starting remoting
15/09/02 10:04:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@datanode1.hp.com:46167]
15/09/02 10:04:12 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 46167.
15/09/02 10:04:12 INFO spark.SecurityManager: Changing view acls to: yarn,root
15/09/02 10:04:12 INFO spark.SecurityManager: Changing modify acls to: yarn,root
15/09/02 10:04:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
15/09/02 10:04:12 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/09/02 10:04:12 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/09/02 10:04:12 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/09/02 10:04:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/09/02 10:04:12 INFO Remoting: Starting remoting
15/09/02 10:04:13 INFO util.Utils: Successfully started service 'sparkExecutor' on port 47919.
15/09/02 10:04:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@datanode1.hp.com:47919]
15/09/02 10:04:13 INFO storage.DiskBlockManager: Created local directory at /data2/hadoop/yarn/local/usercache/root/appcache/application_1441180800595_0001/blockmgr-5e435e40-bd36-4746-9acd-8cf1619033ae
15/09/02 10:04:13 INFO storage.DiskBlockManager: Created local directory at /data3/hadoop/yarn/local/usercache/root/appcache/application_1441180800595_0001/blockmgr-28dfabe6-8e0d-4e49-bc95-27b3428c10a0
15/09/02 10:04:13 INFO storage.MemoryStore: MemoryStore started with capacity 534.5 MB
15/09/02 10:04:13 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@192.1.1.1:45596/user/CoarseGrainedScheduler
15/09/02 10:04:13 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
15/09/02 10:04:13 INFO executor.Executor: Starting executor ID 2 on host datanode1.hp.com
15/09/02 10:04:14 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34166.
15/09/02 10:04:14 INFO netty.NettyBlockTransferService: Server created on 34166
15/09/02 10:04:14 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/09/02 10:04:14 INFO storage.BlockManagerMaster: Registered BlockManager
15/09/02 10:06:35 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
15/09/02 10:06:35 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/09/02 10:06:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
15/09/02 10:06:35 INFO storage.MemoryStore: ensureFreeSpace(854) called with curMem=0, maxMem=560497950
15/09/02 10:06:35 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 854.0 B, free 534.5 MB)
15/09/02 10:06:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 159 ms
15/09/02 10:06:35 INFO storage.MemoryStore: ensureFreeSpace(1280) called with curMem=854, maxMem=560497950
15/09/02 10:06:35 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1280.0 B, free 534.5 MB)
15/09/02 10:06:35 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 11589 bytes result sent to driver
15/09/02 10:08:28 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
15/09/02 10:08:28 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
15/09/02 10:08:28 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
15/09/02 10:08:28 INFO storage.MemoryStore: ensureFreeSpace(4022) called with curMem=0, maxMem=560497950
15/09/02 10:08:28 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.9 KB, free 534.5 MB)
15/09/02 10:08:28 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 13 ms
15/09/02 10:08:28 INFO storage.MemoryStore: ensureFreeSpace(9536) called with curMem=4022, maxMem=560497950
15/09/02 10:08:28 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 9.3 KB, free 534.5 MB)
15/09/02 10:08:29 INFO r.BufferedStreamThread: Rscript execution error: No such file or directory
15/09/02 10:08:39 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:425)
at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)