I have set up a Spark cluster on a remote server called HuaWeiCloud (similar to AWS). I can log in to the master node of the Spark cluster via ssh, and code I submit there with spark-submit runs successfully.
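For reference, a typical submission on the master node looks something like this (the class name and jar path are placeholders matching my project, not the exact command I ran):

spark-submit --class SparkPi --master spark://172.17.0.4:7077 Test20180421.jar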
My environment:
Scala: 2.11.8
Spark: 2.0.1
Hadoop: 2.7.3
JAVA JDK: 1.8.0_101
master: 172.17.0.4
slave01: 172.17.0.5
slave02: 172.17.0.6
However, developing Spark/Scala code directly on the server is inconvenient. I would prefer to develop the Spark/Scala code on my local PC with IntelliJ IDEA and submit it to the remote Spark cluster. The catch is that I am a student and my PC sits on the school LAN, so it has no public IP. My code is:
import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]) {
    // xxx.xxx.xxx.xxx is my remote spark-cluster IP address
    val conf = new SparkConf()
      .setAppName("SparkPi")
      .setMaster("spark://xxx.xxx.xxx.xxx:7077")
      // ship the application jar built by IDEA to the cluster
      .setJars(List("E:\\Workspaces\\IDEAProject\\Test20180421\\out\\artifacts\\Test20180421_jar\\Test20180421.jar"))
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 4
    val n = 100000 * slices
    // Monte Carlo estimate of Pi: sample points in the unit square
    // and count those that fall inside the unit circle
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = Math.random * 2 - 1
      val y = Math.random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
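Since the executors have to open connections back to the driver, I also wondered whether making the driver's address explicit would help. A minimal sketch of what I mean, assuming my PC had an address the workers could actually route to (the address and port below are placeholders I have not tested, not a working fix):

// Sketch only: spark.driver.host / spark.driver.port are standard Spark
// properties; "my.reachable.ip" is a hypothetical address reachable from
// the workers, and 56394 assumes that port is open through any firewall.
val conf = new SparkConf()
  .setAppName("SparkPi")
  .setMaster("spark://xxx.xxx.xxx.xxx:7077")
  .set("spark.driver.host", "my.reachable.ip")
  .set("spark.driver.port", "56394")

But since my PC has no public IP at all, I am not sure any value would work here.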
When I run the program in IDEA, I get this output:
"C:\Program Files\Java\jdk1.8.0_101\bin\java" "-javaagent:D:\Program Files\IntelliJ IDEA Community Edition 2017.2.5\lib\idea_rt.jar=56373:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/04/22 16:00:50 INFO SparkContext: Running Spark version 2.0.1
18/04/22 16:00:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/22 16:00:51 INFO SecurityManager: Changing view acls to: Administrator
18/04/22 16:00:51 INFO SecurityManager: Changing modify acls to: Administrator
18/04/22 16:00:51 INFO SecurityManager: Changing view acls groups to:
18/04/22 16:00:51 INFO SecurityManager: Changing modify acls groups to:
18/04/22 16:00:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); groups with view permissions: Set(); users with modify permissions: Set(Administrator); groups with modify permissions: Set()
18/04/22 16:00:53 INFO Utils: Successfully started service 'sparkDriver' on port 56394.
18/04/22 16:00:53 INFO SparkEnv: Registering MapOutputTracker
18/04/22 16:00:53 INFO SparkEnv: Registering BlockManagerMaster
18/04/22 16:00:53 INFO DiskBlockManager: Created local directory at C:\Users\Administrator\AppData\Local\Temp\blockmgr-49b3fde8-dedd-473f-98cd-84c0fcfdaa8c
18/04/22 16:00:53 INFO MemoryStore: MemoryStore started with capacity 1170.6 MB
18/04/22 16:00:54 INFO SparkEnv: Registering OutputCommitCoordinator
18/04/22 16:00:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/04/22 16:00:54 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://169.254.39.142:4040
18/04/22 16:00:54 INFO SparkContext: Added JAR E:\Workspaces\IDEAProject\Test20180421\out\artifacts\Test20180421_jar\Test20180421.jar at spark://169.254.39.142:56394/jars/Test20180421.jar with timestamp 1524384054718
18/04/22 16:00:55 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://xxx.xxx.xxx.xxx:7077...
18/04/22 16:00:55 INFO TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xxx:7077 after 147 ms (0 ms spent in bootstraps)
18/04/22 16:00:56 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180422160059-0013
18/04/22 16:00:56 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180422160059-0013/0 on worker-20180421200741-172.17.0.6-33453 (172.17.0.6:33453) with 2 cores
18/04/22 16:00:56 INFO StandaloneSchedulerBackend: Granted executor ID app-20180422160059-0013/0 on hostPort 172.17.0.6:33453 with 2 cores, 1024.0 MB RAM
18/04/22 16:00:56 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180422160059-0013/1 on worker-20180421200741-172.17.0.5-43663 (172.17.0.5:43663) with 2 cores
18/04/22 16:00:56 INFO StandaloneSchedulerBackend: Granted executor ID app-20180422160059-0013/1 on hostPort 172.17.0.5:43663 with 2 cores, 1024.0 MB RAM
18/04/22 16:00:56 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180422160059-0013/0 is now RUNNING
18/04/22 16:00:56 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180422160059-0013/1 is now RUNNING
18/04/22 16:00:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56422.
18/04/22 16:00:56 INFO NettyBlockTransferService: Server created on 169.254.39.142:56422
18/04/22 16:00:56 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 169.254.39.142, 56422)
18/04/22 16:00:56 INFO BlockManagerMasterEndpoint: Registering block manager 169.254.39.142:56422 with 1170.6 MB RAM, BlockManagerId(driver, 169.254.39.142, 56422)
18/04/22 16:00:56 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 169.254.39.142, 56422)
18/04/22 16:00:56 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/04/22 16:00:57 INFO SparkContext: Starting job: reduce at SparkPi.scala:14
18/04/22 16:00:57 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:14) with 4 output partitions
18/04/22 16:00:57 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:14)
18/04/22 16:00:57 INFO DAGScheduler: Parents of final stage: List()
18/04/22 16:00:57 INFO DAGScheduler: Missing parents: List()
18/04/22 16:00:57 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:10), which has no missing parents
18/04/22 16:00:57 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1776.0 B, free 1170.6 MB)
18/04/22 16:00:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1157.0 B, free 1170.6 MB)
18/04/22 16:00:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 169.254.39.142:56422 (size: 1157.0 B, free: 1170.6 MB)
18/04/22 16:00:58 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
18/04/22 16:00:58 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:10)
18/04/22 16:00:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
The log then keeps printing with no further progress; I repeatedly get these warnings:
18/04/22 16:01:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:01:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:01:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:01:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:02:13 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:02:28 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:02:43 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:02:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/04/22 16:02:58 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180422160059-0013/0 is now EXITED (Command exited with code 1)
18/04/22 16:02:58 INFO StandaloneSchedulerBackend: Executor app-20180422160059-0013/0 removed: Command exited with code 1
18/04/22 16:02:58 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180422160059-0013/2 on worker-20180421200741-172.17.0.6-33453 (172.17.0.6:33453) with 2 cores
18/04/22 16:02:58 INFO StandaloneSchedulerBackend: Granted executor ID app-20180422160059-0013/2 on hostPort 172.17.0.6:33453 with 2 cores, 1024.0 MB RAM
18/04/22 16:02:58 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180422160059-0013/2 is now RUNNING
18/04/22 16:02:58 INFO BlockManagerMaster: Removal of executor 0 requested
18/04/22 16:02:58 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 0
18/04/22 16:02:58 INFO BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster.
18/04/22 16:02:58 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180422160059-0013/1 is now EXITED (Command exited with code 1)
18/04/22 16:02:58 INFO StandaloneSchedulerBackend: Executor app-20180422160059-0013/1 removed: Command exited with code 1
18/04/22 16:02:58 INFO BlockManagerMaster: Removal of executor 1 requested
18/04/22 16:02:58 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
18/04/22 16:02:58 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 1
18/04/22 16:02:58 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20180422160059-0013/3 on worker-20180421200741-172.17.0.5-43663 (172.17.0.5:43663) with 2 cores
18/04/22 16:02:58 INFO StandaloneSchedulerBackend: Granted executor ID app-20180422160059-0013/3 on hostPort 172.17.0.5:43663 with 2 cores, 1024.0 MB RAM
18/04/22 16:02:58 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20180422160059-0013/3 is now RUNNING
In the Spark web UI I can see that one application is running (screenshot: The Spark web UI). When I click on the running application, its executor log shows:
18/04/22 16:15:14 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 4547@slave01
18/04/22 16:15:14 INFO util.SignalUtils: Registered signal handler for TERM
18/04/22 16:15:14 INFO util.SignalUtils: Registered signal handler for HUP
18/04/22 16:15:14 INFO util.SignalUtils: Registered signal handler for INT
18/04/22 16:15:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/22 16:15:15 INFO spark.SecurityManager: Changing view acls to: spark,Administrator
18/04/22 16:15:15 INFO spark.SecurityManager: Changing modify acls to: spark,Administrator
18/04/22 16:15:15 INFO spark.SecurityManager: Changing view acls groups to:
18/04/22 16:15:15 INFO spark.SecurityManager: Changing modify acls groups to:
18/04/22 16:15:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, Administrator); groups with view permissions: Set(); users with modify permissions: Set(spark, Administrator); groups with modify permissions: Set()
18/04/22 16:15:18 WARN internal.ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
at scala.util.Try$.apply(Try.scala:192)
at scala.util.Failure.recover(Try.scala:216)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
... 8 more
java.lang.IllegalArgumentException: requirement failed: TransportClient has not yet been set.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.rpc.netty.RpcOutboxMessage.onTimeout(Outbox.scala:70)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$ask$1.applyOrElse(NettyRpcEnv.scala:232)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$ask$1.applyOrElse(NettyRpcEnv.scala:231)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I suspect that java.lang.IllegalArgumentException: requirement failed: TransportClient has not yet been set.
is the cause of the problem. What should I do?