I start the Spark master with ./sbin/start-master.sh, as described here: http://spark.apache.org/docs/latest/spark-standalone.html
Then I submit a Spark job:
sh ./bin/spark-submit \
  --class simplespark.Driver \
  --master spark://localhost:7077 \
  C:\\Users\\Adrian\\workspace\\simplespark\\target\\simplespark-0.0.1-SNAPSHOT.jar
How can I run a simple application that demonstrates tasks running in parallel?
When I look at http://localhost:4040/executors/ and http://localhost:8080/ there are no tasks running:
The .jar I am running (simplespark-0.0.1-SNAPSHOT.jar) just contains a single Scala object:
package simplespark

import org.apache.spark.SparkContext

object Driver {
  def main(args: Array[String]) {
    val conf = new org.apache.spark.SparkConf()
      .setMaster("local")
      .setAppName("knn")
      .setSparkHome("C:\\spark-1.1.0-bin-hadoop2.4\\spark-1.1.0-bin-hadoop2.4")
      .set("spark.executor.memory", "2g");
    val sc = new SparkContext(conf);

    val l = List(1)
    sc.parallelize(l)

    while(true){}
  }
}
Update: when I changed --master spark://localhost:7077 to --master spark://Adrian-PC:7077, I could see updates in the Spark UI:
I also updated Driver.scala to read the default context, as I was not sure whether I had set it up correctly for submitting Spark jobs:
package simplespark

import org.apache.spark.SparkContext

object Driver {
  def main(args: Array[String]) {
    System.setProperty("spark.executor.memory", "2g")
    val sc = new SparkContext();

    val l = List(1)
    val c = sc.parallelize(List(2, 3, 5, 7)).count()
    println(c)

    sc.stop
  }
}
On the Spark console I get many repetitions of the same message:
14/12/26 20:08:32 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
So it seems the Spark job is not reaching the master?
Update 2: After I started a worker (thanks to Lomig Mégard's comment below) using:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://Adrian-PC:7077
I got these errors:
14/12/27 21:23:52 INFO SparkDeploySchedulerBackend: Executor app-20141227212351-0003/8 removed: java.io.IOException: Cannot run program "C:\cygdrive\c\spark-1.1.0-bin-hadoop2.4\spark-1.1.0-bin-hadoop2.4/bin/compute-classpath.cmd" (in directory "."): CreateProcess error=2, The system cannot find the file specified
14/12/27 21:23:52 INFO AppClient$ClientActor: Executor added: app-20141227212351-0003/9 on worker-20141227211411-Adrian-PC-58199 (Adrian-PC:58199) with 4 cores
14/12/27 21:23:52 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141227212351-0003/9 on hostPort Adrian-PC:58199 with 4 cores, 2.0 GB RAM
14/12/27 21:23:52 INFO AppClient$ClientActor: Executor updated: app-20141227212351-0003/9 is now RUNNING
14/12/27 21:23:52 INFO AppClient$ClientActor: Executor updated: app-20141227212351-0003/9 is now FAILED (java.io.IOException: Cannot run program "C:\cygdrive\c\spark-1.1.0-bin-hadoop2.4\spark-1.1.0-bin-hadoop2.4/bin/compute-classpath.cmd" (in directory "."): CreateProcess error=2, The system cannot find the file specified)
14/12/27 21:23:52 INFO SparkDeploySchedulerBackend: Executor app-20141227212351-0003/9 removed: java.io.IOException: Cannot run program "C:\cygdrive\c\spark-1.1.0-bin-hadoop2.4\spark-1.1.0-bin-hadoop2.4/bin/compute-classpath.cmd" (in directory "."): CreateProcess error=2, The system cannot find the file specified
14/12/27 21:23:52 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
14/12/27 21:23:52 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED
14/12/27 21:23:52 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at Driver.scala:14)
14/12/27 21:23:52 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
Java HotSpot(TM) Client VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
I am running the scripts on Windows using Cygwin. To fix this error I copied the Spark installation to the Cygwin C:\ drive. But then I got a new error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Java HotSpot(TM) Client VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Answer 0 (score: 2):
You have to start an actual computation to see a job.
val c = sc.parallelize(List(2, 3, 5, 7)).count()
println(c)
Here count is called an action; you need at least one of them to start a job. You can find the list of available actions in the Spark doc.
The other methods are called transformations. They are executed lazily.
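A minimal sketch of this distinction (hypothetical object name; the local[*] master is used here just for illustration, not the standalone cluster from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("lazy-demo"))

    // map is a transformation: at this point nothing has been computed yet
    val squares = sc.parallelize(1 to 4).map(x => x * x)

    // collect is an action: this line is what actually triggers a job,
    // and is when work would first appear in the UI at :4040
    println(squares.collect().mkString(", "))  // 1, 4, 9, 16

    sc.stop()
  }
}
```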
Don't forget to stop the context with sc.stop() instead of your infinite loop.
Edit: regarding the updated question — you are allocating more memory to the executor than is available in the worker. The defaults should be fine for a simple test.
You also need to have a running worker linked to your master. See this doc to start one:
./sbin/start-master.sh
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
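Putting the pieces together, a possible end-to-end sequence on the master machine (the host name Adrian-PC and jar path are taken from the question; the 512m executor memory is only an illustrative value, chosen to stay within what the worker offers and avoid the "Initial job has not accepted any resources" warning):

```shell
# start the standalone master (web UI on :8080)
./sbin/start-master.sh

# start a worker and register it with the master (in a second terminal)
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://Adrian-PC:7077

# submit the job, keeping executor memory within the worker's capacity
./bin/spark-submit \
  --class simplespark.Driver \
  --master spark://Adrian-PC:7077 \
  --executor-memory 512m \
  C:\\Users\\Adrian\\workspace\\simplespark\\target\\simplespark-0.0.1-SNAPSHOT.jar
```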