Monitoring tasks in Apache Spark

Date: 2014-12-26 16:01:51

Tags: scala, apache-spark

I start the Spark master with ./sbin/start-master.sh as described here: http://spark.apache.org/docs/latest/spark-standalone.html

Then I submit a Spark job:

sh ./bin/spark-submit \
  --class simplespark.Driver \
  --master spark://localhost:7077 \
  C:\\Users\\Adrian\\workspace\\simplespark\\target\\simplespark-0.0.1-SNAPSHOT.jar

How can I run a simple application that demonstrates tasks running in parallel?

When I look at http://localhost:4040/executors/ and http://localhost:8080/ there are no running tasks.

The .jar I am running (simplespark-0.0.1-SNAPSHOT.jar) contains just a single Scala object:

package simplespark

import org.apache.spark.SparkContext

object Driver {

  def main(args: Array[String]) {

    val conf = new org.apache.spark.SparkConf()
      .setMaster("local")
      .setAppName("knn")
      .setSparkHome("C:\\spark-1.1.0-bin-hadoop2.4\\spark-1.1.0-bin-hadoop2.4")
      .set("spark.executor.memory", "2g");

    val sc = new SparkContext(conf);
    val l = List(1)

    sc.parallelize(l)

    while(true){}

  }
}

Update: when I changed --master spark://localhost:7077 \ to --master spark://Adrian-PC:7077 \

I can see the update reflected in the Spark UI.

I also updated Driver.scala to pick up the default context, because I was not sure whether I was setting it up correctly for submission with spark-submit:

package simplespark

import org.apache.spark.SparkContext

object Driver {

  def main(args: Array[String]) {

    System.setProperty("spark.executor.memory", "2g")

    val sc = new SparkContext();
    val l = List(1)

    val c = sc.parallelize(List(2, 3, 5, 7)).count()
    println(c)

    sc.stop

  }
}

On the Spark console I keep receiving the same message over and over:

14/12/26 20:08:32 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

So it seems the Spark job is not reaching the master?

Update 2: after I started the worker (thanks to Lomig Mégard's comment below) using:

./bin/spark-class org.apache.spark.deploy.worker.Worker spark://Adrian-PC:7077 

I got the error:

14/12/27 21:23:52 INFO SparkDeploySchedulerBackend: Executor app-20141227212351-0003/8 removed: java.io.IOException: Cannot run program "C:\cygdrive\c\spark-1.1.0-bin-hadoop2.4\spark-1.1.0-bin-hadoop2.4/bin/compute-classpath.cmd" (in directory "."): CreateProcess error=2, The system cannot find the file specified
14/12/27 21:23:52 INFO AppClient$ClientActor: Executor added: app-20141227212351-0003/9 on worker-20141227211411-Adrian-PC-58199 (Adrian-PC:58199) with 4 cores
14/12/27 21:23:52 INFO SparkDeploySchedulerBackend: Granted executor ID app-20141227212351-0003/9 on hostPort Adrian-PC:58199 with 4 cores, 2.0 GB RAM
14/12/27 21:23:52 INFO AppClient$ClientActor: Executor updated: app-20141227212351-0003/9 is now RUNNING
14/12/27 21:23:52 INFO AppClient$ClientActor: Executor updated: app-20141227212351-0003/9 is now FAILED (java.io.IOException: Cannot run program "C:\cygdrive\c\spark-1.1.0-bin-hadoop2.4\spark-1.1.0-bin-hadoop2.4/bin/compute-classpath.cmd" (in directory "."): CreateProcess error=2, The system cannot find the file specified)
14/12/27 21:23:52 INFO SparkDeploySchedulerBackend: Executor app-20141227212351-0003/9 removed: java.io.IOException: Cannot run program "C:\cygdrive\c\spark-1.1.0-bin-hadoop2.4\spark-1.1.0-bin-hadoop2.4/bin/compute-classpath.cmd" (in directory "."): CreateProcess error=2, The system cannot find the file specified
14/12/27 21:23:52 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
14/12/27 21:23:52 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED
14/12/27 21:23:52 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at Driver.scala:14)
14/12/27 21:23:52 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
Java HotSpot(TM) Client VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

I am running the scripts on Windows using Cygwin. To fix this error I copied the Spark installation to the Cygwin C:\ drive. But then I got a new error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Master removed our application: FAILED
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Java HotSpot(TM) Client VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

1 Answer:

Answer 0 (score: 2)

You have to start an actual computation to see a job.

val c = sc.parallelize(List(2, 3, 5, 7)).count()
println(c)

Here count is called an action; you need at least one of them to trigger a job. You can find the list of available actions in the Spark doc.

The other methods are called transformations. They are executed lazily.

Don't forget to stop the context with sc.stop() at the end, instead of your infinite loop.
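A minimal self-contained sketch of that flow (the object name and numbers are illustrative, not from the original post; the master URL is expected to come from spark-submit):

    package simplespark

    import org.apache.spark.{SparkConf, SparkContext}

    object LazyDemo {
      def main(args: Array[String]) {
        // No setMaster here: the master URL is supplied by spark-submit.
        val sc = new SparkContext(new SparkConf().setAppName("lazy-demo"))

        // Transformation: nothing runs yet, Spark only records the lineage.
        val squares = sc.parallelize(1 to 10).map(x => x * x)

        // Action: this is what actually launches tasks and shows up in the UI.
        val total = squares.reduce(_ + _)
        println(s"sum of squares = $total")

        // Stop the context cleanly instead of blocking in an infinite loop.
        sc.stop()
      }
    }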

Edit: regarding the updated question, you are allocating more memory to the executors than is available on the worker. The default values should be fine for a simple test.
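For example, a more conservative configuration might look like the sketch below; the 512m figure is only an illustrative value (or you can drop the setting entirely and use Spark's default):

    import org.apache.spark.SparkConf

    object ConfSketch {
      // Keep spark.executor.memory at or below the memory the worker advertises
      // in the master UI at http://localhost:8080, or omit the setting to fall
      // back to Spark's default.
      val conf = new SparkConf()
        .setAppName("knn")
        .set("spark.executor.memory", "512m") // illustrative value, not from the post
    }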

You also need a running worker linked to your master. See this doc to start one:

./sbin/start-master.sh
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
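Once the master and a worker are both up, a small driver like the hedged sketch below (object name, partition count, and values are illustrative) should show several parallel tasks for the running application at http://localhost:8080 and in the stage view at http://localhost:4040, when submitted with spark-submit --master spark://Adrian-PC:7077:

    package simplespark

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelDemo {
      def main(args: Array[String]) {
        val sc = new SparkContext(new SparkConf().setAppName("parallel-demo"))

        // Ask for 4 partitions so the single stage runs as 4 parallel tasks.
        val rdd = sc.parallelize(1 to 1000, numSlices = 4)

        // reduce is the action that actually schedules the tasks on the worker.
        val sum = rdd.map(_ * 2).reduce(_ + _)
        println(s"sum = $sum")

        sc.stop()
      }
    }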