How to run a job locally in spark-jobserver

Date: 2017-02-17 13:38:54

Tags: apache-spark spark-jobserver

I am trying to run a job in spark-jobserver locally. My application has the following dependencies:

name := "spark-test"

version := "1.0"

scalaVersion := "2.10.6"

resolvers += Resolver.jcenterRepo

libraryDependencies += "org.apache.spark"  %%  "spark-core"  %  "1.6.1"
libraryDependencies += "spark.jobserver"  %%  "job-server-api" % "0.6.2" % "provided"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.6.2"
libraryDependencies += "com.holdenkarau" % "spark-testing-base_2.10" % "1.6.2_0.4.7" % "test"

I built the application package with:

sbt assembly
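
(For reference, the assembly task comes from the sbt-assembly plugin, which would be enabled in project/plugins.sbt; a minimal sketch, with the plugin version being an illustrative assumption:)

// project/plugins.sbt - enables the `sbt assembly` task used above
// (version 0.14.3 is illustrative; use one matching your sbt release)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")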

Then I submitted the package like this:

curl --data-binary @spark-test-assembly-1.0.jar localhost:8090/jars/myApp
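
(For reference, with the jar uploaded under the app name myApp, a job like this is typically triggered with a POST to the /jobs endpoint; a minimal sketch, with the sync flag an illustrative choice:)

curl -d "" 'localhost:8090/jobs?appName=myApp&classPath=jobs.TransformationJob&sync=true'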

When I trigger the job, I receive the following error:

{
  "duration": "0.101 secs",
  "classPath": "jobs.TransformationJob",
  "startTime": "2017-02-17T13:01:55.549Z",
  "context": "42f857ba-jobs.TransformationJob",
  "result": {
    "message": "java.lang.Exception: Could not find resource path for Web UI: org/apache/spark/sql/execution/ui/static",
    "errorClass": "java.lang.RuntimeException",
    "stack": ["org.apache.spark.ui.JettyUtils$.createStaticHandler(JettyUtils.scala:180)", "org.apache.spark.ui.WebUI.addStaticHandler(WebUI.scala:117)", "org.apache.spark.sql.execution.ui.SQLTab.<init>(SQLTab.scala:34)", "org.apache.spark.sql.SQLContext$$anonfun$createListenerAndUI$1.apply(SQLContext.scala:1369)", "org.apache.spark.sql.SQLContext$$anonfun$createListenerAndUI$1.apply(SQLContext.scala:1369)", "scala.Option.foreach(Option.scala:236)", "org.apache.spark.sql.SQLContext$.createListenerAndUI(SQLContext.scala:1369)", "org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:77)", "jobs.TransformationJob$.runJob(TransformationJob.scala:64)", "jobs.TransformationJob$.runJob(TransformationJob.scala:14)", "spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301)", "scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)", "scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)", "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)", "java.lang.Thread.run(Thread.java:745)"]
  },
  "status": "ERROR",
  "jobId": "a6bd6f23-cc82-44f3-8179-3b68168a2aa7"
}

Here is the part of the application that fails (the stack trace above points at the SQLContext constructor):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

override def runJob(sparkCtx: SparkContext, config: Config): Any = {
    val sqlContext = new SQLContext(sparkCtx)  // fails here (see stack trace above)
    ...
}

I have a few questions:

1) I noticed that to run spark-jobserver locally I did not need to have Spark installed. Does spark-jobserver already embed Spark?

2) How can I tell which Spark version spark-jobserver is using, and where is that defined?

3) I am using version 1.6.2 of spark-sql. Should I change it or keep it as it is?

I would really appreciate it if someone could answer these questions.

1 Answer:

Answer 0 (score: 1)

  1. Yes, spark-jobserver has Spark as a dependency. You should use job-server-extras/reStart instead of job-server/reStart; it will pull in the SQL-related dependencies for you.
  2. Look at project/Versions.scala.
  3. I don't think you need spark-sql, because it is already included when you run job-server-extras/reStart (see the sketch below).
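
For illustration, points 1 and 3 suggest letting jobserver own the SQL context instead of building one inside the job. A minimal sketch of TransformationJob against the SparkSqlJob trait from the job-server-extras module (this assumes the 0.6.x extras API; the build would then also depend on "spark.jobserver" %% "job-server-extras" % "0.6.2" % "provided", and the job body stays a placeholder):

package jobs

import com.typesafe.config.Config
import org.apache.spark.sql.SQLContext
import spark.jobserver.{SparkJobValid, SparkJobValidation, SparkSqlJob}

object TransformationJob extends SparkSqlJob {

  // Validation hook required by the job-server API; accept every request here.
  def validate(sql: SQLContext, config: Config): SparkJobValidation = SparkJobValid

  // jobserver hands in a managed SQLContext, so the job never calls
  // new SQLContext(...) itself and avoids the Web UI resource lookup above.
  def runJob(sql: SQLContext, config: Config): Any = {
    // ... original transformation logic goes here ...
  }
}

Such a job would then run in a context created with the SQL context factory, e.g. (the context name is illustrative; the factory class is the one documented for the 0.6.x extras):

curl -d "" 'localhost:8090/contexts/sql-context?context-factory=spark.jobserver.context.SQLContextFactory'
curl -d "" 'localhost:8090/jobs?appName=myApp&classPath=jobs.TransformationJob&context=sql-context'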