使用spark-jobserver提交spark作业时出错

时间:2016-12-13 12:26:01

标签: apache-spark spark-jobserver

我在提交工作时偶尔会面临以下错误。如果我删除了fieldao,datadao和sqldao的rootdir,这个错误就消失了。这意味着我必须重新启动作业服务器并重新上传我的jar。

{
  "status": "ERROR",
  "result": {
    "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/1995aeba-com.spmsoftware.distributed.job.TestJob#-1370794810]] after [10000 ms]. Sender[null] sent message of type \"spark.jobserver.JobManagerActor$StartJob\".",
    "errorClass": "akka.pattern.AskTimeoutException",
    "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)", "akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)", "scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)", "scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:331)", "akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:282)", "akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:286)", "akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:238)", "java.lang.Thread.run(Thread.java:745)"]
  }
}

我的配置文件如下:

# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  master = <spark_master>

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090

    context-per-jvm = false
    context-creation-timeout = 100 s
    # Note: JobFileDAO is deprecated from v0.7.0 because of issues in
    # production and will be removed in future, now defaults to H2 file.
    jobdao = spark.jobserver.io.JobSqlDAO

    filedao {
      rootdir = /tmp/spark-jobserver/filedao/data
    }

    datadao {
      rootdir = /tmp/spark-jobserver/upload
    }

    sqldao {
      slick-driver = slick.driver.H2Driver

      jdbc-driver = org.h2.Driver

      rootdir = /tmp/spark-jobserver/sqldao/data

      jdbc {
        url = "jdbc:h2:file:/tmp/spark-jobserver/sqldao/data/h2-db"
        user = ""
        password = ""
      }

      dbcp {
        enabled = false
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }
    result-chunk-size = 1m
    short-timeout = 60 s    
  }

  context-settings {
    num-cpu-cores = 2           # Number of cores to allocate.  Required.
    memory-per-node = 512m         # Executor memory per node, -Xmx style eg 512m, #1G, etc.

  }

}

akka {
  remote.netty.tcp {
    # This controls the maximum message size, including job results, that can be sent
    # maximum-frame-size = 200 MiB
  }
}

# check the reference.conf in spray-can/src/main/resources for all defined settings
spray.can.server.parsing.max-content-length = 250m

我正在使用spark-2.0-preview版本。

1 个答案:

答案 0 :(得分:0)

之前我遇到过同样的错误并且与超时有关,当然是同步请求(sync = true)要求你必须提供超时(以秒为单位),这是一个值相对于处理你需要多长时间的值请求。

这个例子的示例如下:

curl -k --basic -d '' 'http://localhost:5050/jobs?appName=app&classPath=Main&context=test-context&sync=true&timeout=40'

如果您的请求需要超过40秒,您可能还需要修改

上的application.conf
spark-jobserver-master/job-server/src/main/resources/application.conf

单击 spray.can.server 部分修改:

idle-timeout = 210 s
request-timeout = 200 s