What operations contribute to Spark task deserialization time?

Time: 2016-02-23 20:00:53

Tags: apache-spark

I have some jobs whose run time is dominated by task deserialization: the tasks spend about 3 minutes deserializing, and then the work itself finishes in roughly 10 seconds.

What are the exact bounds of this metric? Which resource limitations most commonly lead to long deserialization times?
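
For reference, the per-task number behind this UI column can also be read on the driver. Below is a minimal sketch (the listener class name is mine, and `sc` is assumed to be an existing SparkContext) that logs the `executorDeserializeTime` task metric alongside the run time:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Logs each finished task's executor-side deserialization time, which is
    // what the web UI surfaces as "Task Deserialization Time".
    class DeserializationTimeListener extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        // taskMetrics may be null for tasks that failed before reporting metrics
        Option(taskEnd.taskMetrics).foreach { m =>
          println(s"task ${taskEnd.taskInfo.taskId}: " +
            s"deserialize = ${m.executorDeserializeTime} ms, " +
            s"run = ${m.executorRunTime} ms")
        }
      }
    }

    // Usage, given an existing SparkContext `sc`:
    // sc.addSparkListener(new DeserializationTimeListener)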

1 Answer:

Answer 0 (score: 4):

A quick look at the source on master (https://github.com/kayousterhout/spark-1/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L179) shows that it is basically this:

    // Split the serialized task into its file list, JAR list, and task body
    val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
    // Fetch any missing or out-of-date files and JARs this task depends on
    updateDependencies(taskFiles, taskJars)
    // Deserialize the task object itself with the (possibly updated) class loader
    task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)

    // If this task has been killed before we deserialized it, let's quit now. Otherwise,
    // continue executing the task.
    if (killed) {
      // Throw an exception rather than returning, because returning within a try{} block
      // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
      // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
      // for the task.
      throw new TaskKilledException
    }

    attemptedTask = Some(task)
    logDebug("Task " + taskId + "'s epoch is " + task.epoch)
    env.mapOutputTracker.updateEpoch(task.epoch)
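
The `updateDependencies(taskFiles, taskJars)` call in the middle is where file and JAR fetching happens. A condensed paraphrase of that method from the same linked Executor.scala (simplified here, not verbatim) looks like this:

    // Simplified paraphrase of Executor.updateDependencies: fetch any file or
    // JAR whose timestamp is newer than the copy this executor already has,
    // then expose newly fetched JARs to tasks through the class loader.
    private def updateDependencies(newFiles: HashMap[String, Long], newJars: HashMap[String, Long]) {
      synchronized {
        for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {
          logInfo("Fetching " + name + " with timestamp " + timestamp)
          Utils.fetchFile(name, new File(SparkFiles.getRootDirectory), conf,
            env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
          currentFiles(name) = timestamp
        }
        for ((name, timestamp) <- newJars if currentJars.getOrElse(name, -1L) < timestamp) {
          logInfo("Fetching " + name + " with timestamp " + timestamp)
          Utils.fetchFile(name, new File(SparkFiles.getRootDirectory), conf,
            env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
          currentJars(name) = timestamp
          // Make the freshly fetched JAR visible to task code
          val url = new File(SparkFiles.getRootDirectory, name.split("/").last).toURI.toURL
          urlClassLoader.addURL(url)
        }
      }
    }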

From the (taskFiles, taskJars, taskBytes) line, I suspect that JAR deserialization happens for every task; in my case I have a 136 MB fat JAR, which doesn't help.
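
Separately from JAR handling, another common contributor to long task deserialization (general Spark guidance, not something stated in the answer above) is a large object captured in a task closure: it gets serialized with the tasks and deserialized again for each of them, whereas a broadcast variable is shipped and materialized once per executor. A minimal sketch contrasting the two (the object name and sizes here are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object ClosureVsBroadcast {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("closure-vs-broadcast"))

        // Stand-in for a large object (imagine tens of MB).
        val bigTable: Map[Int, Int] = (1 to 100000).map(i => i -> i * 2).toMap
        val rdd = sc.parallelize(1 to 1000, numSlices = 100)

        // Captured in the closure: bigTable travels with the tasks, and each
        // of the 100 tasks deserializes its own copy.
        val viaClosure = rdd.map(i => bigTable.getOrElse(i, 0)).sum()

        // Broadcast: shipped and deserialized once per executor, not per task.
        val bigTableBc = sc.broadcast(bigTable)
        val viaBroadcast = rdd.map(i => bigTableBc.value.getOrElse(i, 0)).sum()

        println(s"closure sum = $viaClosure, broadcast sum = $viaBroadcast")
        sc.stop()
      }
    }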