Below is a code snippet from Spark's Stage class. It defines jobIds as a HashSet[Int], which implies that a single stage can belong to multiple jobs. I had always assumed a stage belongs to exactly one job. Can anyone show some example code that demonstrates a stage being shared by more than one job?
import scala.collection.mutable.HashSet

import org.apache.spark.rdd.RDD
import org.apache.spark.util.CallSite

private[scheduler] abstract class Stage(
    val id: Int,
    val rdd: RDD[_],
    val numTasks: Int,
    val parents: List[Stage],
    val firstJobId: Int,
    val callSite: CallSite)
  extends Logging {

  val numPartitions = rdd.partitions.length

  /** Set of jobs that this stage belongs to. */
  val jobIds = new HashSet[Int]

  /** The ID to use for the next new attempt for this stage. */
  private var nextAttemptId: Int = 0

  val name: String = callSite.shortForm
  val details: String = callSite.longForm
}
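
For reference, a stage ends up in multiple jobs whenever the DAGScheduler can reuse an already-registered ShuffleMapStage: shuffle map stages are keyed by their shuffle id, so a second job that depends on the same shuffle gets the existing Stage instance back, and the new job's id is added to that stage's jobIds set. Here is a minimal sketch that triggers this (the object name, app name, and local master setting are my own choices for illustration): two actions on the same shuffled RDD produce two jobs that share one ShuffleMapStage.

import org.apache.spark.{SparkConf, SparkContext}

object SharedStageDemo {
  def main(args: Array[String]): Unit = {
    // Illustrative local-mode context; names are arbitrary.
    val sc = new SparkContext(
      new SparkConf().setAppName("shared-stage-demo").setMaster("local[2]"))

    val pairs = sc.parallelize(1 to 100).map(i => (i % 10, i))
    // reduceByKey introduces a shuffle dependency, i.e. one ShuffleMapStage.
    val counts = pairs.reduceByKey(_ + _)

    counts.count()    // job 0: runs the ShuffleMapStage plus a ResultStage
    counts.collect()  // job 1: depends on the same shuffle, so the DAGScheduler
                      // looks the existing ShuffleMapStage up by its shuffle id
                      // and adds job 1 to that stage's jobIds set

    sc.stop()
  }
}

If you run something like this and open the web UI, the second job lists the shuffle map stage under "Skipped Stages" (its map output is already available), which is visible evidence that both jobs reference the same Stage object rather than each creating its own.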