The definition of StateStoreRestoreExec is shown below.
    case class StateStoreRestoreExec(
        keyExpressions: Seq[Attribute],
        stateId: Option[OperatorStateId],
        child: SparkPlan)
      extends UnaryExecNode with StateStoreReader {

      override protected def doExecute(): RDD[InternalRow] = {
        val numOutputRows = longMetric("numOutputRows")

        child.execute().mapPartitionsWithStateStore(
          getStateId.checkpointLocation,
          operatorId = getStateId.operatorId,
          storeVersion = getStateId.batchId,
          keyExpressions.toStructType,
          child.output.toStructType,
          sqlContext.sessionState,
          Some(sqlContext.streams.stateStoreCoordinator)) { case (store, iter) =>
            val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
            iter.flatMap { row =>
              val key = getKey(row)
              val savedState = store.get(key)
              numOutputRows += 1
              row +: savedState.toSeq
            }
        }
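Here, I would like to know what row +: savedState.toSeq means. I think row is an instance of UnsafeRow and savedState.toSeq is an instance of Seq, so how can +: be used to combine them? On the other hand, I think savedState is an instance of UnsafeRow, and toSeq is not a member of UnsafeRow, so how does savedState.toSeq work?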
Answer 0 (score: 2)
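row is an instance of InternalRow, and savedState is an Option[UnsafeRow], where UnsafeRow extends InternalRow. So toSeq is called on the Option, not on an UnsafeRow: the saved state is converted from Option[UnsafeRow] to Seq[UnsafeRow] (empty for None, a single element for Some). The row instance is then prepended to that sequence. Because the operator +: ends in a colon, it is right-associative, so row +: savedState.toSeq is a method call on the sequence, equivalent to savedState.toSeq.+:(row).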
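The conversion and the prepend can be tried without Spark. The following is a minimal sketch, assuming simplified stand-in classes: the InternalRow and UnsafeRow defined here are hypothetical placeholders that only mirror the subtype relationship, and PrependDemo.restore is an illustrative helper, not part of Spark.

```scala
// Hypothetical stand-ins that only mirror the subtype relationship;
// these are NOT the real Spark classes.
abstract class InternalRow
final case class UnsafeRow(value: String) extends InternalRow

object PrependDemo {
  // Same shape as the expression in the question.
  // savedState.toSeq is a Seq[UnsafeRow] with zero or one element;
  // prepending an InternalRow widens the element type to InternalRow.
  def restore(row: InternalRow, savedState: Option[UnsafeRow]): Seq[InternalRow] =
    row +: savedState.toSeq

  def main(args: Array[String]): Unit = {
    val input: InternalRow = UnsafeRow("input")
    // Saved state present: the input row followed by the saved row.
    println(restore(input, Some(UnsafeRow("saved"))))
    // No saved state: only the input row.
    println(restore(input, None))
  }
}
```

The explicit form savedState.toSeq.+:(row) would compile to the same call, which is why the operand on the right has to be the sequence.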
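When you flatMap over the input rows, the per-row sequences are flattened into a single iterator of rows. The rows are UnsafeRow objects, although the static type is Iterator[InternalRow] because row is typed as InternalRow.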
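The flattening step can be sketched the same way. This example assumes plain strings as rows and an in-memory Map as a stand-in for the state store. FlatMapDemo, restoreAll, and the sample data are hypothetical names for illustration only.

```scala
object FlatMapDemo {
  // Each input row yields a one- or two-element Seq (the row itself plus
  // its saved state, if any); flatMap concatenates them into one Iterator.
  def restoreAll(rows: Iterator[String], store: Map[String, String]): Iterator[String] =
    rows.flatMap { row =>
      val savedState: Option[String] = store.get(row) // Map.get returns an Option
      row +: savedState.toSeq
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical state: only key "a" has a saved value.
    val store = Map("a" -> "a-saved")
    restoreAll(Iterator("a", "b"), store).foreach(println)
  }
}
```

Rows without saved state contribute only themselves, so the output never contains empty entries.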