What does "row +: savedState.toSeq" do in StateStoreRestoreExec.doExecute?

Time: 2017-08-22 14:59:25

Tags: scala apache-spark spark-structured-streaming

StateStoreRestoreExec is defined as follows.

case class StateStoreRestoreExec(
    keyExpressions: Seq[Attribute],
    stateId: Option[OperatorStateId],
    child: SparkPlan)
  extends UnaryExecNode with StateStoreReader {

  override protected def doExecute(): RDD[InternalRow] = {
    val numOutputRows = longMetric("numOutputRows")

    child.execute().mapPartitionsWithStateStore(
      getStateId.checkpointLocation,
      operatorId = getStateId.operatorId,
      storeVersion = getStateId.batchId,
      keyExpressions.toStructType,
      child.output.toStructType,
      sqlContext.sessionState,
      Some(sqlContext.streams.stateStoreCoordinator)) { case (store, iter) =>
        // Projection that extracts the grouping key from each input row
        val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
        iter.flatMap { row =>
          val key = getKey(row)
          // Previously saved state for this key, if any
          val savedState = store.get(key)
          numOutputRows += 1
          // Emit the input row, followed by its saved state row when present
          row +: savedState.toSeq
        }
    }
  }
}

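Here, I want to know what row +: savedState.toSeq means. I think row is an instance of UnsafeRow and savedState.toSeq is an instance of Seq, so how can +: be applied to them? On the other hand, I think savedState is an instance of UnsafeRow, and toSeq is not a member of UnsafeRow, so how does savedState.toSeq work?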

1 answer:

Answer 0 (score: 2):

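row is an instance of InternalRow, and savedState is an Option[UnsafeRow], where UnsafeRow extends InternalRow. So toSeq is called on the Option, not on an UnsafeRow: the saved state is converted from Option[UnsafeRow] to Seq[UnsafeRow] (empty for None, one element for Some), and then the row instance is prepended to that sequence.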

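When you flatMap over these rows, the per-row sequences are flattened into a single iterator of rows. Because row is typed as InternalRow, the result is an Iterator[InternalRow], which matches the RDD[InternalRow] that doExecute returns.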
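
To make the mechanics concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names Row, InputRow, StateRow, PrependDemo and the Map-backed store are hypothetical stand-ins for InternalRow, UnsafeRow and the state store; they are assumptions for illustration, not Spark's API. It also shows why +: can be applied at all: an operator whose name ends in a colon is a method of its right-hand operand, so row +: seq is really seq.+:(row).

```scala
// Hypothetical stand-ins for Spark's InternalRow / UnsafeRow (illustration only).
sealed trait Row
final case class InputRow(key: String, value: Int) extends Row
final case class StateRow(key: String, value: Int) extends Row

object PrependDemo {
  // Stand-in for the state store: Map.get returns Some(state) or None.
  val store: Map[String, StateRow] = Map("a" -> StateRow("a", 10))

  def restore(rows: Iterator[InputRow]): Iterator[Row] =
    rows.flatMap { row =>
      val savedState: Option[StateRow] = store.get(row.key)
      // Option.toSeq yields Seq() for None and Seq(state) for Some(state).
      // +: ends in a colon, so this is savedState.toSeq.+:(row),
      // i.e. row is prepended to the sequence.
      row +: savedState.toSeq
    }

  def main(args: Array[String]): Unit =
    restore(Iterator(InputRow("a", 1), InputRow("b", 2))).foreach(println)
}
```

In this sketch key "a" has saved state, so it contributes two rows (the input row, then its state row), while key "b" has none and contributes only the input row; flatMap then concatenates those short sequences into one iterator.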