Finding the number of records written by a writeStream operation: SparkListener onTaskEnd always returns 0 in Structured Streaming

Date: 2018-07-25 07:41:18

Tags: apache-spark spark-structured-streaming

I want to obtain the number of records written by a writeStream operation. For this, I have the following code:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

var inputRecords = 0L
var recordsWritten = 0L
var bytesWritten = 0L
var numTasks = 0L

spark.sparkContext.addSparkListener(new SparkListener() {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    // inputMetrics and outputMetrics are plain objects in Spark 2.x, never
    // Options, so comparing them to None is always true; read them directly.
    inputRecords += metrics.inputMetrics.recordsRead
    recordsWritten += metrics.outputMetrics.recordsWritten
    bytesWritten += metrics.outputMetrics.bytesWritten
    numTasks += 1
    println("recordsWritten = " + recordsWritten)
    println("bytesWritten = " + bytesWritten)
    println("numTasks = " + numTasks)
  }
})

The listener is invoked, but the values of recordsWritten, bytesWritten and inputRecords are always 0.

Edit: I upgraded to 2.3.1 since it contains a fix for this. The values are still 0, yet the streaming query progress does report the input rows:

Streaming query made progress: {
  "id" : "9c345af0-042c-4eeb-80db-828c5f69e442",
  "runId" : "d309f7cf-624a-42e5-bb54-dfb4fa939228",
  "name" : "WriteToSource",
  "timestamp" : "2018-07-30T14:20:33.486Z",
  "batchId" : 3,
  "numInputRows" : 3511,
  "inputRowsPerSecond" : 2113.786875376279,
  "processedRowsPerSecond" : 3013.733905579399,
  "durationMs" : {
    "addBatch" : 1044,
    "getBatch" : 29,
    "getOffset" : 23,
    "queryPlanning" : 25,
    "triggerExecution" : 1165,
    "walCommit" : 44
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[proto2-events-identification-carrier]]",
    "startOffset" : {
      "proto2-events-identification-carrier" : {
        "2" : 22400403,
        "1" : 22313194,
        "0" : 22381260
      }
    },
    "endOffset" : {
      "proto2-events-identification-carrier" : {
        "2" : 22403914,
        "1" : 22313194,
        "0" : 22381260
      }
    },
    "numInputRows" : 3511,
    "inputRowsPerSecond" : 2113.786875376279,
    "processedRowsPerSecond" : 3013.733905579399
  } ],
  "sink" : {
    "description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@1350f304"
  }
}

The progress log shows the row counts, but I cannot obtain them from my code.
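One way to capture the progress shown above programmatically is a StreamingQueryListener instead of a SparkListener. A minimal sketch (the printed fields are the same ones that appear in the log above):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = {}
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // event.progress is the same StreamingQueryProgress that is logged above
    val p = event.progress
    println(s"batch ${p.batchId}: numInputRows = ${p.numInputRows}")
  }
})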

1 answer:

Answer 0 (score: 1)

There was a bug in the FileStreamSink of Spark Structured Streaming, which was fixed in version 2.3.1.

As a workaround, you can use accumulators to count the records before they are written to the sink, as sketched below.
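For example, a minimal sketch of that workaround, assuming a streaming DataFrame df (the accumulator name and console sink are illustrative):

import org.apache.spark.sql.catalyst.encoders.RowEncoder

val writtenCounter = spark.sparkContext.longAccumulator("recordsWritten")

// Count each row as it flows towards the sink. The accumulator is updated on
// the executors and can be read on the driver via writtenCounter.value.
implicit val enc = RowEncoder(df.schema)
val counted = df.map { row =>
  writtenCounter.add(1L)
  row
}

counted.writeStream
  .format("console")
  .start()

Note that accumulator updates from retried tasks can be applied more than once, so the value should be treated as approximate.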