DStreams: is a variable created inside foreachRDD and modified inside foreachPartition reset once outside foreachPartition?

Time: 2019-06-25 00:44:32

Tags: apache-spark spark-streaming spark-streaming-kafka

I have a batch of messages in Kafka and I am processing them with Spark Streaming.

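I am trying to catch the case where my code fails to insert into my database, then take those messages and re-insert them into Kafka so they can be processed later.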

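To handle this, I created a variable named success inside my foreachRDD function. When I attempt the database update, I return a Boolean indicating whether the insert succeeded. During testing I noticed that this does not seem to work when the insert happens inside foreachPartition: once I am outside the foreachPartition function, the value of success appears to be "reset".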

stream: DStream[String]

stream
  .foreachRDD(rdd => {
    if (!rdd.isEmpty()) {
      var success = true
      rdd.foreachPartition(partitionOfRecords => {
        if (partitionOfRecords.nonEmpty) {
          val listOfRecords = partitionOfRecords.toList
          val successfulInsert: Boolean = insertRecordsToDB(listOfRecords)
          logger.info("Insert was successful: " + successfulInsert)
          if (!successfulInsert) {
            logger.info("logging successful as false. Currently its set to: " + success )
            success = false
            logger.info("logged successful as false. Currently its set to: " + success )

          }
        }
      })

      logger.info("Insert into database successful from all partition: " + success)
      if (!success) {
        // send data to Kafka topic
      }

    }
  })

My log output then shows this:

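2019-06-24 20:26:37 [INFO] Insert was successful: false
2019-06-24 20:26:37 [INFO] logging successful as false. Currently its set to: true
2019-06-24 20:26:37 [INFO] logged successful as false. Currently its set to: false
2019-06-24 20:26:37 [INFO] Insert into database successful from all partition: true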

Even though the third log line says success is currently set to false, when I log it again after leaving foreachPartition it is true.

Can anyone explain why, or suggest another approach?

1 answer:

Answer 0 (score: 0)

I was able to get it working using an accumulator.
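As background on why the plain var did not work: Spark serializes the function passed to foreachPartition and runs it inside tasks, so each task normally updates its own deserialized copy of success, while the copy read in foreachRDD on the driver stays true. The sketch below imitates that round trip with plain Java serialization and no Spark at all. Flag, Task and roundTrip are hypothetical names made up for this illustration, not Spark APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for the captured `var success`.
final class Flag(var value: Boolean) extends Serializable

// Hypothetical stand-in for the closure handed to foreachPartition.
final class Task(val flag: Flag) extends Serializable {
  def run(): Unit = flag.value = false
}

object ClosureCopyDemo {
  // Serialize and deserialize an object, roughly what happens when a closure is shipped to a task.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverFlag = new Flag(true)               // the value the driver keeps
    val shipped = roundTrip(new Task(driverFlag)) // the copy a task would receive
    shipped.run()                                 // the task sets its own copy to false
    println(s"copy inside the task: ${shipped.flag.value}")
    println(s"value on the driver:  ${driverFlag.value}")
  }
}
```

Running main prints false for the copy inside the task and true for the value on the driver, which matches the log lines in the question. An accumulator is the mechanism Spark provides for sending such updates back to the driver, and it is what the code below uses.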

stream: DStream[String]

val dbInsertACC = sparkSession.sparkContext.longAccumulator("insertSuccess")

stream
  .foreachRDD(rdd => {
    if (!rdd.isEmpty()) {
      // The accumulator is shared across batches, so clear any count left over from an earlier batch
      dbInsertACC.reset()
      rdd.foreachPartition(partitionOfRecords => {
        if (partitionOfRecords.nonEmpty) {
          val listOfRecords = partitionOfRecords.toList
          val successfulInsert: Boolean = insertRecordsToDB(listOfRecords)
          logger.info("Insert was successful: " + successfulInsert)
          if (!successfulInsert) dbInsertACC.add(1)
        }
      })

      logger.info("Insert into database successful from all partition: " + dbInsertACC.isZero)
      if (!dbInsertACC.isZero) {
        // send data to Kafka topic
      }

    }
  })
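The accumulator only says that some insert failed, not which records were affected. If the goal is to send the failed messages back to Kafka, one possible alternative is to have each partition return its failed records to the driver. The following is a minimal sketch of that idea, not a tested Spark job: FailedBatches and failedRecords are hypothetical names, insertRecordsToDB is assumed to have the type List[String] => Boolean as in the question, and the Spark calls appear only in a comment so that the helper can be tried without a cluster.

```scala
object FailedBatches {
  // Runs `insert` on one partition and returns the records that still need processing:
  // the whole partition when the insert fails, nothing when it succeeds or the partition is empty.
  def failedRecords(records: Iterator[String], insert: List[String] => Boolean): Iterator[String] = {
    val batch = records.toList
    if (batch.isEmpty || insert(batch)) Iterator.empty else batch.iterator
  }

  // Assumed usage inside foreachRDD (not run here):
  //   val failed = rdd.mapPartitions(p => failedRecords(p, insertRecordsToDB)).collect()
  //   if (failed.nonEmpty) { /* send `failed` to the Kafka topic */ }
}
```

Because collect() brings every failed record to the driver, this assumes the failed records of one batch fit in driver memory.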