Function not being called every time in a while loop

Date: 2017-12-20 22:21:19

Tags: scala asynchronous future

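I'm building a utility that monitors and keeps track of the progress of files being processed in a larger system. The file is a large "text" file: .csv, .xls, .txt, etc. The data could be streamed from Kafka and written to Avro, or written in batches to a SQL DB. I'm trying to build a "catchall" utility that records the number of lines processed and persists the progress to a database through RESTful API calls.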

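Whatever the type of processing, it is always done inside an Akka Actor. I'm trying to record progress asynchronously so that it doesn't block processing. Processing is very fast. Most cases follow a similar batch-style pattern, although sometimes progress is incremented gradually. Here is a basic representation, for demonstration purposes only, of what happens during processing: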

//inside my processing actor
import scala.concurrent.Future
import scala.util.{Failure, Success}
import context.dispatcher // implicit ExecutionContext used by Future and onComplete

var fileIsProcessing = true
val allLines = KafkaUtil.getConnect(fileKey)
val totalLines = KafkaUtil.getSize
val batchSize = 500
val dBUtil = new DBUtil(totalLines)

while (fileIsProcessing) {

  // consumes ~500 lines at a time to process, returns empty if done consuming
  val batch: List[Pollable] = allLines.poll

  if (batch.isEmpty) fileIsProcessing = false // this is horribly non-functional.
  else {
    // for batch identification purposes (max would throw on an empty batch)
    val myMax = batch.map(_.toInt).max
    println("Starting new batch with max line: " + myMax)

    // processing work happens here
    batch.foreach(processSync)
    println("Finished processing batch with max line: " + myMax)

    // send a progress update to be persisted to the DB
    val progressCall = Future[Unit] { dBUtil.incrementProgress(batch.size) }
    progressCall.onComplete {
      case Success(_) => // don't care
      case Failure(e) => logger.error("Unable to persist progress from actor ")
    }
  }
}

And here is a simple representation of my DBUtil, the class that does the persisting:

import org.joda.time.DateTime

class DBUtil(totalLines: Int) {

  // store both the number processed and the total to process in the db, even if there is currently a percentage

  var rate = 0L // lines per second
  var totalFinished = 0
  var percentageFin: Double = 0
  var lastUpdate = DateTime.now()

  def incrementProgress(totalProcessed: Int, currentTime: DateTime = DateTime.now()): Unit = {
    // simulate writing the data and the calculated progress percentage to the db
    // clamp elapsed time to at least 1 ms: whole-second integer division would
    // throw ArithmeticException when two updates land within the same second
    val elapsedMillis = math.max(1L, currentTime.getMillis - lastUpdate.getMillis)
    rate = totalProcessed * 1000L / elapsedMillis
    lastUpdate = currentTime
    totalFinished += totalProcessed
    percentageFin = (totalFinished.toDouble / totalLines.toDouble) * 100
    println(s"Simulating DB persist of total processed:$totalFinished lines at $percentageFin% from my total lines: $totalLines at rate:$rate")
  }

}

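Now, what's really strange is that in production, processing happens so fast that the line Future[Unit] { dBUtil.incrementProgress(batch.size) } is not reliably called every time. The while loop finishes, but I notice that the progress in my database hangs at 50% or 80%. The only way it works is if I bog the system down with logger or println statements to slow it down.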

Why isn't my Future call reliably invoked every time?

1 Answer

Answer 0 (score: 1)

Hmm... so there are a few problems with your code.

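You are just starting futures inside the while loop, and the loop then moves on to the next iteration without waiting for the future to complete. This means your program can finish before the executor has actually run those futures.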
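To make this first point concrete, here is a minimal, self-contained sketch (not the original code) in which a loop fires one progress Future per batch, keeps the returned futures, and waits for all of them with Future.sequence before reading the result. ProgressCounter and the batch sizes are hypothetical stand-ins for DBUtil and the Kafka batches; the counter uses an AtomicInteger so that concurrent updates are not lost.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for DBUtil: only counts processed lines, thread-safely.
class ProgressCounter {
  private val finished = new AtomicInteger(0)
  def incrementProgress(n: Int): Unit = { finished.addAndGet(n); () }
  def total: Int = finished.get
}

object AwaitAllProgress {
  // Fires one progress Future per batch, then blocks until every one has completed.
  def run(batchSizes: List[Int]): Int = {
    val counter = new ProgressCounter
    val pending: List[Future[Unit]] =
      batchSizes.map(size => Future(counter.incrementProgress(size)))
    // Without this wait, the caller could read `total` (or exit) before the futures ran.
    Await.result(Future.sequence(pending), 10.seconds)
    counter.total
  }
}
```

For example, AwaitAllProgress.run(List.fill(20)(500)) returns 10000. Blocking with Await is only reasonable at the edge of a program or in a test; inside an actor you would normally react to the combined future rather than block the actor's thread.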

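Also, your loop keeps creating more and more futures that call dBUtil.incrementProgress(batch.size), so you will have multiple threads executing the same function at the same time. Because that function mutates shared state, this leads to race conditions. The version below processes one batch, waits for its progress update, and only then starts the next batch: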
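The race on mutable state can be reproduced in isolation. In this sketch (hypothetical names, not from the original code) several Futures each increment both a plain var and an AtomicInteger. Since += on a var is a separate read and write, concurrent increments can overwrite each other and the plain counter may end up below the expected total, while the atomic counter is always exact.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RaceDemo {
  // Returns (plain var total, atomic total) after `tasks` futures each add 1, `perTask` times.
  def run(tasks: Int, perTask: Int): (Int, Int) = {
    var unsafeTotal = 0                  // shared mutable state, like DBUtil.totalFinished
    val safeTotal = new AtomicInteger(0) // lock-free, thread-safe counter
    val work = (1 to tasks).toList.map { _ =>
      Future {
        var i = 0
        while (i < perTask) {
          unsafeTotal += 1               // unsynchronized read-modify-write: updates can be lost
          safeTotal.incrementAndGet()    // atomic: never loses an update
          i += 1
        }
      }
    }
    Await.result(Future.sequence(work), 30.seconds)
    (unsafeTotal, safeTotal.get)
  }
}
```

On a multi-core machine, RaceDemo.run(8, 100000) will often report a first value below 800000, and the exact number varies between runs. In DBUtil, the equivalent fixes are an AtomicInteger, a synchronized block around the update, or, as in this answer, never letting two updates run at once.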

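import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}

// Processes one batch, waits for its progress update, then recurses for the next batch.
// Needs an implicit ExecutionContext, e.g. import context.dispatcher inside the actor.
def processFileWithIncrementalUpdates(
  allLines: ????, // whatever type KafkaUtil.getConnect(fileKey) returns
  totalLines: Int,
  batchSize: Int,
  dbUtil: DBUtil
)(implicit ec: ExecutionContext): Future[Unit] = {
  val promise = Promise[Unit]()
  Future {
    val batch: List[Pollable] = allLines.poll
    if (batch.isEmpty) {
      // nothing left to consume: the whole file is done
      promise.success(())
    }
    else {
      val myMax = batch.map(_.toInt).max
      println("Starting new batch with max line: " + myMax)

      //processing work happens here
      batch.foreach(processSync)
      println("Finished processing batch with max line: " + myMax)

      //send a progress update to be persisted to the DB
      val progressCall = Future[Unit] { dbUtil.incrementProgress(batch.size) }

      // Log a failed update, then start the next batch only once this update has finished,
      // so no two incrementProgress calls ever run at the same time.
      progressCall.onComplete { result =>
        result match {
          case Success(_) => // don't care
          case Failure(e) => logger.error("Unable to persist progress from actor ")
        }
        promise.completeWith(processFileWithIncrementalUpdates(allLines, totalLines, batchSize, dbUtil))
      }
    }
  }.failed.foreach(e => promise.tryFailure(e)) // propagate errors from poll or processing
  promise.future
}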

val allLines = KafkaUtil.getConnect(fileKey)
val totalLines = KafkaUtil.getSize
val batchSize = 500
val dBUtil = new DBUtil(totalLines)

// completes once every batch and its progress update have finished
val processingFuture = processFileWithIncrementalUpdates(allLines, totalLines, batchSize, dBUtil)
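The same one-batch-at-a-time idea can also be expressed without an explicit Promise, by chaining with flatMap. This is a self-contained sketch under stated assumptions: FakeSource is a hypothetical in-memory replacement for the Kafka connection whose poll returns up to batchSize line numbers (empty when exhausted), and Progress is a simplified, hypothetical stand-in for DBUtil. A failed progress update is logged and processing continues, as in the onComplete handling above.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical in-memory source: hands out line numbers 1..total in batches.
class FakeSource(total: Int, batchSize: Int) {
  private var next = 1
  def poll: List[Int] = synchronized {
    val batch = (next until math.min(next + batchSize, total + 1)).toList
    next += batch.size
    batch
  }
}

// Simplified stand-in for DBUtil; updates never overlap, so plain vars suffice here.
class Progress(totalLines: Int) {
  var totalFinished = 0
  def incrementProgress(n: Int): Unit = totalFinished += n
  def percentage: Double = totalFinished.toDouble / totalLines * 100
}

object SequentialBatches {
  // Polls one batch, persists its progress, and only then polls the next batch.
  def processAll(source: FakeSource, progress: Progress): Future[Unit] =
    Future(source.poll).flatMap { batch =>
      if (batch.isEmpty) Future.successful(())
      else {
        // per-line processing (processSync in the question) would run here
        Future(progress.incrementProgress(batch.size))
          .recover { case e => println(s"Unable to persist progress: ${e.getMessage}") }
          .flatMap(_ => processAll(source, progress)) // next batch starts after the update
      }
    }

  // Convenience for demos and tests: blocks until the whole chain has finished.
  def run(totalLines: Int, batchSize: Int): Double = {
    val progress = new Progress(totalLines)
    Await.result(processAll(new FakeSource(totalLines, batchSize), progress), 30.seconds)
    progress.percentage
  }
}
```

With 1,234 lines in batches of 500, SequentialBatches.run(1234, 500) processes three batches (500, 500 and 234 lines) and returns 100.0. Each flatMap step starts only after the previous future has completed, which also guarantees that every update sees the previous one's writes.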