Question

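I am implementing an iterator over an HTTP resource from which I can retrieve a paginated list of elements. I tried to do this with a plain Iterator, but that is a blocking implementation, and since I am using Akka it drives my dispatcher a little crazy.

I want to implement the same iterator using akka-stream. The problem is that I need a slightly different retry strategy.

The service returns a list of elements identified by an id, and sometimes, when I query for the next page, the service returns the same products that were on the current page.

My current algorithm is this: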

var seenIds = Set.empty[Int] // id type assumed here; use the actual type of Element.id
var position = 0

def isProblematicPage(elements: Seq[Element]): Boolean = {
  val currentIds = elements.map(_.id).toSet
  val intersection = seenIds & currentIds
  val hasOnlyNewIds = intersection.isEmpty
  if (hasOnlyNewIds) {
    seenIds = seenIds | currentIds
  }
  !hasOnlyNewIds
}

def incrementPage(): Unit = {
  position += 10
}

def doBackOff(attempt: Int): Unit = {
  // Backoff logic
}

@tailrec
def fetchPage(attempt: Int = 0): Iterator[Element] = {
  if (attempt > MaxRetries) {
    incrementPage()
    return Iterator.empty
  } 

  val eventualPage = service.retrievePage(position, position + 10)

  val page = Await.result(eventualPage, 5.minutes)

  if (isProblematicPage(page)) {
    doBackOff(attempt)
    fetchPage(attempt + 1)
  } else {
    incrementPage()
    page.iterator
  }
}

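I am implementing this with akka-streams, but I cannot figure out how to accumulate the pages and test for duplicates using the stream structure.

Any suggestions?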

Answer 1

The Flow.scan method is useful in this situation.

I would start your stream with a source of positions:

type Position = Int

//0,10,20,...
def positionIterator() : Iterator[Position] = Iterator from (0,10) 

val positionSource : Source[Position,_] = Source fromIterator positionIterator

This position source can then be directed into a Flow.scan that uses a function similar to your fetchPage (side note: you should avoid Await as much as possible; there is a way to do without it in your code, but that is outside the scope of the original question). The new function needs to take in the "state" of the elements already seen:

def fetchPageWithState(service : Service)
                      (seenEls : Set[Element], position : Position) : Set[Element] = {

  val maxRetries = 10

  val seenIds = seenEls map (_.id)

  @tailrec
  def readPosition(attempt : Int) : Seq[Element] = {
    if(attempt > maxRetries)
      Seq.empty
    else {
      val eventualPage : Seq[Element] =
        Await.result(service.retrievePage(position, position + 10), 5.minutes)

      // retry when the page contains an id that has already been seen
      if(eventualPage.map(_.id).exists(seenIds.contains)) {
        doBackOff(attempt)
        readPosition(attempt + 1)
      }
      else
        eventualPage
    }
  }//end def readPosition

  seenEls ++ readPosition(0).toSet
}//end def fetchPageWithState

This function can now be used within a Flow:

def fetchFlow(service : Service) : Flow[Position, Set[Element],_] =
  Flow[Position].scan(Set.empty[Element])(fetchPageWithState(service))

The new Flow can easily be connected to your position source to create a Source of Set[Element]:

def elementsSource(service : Service) : Source[Set[Element], _] =
  positionSource via fetchFlow(service)

Each new value from elementsSource will be an ever-growing set of the unique elements from the fetched pages.
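The accumulate-and-emit behaviour of Flow.scan can be tried out without Akka, because scanLeft on the standard collections follows the same (state, input) => state shape. The following is only a sketch under assumptions: the Element case class, the accumulate function and the sample pages are hypothetical, and a page containing an already seen id is simply skipped instead of retried.

```scala
// Hypothetical element type for illustration; the real Element comes from the service.
final case class Element(id: Int, name: String)

// Same shape as the function handed to Flow.scan: (state, input) => new state.
// A page that repeats an already seen id leaves the state unchanged (no retry here).
def accumulate(seen: Set[Element], page: Seq[Element]): Set[Element] = {
  val seenIds = seen.map(_.id)
  if (page.exists(e => seenIds.contains(e.id))) seen
  else seen ++ page
}

// Made-up pages; the second one overlaps with the first.
val pages: List[Seq[Element]] = List(
  Seq(Element(1, "a"), Element(2, "b")),
  Seq(Element(2, "b"), Element(3, "c")),
  Seq(Element(3, "c"), Element(4, "d"))
)

// scanLeft yields the initial state followed by every intermediate state,
// which mirrors what Flow.scan emits downstream.
val states: List[Set[Element]] = pages.scanLeft(Set.empty[Element])(accumulate)

val sizes: List[Int] = states.map(_.size) // List(0, 2, 2, 4)
```

Note that the first emitted value is the empty initial state, so a consumer of the stream version may want to drop it.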

Answer 2

The Flow.scan stage was a good suggestion, but it lacks support for dealing with futures, so I implemented an asynchronous version, Flow.scanAsync, which is now available in akka 2.4.12.

The current implementation is:

val service: WebService
val maxTries: Int
val backOff: FiniteDuration

// re-run f after backOff until it succeeds; fall back to zero once maxTries is reached
def retry[T](zero: T, attempt: Int = 0)(f: => Future[T]): Future[T] = {
  f.recoverWith {
    case ex if attempt >= maxTries => Future(zero)
    case ex => akka.pattern.after(backOff, system.scheduler)(retry(zero, attempt + 1)(f))
  }
}

def isProblematicPage(lastPage: Seq[Element], currPage: Seq[Element]): Boolean = {
  val lastPageIds = lastPage.map(_.id).toSet
  val currPageIds = currPage.map(_.id).toSet
  val intersection = lastPageIds & currPageIds
  intersection.nonEmpty
}

def retrievePage(lastPage: Seq[Element], startIndex: Int): Future[Seq[Element]] = {
  retry(Seq.empty[Element]) {
    service.fetchPage(startIndex).map { currPage: Seq[Element] =>
      if (isProblematicPage(lastPage, currPage)) throw new ProblematicPageException(startIndex)
      else currPage
    }
  }
}

val pagesRange: Range = Range(0, maxItems, pageSize)

// ScanAsync is the custom stage mentioned above
val scanAsyncFlow = Flow[Int].via(ScanAsync(Seq.empty[Element])(retrievePage))

Source(pagesRange)
  .via(scanAsyncFlow)
  .mapConcat(identity)
  .runWith(Sink.seq)

Thanks for the suggestion, Ramon :)
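The recoverWith-based retry used in this answer can be exercised in isolation with nothing but the standard library. This is a rough sketch, not the answer's actual code: the back-off delay (akka.pattern.after) is left out, maxAttempts is passed as a parameter, and fetchPage is a hypothetical flaky service that fails on its first two calls.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

implicit val ec: ExecutionContext = ExecutionContext.global

// Re-evaluates f after each failure; once `attempt` reaches maxAttempts the
// fallback value `zero` is returned. No delay between attempts in this sketch.
def retry[T](zero: T, maxAttempts: Int, attempt: Int = 0)(f: => Future[T]): Future[T] =
  f.recoverWith {
    case _ if attempt >= maxAttempts => Future.successful(zero)
    case _                           => retry(zero, maxAttempts, attempt + 1)(f)
  }

// Hypothetical flaky service: the first two calls fail, the third returns a page of ids.
val calls = new AtomicInteger(0)
def fetchPage(): Future[Seq[Int]] = Future {
  val n = calls.incrementAndGet()
  if (n < 3) throw new IllegalStateException(s"problematic page on call $n")
  else Seq(1, 2, 3)
}

// Enough attempts: the third call succeeds.
val recovered: Seq[Int] =
  Await.result(retry(Seq.empty[Int], maxAttempts = 5)(fetchPage()), 5.seconds)

// Too few attempts: the fallback (an empty page) is returned instead.
calls.set(0)
val exhausted: Seq[Int] =
  Await.result(retry(Seq.empty[Int], maxAttempts = 1)(fetchPage()), 5.seconds)
```

Because f is a by-name parameter, every retry issues a fresh request rather than reusing the failed future. Await appears here only to make the result observable in a script.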

Akka Streams: retrying on duplicate results
