Sequencing Scala Futures with bounded parallelism (without messing around with ExecutionContexts)

Date: 2014-11-23 02:50:38

Tags: scala future rx-java

Background: I have a function:

  def doWork(symbol: String): Future[Unit]

which kicks off some side effects to fetch data and store it, and completes the Future when it's done. However, the backend infrastructure has usage limits, such that no more than 5 of these requests can be made in parallel. I have a list of N symbols that I need to get through:

  var symbols = Array("MSFT",...)

but I want to sequence them such that no more than 5 are executing simultaneously. Given:

  val allowableParallelism = 5

my current solution is (assuming I'm working with async/await):

  val symbolChunks = symbols.toList.grouped(allowableParallelism).toList
  def toThunk(x: List[String]) = () => Future.sequence(x.map(doWork))
  val symbolThunks = symbolChunks.map(toThunk)
  val done = Promise[Unit]()
  def procThunks(x: List[() => Future[List[Unit]]]): Unit = x match {
    case Nil => done.success()
    case x::xs => x().onComplete(_ => procThunks(xs))
  }
  procThunks(symbolThunks)
  await { done.future }

However, for obvious reasons, I'm not terribly happy with it. I feel like this should be possible with folds, but every time I try, I end up eagerly creating the Futures. I also tried out a version with RxScala Observables using concatMap, but that also seemed like overkill.

Is there a better way to accomplish this?

4 Answers:

Answer 0 (score: 5)

I have an example of how to do it with scalaz-stream. It's quite a lot of code because a scala Future needs to be converted into a scalaz Task (an abstraction for deferred computation). However, you only need to add it to the project once. Another option is to define 'doWork' in terms of Task from the start. Personally, I prefer Task for building async programs.

  import scala.concurrent.{Future => SFuture}
  import scala.util.Random
  import scala.concurrent.ExecutionContext.Implicits.global


  import scalaz.stream._
  import scalaz.concurrent._

  val P = scalaz.stream.Process

  val rnd = new Random()

  def doWork(symbol: String): SFuture[Unit] = SFuture {
    Thread.sleep(rnd.nextInt(1000))
    println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
  }

  val symbols = Seq("AAPL", "MSFT", "GOOGL", "CVX").
    flatMap(s => Seq.fill(5)(s).zipWithIndex.map(t => s"${t._1}${t._2}"))

  implicit class Transformer[+T](fut: => SFuture[T]) {
    def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
      import scala.util.{Failure, Success}
      import scalaz.syntax.either._
      Task.async {
        register =>
          fut.onComplete {
            case Success(v) => register(v.right)
            case Failure(ex) => register(ex.left)
          }
      }
    }
  }

  implicit class ConcurrentProcess[O](val process: Process[Task, O]) {
    def concurrently[O2](concurrencyLevel: Int)(f: Channel[Task, O, O2]): Process[Task, O2] = {
      val actions =
        process.
          zipWith(f)((data, f) => f(data))

      val nestedActions =
        actions.map(P.eval)

      merge.mergeN(concurrencyLevel)(nestedActions)
    }
  }

  val workChannel = io.channel((s: String) => doWork(s).toTask)

  val process = Process.emitAll(symbols).concurrently(5)(workChannel)

  process.run.run

With all these transformations in scope, basically all you need is:

  val workChannel = io.channel((s: String) => doWork(s).toTask)

  val process = Process.emitAll(symbols).concurrently(5)(workChannel)

Quite short and self-describing.
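
As the answer notes, another option is to define 'doWork' in terms of Task directly, which removes the need for the Future-to-Task conversion. A rough sketch of that variant (doWorkTask, taskChannel and taskProcess are just illustrative names):

  // a Task-based doWork: nothing runs until the Task is executed by mergeN
  def doWorkTask(symbol: String): Task[Unit] = Task {
    Thread.sleep(rnd.nextInt(1000))
    println(s"Symbol: $symbol. Thread: ${Thread.currentThread().getName}")
  }

  // the channel wraps the Task-returning function directly, no Transformer needed
  val taskChannel = io.channel((s: String) => doWorkTask(s))
  val taskProcess = Process.emitAll(symbols).concurrently(5)(taskChannel)
  taskProcess.run.run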

Answer 1 (score: 3)

Although you've already got an excellent answer, I thought I might still offer an opinion or two on the matter.

I remember reading somewhere (on someone's blog) the advice to "use actors for state and use futures for concurrency".

So my first thought was to utilise actors somehow. To be precise, I would have a master actor with a router launching multiple worker actors, with the number of workers restricted according to allowableParallelism. So, assuming I have

def doWorkInternal (symbol: String): Unit

which does the work of your doWork 'outside of the future', I would have something along these lines (very rudimentary, not taking many details into consideration, and practically copying code from the akka documentation):

import akka.actor._
import akka.routing.{ActorRefRoutee, RoundRobinRoutingLogic, Router}

case class WorkItem (symbol: String)
case class WorkItemCompleted (symbol: String)
case class WorkLoad (symbols: Array[String])
case class WorkLoadCompleted ()

class Worker extends Actor  {
    def receive = {
        case WorkItem (symbol) =>
            doWorkInternal (symbol)
            sender () ! WorkItemCompleted (symbol)
    }
}

class Master extends Actor  {
    var pending = Set[String] ()
    var originator: Option[ActorRef] = None

    var router = {
        val routees = Vector.fill (allowableParallelism) {
            val r = context.actorOf(Props[Worker])
            context watch r
            ActorRefRoutee(r)
        }
        Router (RoundRobinRoutingLogic(), routees)
    }

    def receive = {
        case WorkLoad (symbols) =>
            originator = Some (sender ())
            context become processing
            for (symbol <- symbols) {
                router.route (WorkItem (symbol), self)
                pending += symbol
            }
    }

    def processing: Receive = {
        case Terminated (a) =>
            router = router.removeRoutee(a)
            val r = context.actorOf(Props[Worker])
            context watch r
            router = router.addRoutee(r)
        case WorkItemCompleted (symbol) =>
            pending -= symbol
            if (pending.size == 0) {
                context become receive
                originator.get ! WorkLoadCompleted ()
            }
    }
}

You would query the master actor with ask from the outside, and receive WorkLoadCompleted in a future.
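
For completeness, a minimal sketch of what the asking side could look like (the actor system name and timeout value are just placeholders):

import akka.actor._
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

implicit val timeout = Timeout(1.minute)

val system = ActorSystem("work")
val master = system.actorOf(Props[Master], "master")

// completes once the master replies with WorkLoadCompleted
val allDone: Future[Any] = master ? WorkLoad(symbols)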

But thinking more about the 'state' (the number of simultaneous requests being processed) being hidden somewhere, together with implementing the code necessary to not exceed it, here is something of a 'future gateway intermediary' sort, if you don't mind an imperative style and mutable (used internally only) structures:

// assumes doWork, allowableParallelism and an implicit ExecutionContext are in scope
import scala.concurrent.{Future, Promise}
import scala.util.Try

object Guardian
{
    private val incoming = new collection.mutable.HashMap[String, Promise[Unit]]()
    private val outgoing = new collection.mutable.HashMap[String, Future[Unit]]()
    private val pending = new collection.mutable.Queue[String]

    def doWorkGuarded (symbol: String): Future[Unit] = {
        synchronized {
            val p = Promise[Unit] ()
            incoming(symbol) = p
            if (incoming.size <= allowableParallelism)
                launchWork (symbol)
            else
                pending.enqueue (symbol)
            p.future
        }
    }

    private def completionHandler (t: Try[Unit]): Unit = {
        synchronized {
            for (symbol <- outgoing.keySet) {
                val f = outgoing (symbol)
                if (f.isCompleted) {
                    incoming (symbol).completeWith (f)
                    incoming.remove (symbol)
                    outgoing.remove (symbol)
                }
            }
            for (i <- outgoing.size until allowableParallelism) { // fill free slots up to allowableParallelism
                if (pending.nonEmpty) {
                    val symbol = pending.dequeue()
                    launchWork (symbol)
                }
            }
        }
    }

    private def launchWork (symbol: String): Unit = {
        val f = doWork(symbol)
        outgoing(symbol) = f
        f.onComplete(completionHandler)
    }
}

doWork now is exactly as yours, returning Future[Unit], and the idea is that, instead of using something like

val futures = symbols.map (doWork (_)).toSeq
val future = Future.sequence(futures)

which would launch futures without regard for allowableParallelism, I would instead use

val futures = symbols.map (Guardian.doWorkGuarded (_)).toSeq
val future = Future.sequence(futures)

Think about some hypothetical database access driver with a non-blocking interface, i.e. one returning futures for requests, which limits concurrency by being built over some connection pool, for example - you wouldn't want it to return futures that take no account of the parallelism level and require you to juggle them to keep concurrency under control.

This example is more illustrative than practical, since I wouldn't normally expect an 'outgoing' interface to use futures like this (which is quite okay for an 'incoming' interface).

Answer 2 (score: 1)

First, we obviously need some purely functional wrapper around Scala's Future, because it is effectful and runs as soon as it is created. Let's call it Deferred:

import scala.concurrent.Future
import scala.util.control.Exception.nonFatalCatch

class Deferred[+T](f: () => Future[T]) {
  def run(): Future[T] = f()
}

object Deferred {
  def apply[T](future: => Future[T]): Deferred[T] =
    new Deferred(() => nonFatalCatch.either(future).fold(Future.failed, identity))
}
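
Just to illustrate the laziness (the println is purely for demonstration): nothing is executed until run() is called:

import scala.concurrent.ExecutionContext.Implicits.global

val d = Deferred(Future { println("started"); 42 })
// nothing has happened yet - the Future is only created inside run()
val f = d.run() // "started" is printed and f eventually completes with 42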

And here is the routine:

import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicInteger

import scala.collection.immutable.Seq
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.control.Exception.nonFatalCatch
import scala.util.{Failure, Success}

trait ConcurrencyUtils {    
  def runWithBoundedParallelism[T](parallelism: Int = Runtime.getRuntime.availableProcessors())
                                  (operations: Seq[Deferred[T]])
                                  (implicit ec: ExecutionContext): Deferred[Seq[T]] =
    if (parallelism > 0) Deferred {
      val indexedOps = operations.toIndexedSeq // index for faster access

      val promise = Promise[Seq[T]]()

      val acc = new CopyOnWriteArrayList[(Int, T)] // concurrent acc
      val nextIndex = new AtomicInteger(parallelism) // keep track of the next index atomically

      def run(operation: Deferred[T], index: Int): Unit = {
        operation.run().onComplete {
          case Success(value) =>
            acc.add((index, value)) // accumulate result value

            if (acc.size == indexedOps.size) { // we've done
              import scala.collection.JavaConversions._
              // in concurrent setting next line may be called multiple times, that's why trySuccess instead of success
              promise.trySuccess(acc.view.sortBy(_._1).map(_._2).toList)
            } else {
              val next = nextIndex.getAndIncrement() // get and inc atomically
              if (next < indexedOps.size) { // run next operation if exists
                run(indexedOps(next), next)
              }
            }
          case Failure(t) =>
            promise.tryFailure(t) // same here (may be called multiple times, let's prevent stdout pollution)
        }
      }

      if (operations.nonEmpty) {
        indexedOps.view.take(parallelism).zipWithIndex.foreach((run _).tupled) // run as much as allowed
        promise.future
      } else {
        Future.successful(Seq.empty)
      }
    } else {
      throw new IllegalArgumentException("Parallelism must be positive")
    }
}

In a nutshell, we initially run as many operations as allowed and then, on each operation's completion, we run the next available operation, if any. So the only difficulty here is maintaining the next operation index and the results accumulator in a concurrent setting. I'm not an absolute concurrency expert, so let me know if there are some potential problems in the code above. Notice that the returned value is also a deferred computation that should be run.

Usage and test:

import org.scalatest.{Matchers, FlatSpec}
import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Seconds, Span}

import scala.collection.immutable.Seq
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

class ConcurrencyUtilsSpec extends FlatSpec with Matchers with ScalaFutures with ConcurrencyUtils {

  "runWithBoundedParallelism" should "return results in correct order" in {
    val comp1 = mkDeferredComputation(1)
    val comp2 = mkDeferredComputation(2)
    val comp3 = mkDeferredComputation(3)
    val comp4 = mkDeferredComputation(4)
    val comp5 = mkDeferredComputation(5)

    val compoundComp = runWithBoundedParallelism(2)(Seq(comp1, comp2, comp3, comp4, comp5))

    whenReady(compoundComp.run()) { result =>
      result should be (Seq(1, 2, 3, 4, 5))
    }
  }

  // increase default ScalaTest patience
  implicit val defaultPatience = PatienceConfig(timeout = Span(10, Seconds))

  private def mkDeferredComputation[T](result: T, sleepDuration: FiniteDuration = 100.millis): Deferred[T] =
    Deferred {
      Future {
        Thread.sleep(sleepDuration.toMillis)
        result
      }
    }

}

Answer 3 (score: 0)

Use Monix Task. An example from the Monix documentation with parallelism = 10:

import monix.eval.Task
import monix.execution.Scheduler.Implicits.global

val items = 0 until 1000
// The list of all tasks needed for execution
val tasks = items.map(i => Task(i * 2))
// Building batches of 10 tasks to execute in parallel:
val batches = tasks.sliding(10,10).map(b => Task.gather(b))
// Sequencing batches, then flattening the final result
val aggregate = Task.sequence(batches).map(_.flatten.toList)

// Evaluation:
aggregate.foreach(println)
//=> List(0, 2, 4, 6, 8, 10, 12, 14, 16,...
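
Adapted to the doWork from the question, it could look roughly like this (assuming Task.deferFuture is available in the Monix version used; it wraps a by-name Future, so nothing is started eagerly):

val workTasks = symbols.toList.map(s => Task.deferFuture(doWork(s)))
// batches of allowableParallelism tasks, each batch executed in parallel
val workBatches = workTasks.sliding(allowableParallelism, allowableParallelism).map(Task.gather(_)).toList
// batches run one after another, so at most allowableParallelism requests are in flight
val allWork: Task[List[Unit]] = Task.sequence(workBatches).map(_.flatten)

// nothing runs until the task is executed (runAsync in older Monix, runToFuture in newer versions)
allWork.runAsync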