Is there a limit on how many Akka Streams can run at the same time?

Date: 2019-08-05 16:48:55

Tags: scala akka scalability akka-stream

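I am trying to implement a simple one-to-many publish/subscribe pattern using BroadcastHub. This fails silently for a large number of subscribers, which makes me think I am hitting some limit on the number of streams I can run.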

First, let's define some events:

sealed trait Event
case object EX extends Event
case object E1 extends Event
case object E2 extends Event
case object E3 extends Event
case object E4 extends Event
case object E5 extends Event

I have implemented the publisher with a BroadcastHub, adding a Sink.actorRefWithAck each time I want to add a new subscriber. Publishing the EX event ends the broadcast:

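import akka.NotUsed
import akka.actor.{Actor, ActorLogging, ActorRef}
import akka.event.LoggingReceive
import akka.stream.{ActorMaterializer, OverflowStrategy}
import akka.stream.scaladsl.{BroadcastHub, Keep, Sink, Source, SourceQueueWithComplete}

trait Publisher extends Actor with ActorLogging {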
  implicit val materializer = ActorMaterializer()

  private val sourceQueue = Source.queue[Event](Publisher.bufferSize, Publisher.overflowStrategy)
  private val (
    queue: SourceQueueWithComplete[Event],
    source: Source[Event, NotUsed]
  ) = {
    val (q,s) = sourceQueue.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both).run()
    s.runWith(Sink.ignore)
    (q,s)
  }

  def publish(evt: Event) = {
    log.debug("Publishing Event: {}", evt.getClass().toString())
    queue.offer(evt)
    evt match {
      case EX => queue.complete()
      case _  => ()
    }
  }

  def subscribe(actor: ActorRef, ack: ActorRef): Unit =
    source.runWith(
      Sink.actorRefWithAck(
        actor,
        onInitMessage = Publisher.StreamInit(ack),
        ackMessage = Publisher.StreamAck,
        onCompleteMessage = Publisher.StreamDone,
        onFailureMessage = onErrorMessage))

  def onErrorMessage(ex: Throwable) = Publisher.StreamFail(ex)

  def publisherBehaviour: Receive = {
    case Publisher.Subscribe(sub, ack) => subscribe(sub, ack.getOrElse(sender()))
    case Publisher.StreamAck => ()
  }

  override def receive = LoggingReceive { publisherBehaviour }
}

object Publisher {
  final val bufferSize = 5
  final val overflowStrategy = OverflowStrategy.backpressure

  case class Subscribe(sub: ActorRef, ack: Option[ActorRef])

  case object StreamAck
  case class StreamInit(ack: ActorRef)
  case object StreamDone
  case class StreamFail(ex: Throwable)
}
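BroadcastHub adapts the producer's rate to its slowest consumer: an element stays in the hub's buffer until every attached consumer has taken it. The following is a deliberately simplified, single-threaded model of that behaviour in plain Scala (BroadcastModel and its methods are hypothetical names, and this is not how Akka implements the hub). It shows why one subscriber that stops consuming stalls all the others.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of a broadcast hub (hypothetical, not Akka's implementation):
// one shared log plus a read cursor per subscriber. For simplicity the whole
// history is kept; only the window between the slowest cursor and the head
// counts against `bufferSize`.
final class BroadcastModel[A](bufferSize: Int, subscribers: Int) {
  require(bufferSize > 0 && subscribers > 0)

  private val log = ArrayBuffer.empty[A]
  private val cursors = Array.fill(subscribers)(0)

  /** Elements the slowest subscriber has not consumed yet. */
  def backlog: Int = log.size - cursors.min

  /** Returns false, i.e. the producer is backpressured, when the window is full. */
  def offer(a: A): Boolean =
    if (backlog >= bufferSize) false
    else { log += a; true }

  /** Subscriber `i` takes its next element, if one is available. */
  def poll(i: Int): Option[A] =
    if (cursors(i) < log.size) {
      val a = log(cursors(i))
      cursors(i) += 1
      Some(a)
    } else None
}
```

In this model, if subscriber 1 never polls, offer keeps returning false no matter how quickly subscriber 0 reads.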

Users can implement the Subscriber trait to keep their logic separate:

trait Subscriber {
  def onInit(publisher: ActorRef): Unit = ()
  def onInit(publisher: ActorRef, k: KillSwitch): Unit = onInit(publisher)
  def onEvent(event: Event): Unit = ()
  def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = ()
  def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = ()
}

The actor logic is simple:

class SubscriberActor(subscriber: Subscriber) extends Actor with ActorLogging {

  def subscriberBehaviour: Receive = {
    case Publisher.StreamInit(ack) => {
      log.debug("Stream initialized.")
      subscriber.onInit(sender())
      sender() ! Publisher.StreamAck
      ack.forward(Publisher.StreamInit(ack))
    }
    case Publisher.StreamDone => {
      log.debug("Stream completed.")
      subscriber.onDone(sender(),self)
    }
    case Publisher.StreamFail(ex) => {
      log.error(ex, "Stream failed!")
      subscriber.onFail(ex,sender(),self)
    }
    case e: Event => {
      log.debug("Observing Event: {}",e)
      subscriber.onEvent(e)
      sender() ! Publisher.StreamAck
    }
  }

  override def receive = LoggingReceive { subscriberBehaviour }
}
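Sink.actorRefWithAck sends the init message and then waits for the ack message before sending each next element, which is why SubscriberActor has to reply with Publisher.StreamAck to StreamInit as well as to every Event. A minimal single-threaded model of that handshake, assuming a simplified protocol in which the completion message needs no ack (AckProtocol, Init, Elem and Done are hypothetical names, not Akka API):

```scala
// Toy model of the ack-based handshake (hypothetical, not Akka's code):
// at most one unacknowledged message is ever in flight.
object AckProtocol {
  sealed trait Msg
  case object Init extends Msg
  final case class Elem[A](a: A) extends Msg
  case object Done extends Msg

  final class Sink[A](elements: List[A]) {
    private var queue: List[Msg] = Init :: elements.map(Elem(_)) ::: List(Done)
    private var inFlight = false

    /** Next message to deliver, or None while waiting for an ack. */
    def next(): Option[Msg] =
      if (inFlight || queue.isEmpty) None
      else {
        val m = queue.head
        queue = queue.tail
        inFlight = m != Done // in this model the completion message needs no ack
        Some(m)
      }

    /** Called when the receiving actor replies with its ack message. */
    def ack(): Unit = inFlight = false
  }
}
```

A subscriber that forgets to ack StreamInit therefore never receives a single Event.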

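One key requirement is that every subscriber must receive every message the publisher sends, so we have to know that all the streams have been materialized and that all the actors are ready to receive before we start broadcasting. That is why the StreamInit message is forwarded to another, user-provided actor.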
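The same "everyone is ready before we publish" barrier can be sketched with java.util.concurrent.CountDownLatch instead of forwarded StreamInit messages. This is a plain-threads analogy under that assumption (ReadinessBarrier is a hypothetical name): each thread counts down once it is "materialized", the publisher waits for the count to reach zero, and only then opens a start latch, after which each thread counts the events in a shared list.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicIntegerArray

// Plain-threads analogy (hypothetical, not Akka): `ready` plays the role of
// the forwarded StreamInit messages, `start` the role of MockPublish.
object ReadinessBarrier {
  /** Returns how many events each of the `n` subscribers observed. */
  def run(n: Int, events: Seq[String], timeoutMs: Long): Seq[Int] = {
    val ready = new CountDownLatch(n)
    val start = new CountDownLatch(1)
    val counts = new AtomicIntegerArray(n)
    val threads = (0 until n).map { i =>
      val t = new Thread(() => {
        ready.countDown() // "my stream is materialized"
        start.await()     // do not look at events before the broadcast starts
        events.foreach(_ => counts.incrementAndGet(i))
      })
      t.setDaemon(true) // do not keep the JVM alive if the barrier times out
      t.start()
      t
    }
    // Publish only after every subscriber has reported ready.
    require(ready.await(timeoutMs, TimeUnit.MILLISECONDS), "not every subscriber became ready")
    start.countDown()
    threads.foreach(_.join())
    (0 until n).map(counts.get)
  }
}
```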

To test this, I defined a simple MockPublisher that broadcasts a list of events when told to:

class MockPublisher(events: Event*) extends Publisher {
  def receiveBehaviour: Receive = {
    case MockPublish => events foreach publish
  }
  override def receive = LoggingReceive { receiveBehaviour orElse publisherBehaviour }
}
case object MockPublish

I also defined a MockSubscriber that just counts how many events it has seen:

class MockSubscriber extends Subscriber {
  var count = 0
  val promise = Promise[Int]()
  def future = promise.future

  override def onInit(publisher: ActorRef): Unit = count = 0
  override def onEvent(event: Event): Unit = count += 1
  override def onDone(publisher: ActorRef, subscriber: ActorRef): Unit = promise.success(count)
  override def onFail(e: Throwable, publisher: ActorRef, subscriber: ActorRef): Unit = promise.failure(e) 
}

And a small helper method for subscribing:

object MockSubscriber {
  def sub(publisher: ActorRef, ack: ActorRef)(implicit system: ActorSystem): Future[Int] = {
    val s = new MockSubscriber()
    val a = system.actorOf(Props(new SubscriberActor(s)))

    // Fire-and-forget: once this subscriber's stream is running, SubscriberActor forwards StreamInit to `ack`
    publisher ! Publisher.Subscribe(a, Some(ack))

    s.future
  }
}

I put everything together in a unit test:

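import akka.actor.{ActorSystem, Props}
import akka.event.Logging
import akka.testkit.{ImplicitSender, TestKit}
import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

class SubscriberTests extends TestKit(ActorSystem("SubscriberTests")) with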
    WordSpecLike with Matchers with BeforeAndAfterAll with ImplicitSender {

  override def beforeAll:Unit = {
    system.eventStream.setLogLevel(Logging.DebugLevel)
  }
  override def afterAll:Unit = {
    println("Shutting down...")
    TestKit.shutdownActorSystem(system)
  }

  "The Subscriber" must {
    "publish events to many observers" in {
      val n = 9

      val p = system.actorOf(Props(new MockPublisher(E1,E2,E3,E4,E5,EX)))

      val q = scala.collection.mutable.Queue[Future[Int]]()

      for (i <- 1 to n) {
        q += MockSubscriber.sub(p,self)
      }

      for (i <- 1 to n) {
        expectMsgType[Publisher.StreamInit](70.seconds)
      }
      p ! MockPublish

      q.foreach { f => Await.result(f, 10.seconds) should be (6) }
    }
  }
}
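As an aside on the test itself, the futures can be combined with Future.sequence and awaited once with a single overall deadline instead of one Await.result per subscriber. A sketch under that assumption (CollectCounts and mismatches are hypothetical names) that also reports how many subscribers saw an unexpected count:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper: one combined future and one overall deadline
// instead of a separate Await.result per subscriber.
object CollectCounts {
  /** Number of subscribers whose final count differs from `expected`. */
  def mismatches(futures: Seq[Future[Int]], expected: Int, deadline: FiniteDuration): Int =
    Await.result(Future.sequence(futures), deadline).count(_ != expected)
}
```

If any subscriber's future fails, Future.sequence fails as well and Await.result rethrows that exception.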

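This test succeeds for relatively small n, but fails for val n = 90000. No exceptions, caught or uncaught, appear anywhere, and Java does not complain about running out of memory (which it does if I go any higher).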

What am I missing?

EDIT: I tried this on several machines with different specs. When n is high enough, the debug output shows that no messages reach any subscriber.

1 answer:

Answer 0 (score: 1)

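Akka Streams (and in fact any other Reactive Streams implementation) gives you backpressure. As long as you don't undermine it in how you create your consumers (for example, by letting a 1 GB JSON document be built in memory and only then cutting it into small chunks), you should be in the comfortable position of having reasonably bounded memory usage, because backpressure governs the push-pull mechanics. Once you have measured where that upper bound lies, you can size the JVM heap and container memory so the application can run without you worrying about out-of-memory errors (provided nothing else in the JVM causes memory spikes).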
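A plain-JDK analogy for the backpressure argument, assuming an ArrayBlockingQueue stands in for a stream stage (BoundedPipeline is a hypothetical name, and this is not Akka Streams itself): the producer blocks in put whenever the queue is full, so the number of elements held in flight never exceeds the queue capacity, however many pass through.

```scala
import java.util.concurrent.ArrayBlockingQueue

// JDK analogy for backpressure (hypothetical, not Akka Streams): the bounded
// queue blocks the producer in `put` whenever it is full.
object BoundedPipeline {
  /** Streams 1..total through a queue of `capacity`; returns (sum, max queue size seen). */
  def run(total: Int, capacity: Int): (Long, Int) = {
    val queue = new ArrayBlockingQueue[Int](capacity)
    val producer = new Thread(() => (1 to total).foreach(i => queue.put(i)))
    producer.start()
    var sum = 0L
    var maxSeen = 0
    for (_ <- 1 to total) {
      maxSeen = math.max(maxSeen, queue.size) // never exceeds `capacity`
      sum += queue.take()
    }
    producer.join()
    (sum, maxSeen)
  }
}
```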

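It follows that there is a limit on how many streams you can run in parallel: you can only run as many as your memory allows. CPU shouldn't be a hard limit (since you will have multiple threads), but if you start too many of them on one machine, the CPU inevitably has to switch between the different streams, making each one slower. That may not be a technical barrier, but you could end up with processing so slow that it no longer serves its business purpose (although I suspect you would need far more than a handful of concurrent streams for that to happen).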
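The memory argument reduces to simple arithmetic once the per-stream cost has been measured. In this sketch every input is a hypothetical placeholder, not a measured Akka figure (StreamCapacity is likewise a hypothetical name):

```scala
// Back-of-the-envelope bound; every input is a hypothetical placeholder that
// you would replace with your own measurements.
object StreamCapacity {
  /** Max concurrent streams that fit in the heap left after `reservedBytes`. */
  def maxStreams(heapBytes: Long, reservedBytes: Long, perStreamBytes: Long): Long = {
    require(perStreamBytes > 0, "measure the per-stream cost first")
    math.max(0L, heapBytes - reservedBytes) / perStreamBytes
  }
}
```

For example, with 4 GiB of heap, 1 GiB reserved for everything else and an assumed 64 KiB per stream, the bound is 3 GiB / 64 KiB = 49152 streams.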

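There are other issues you may run into in tests as well. For example, if you run blocking operations on the same thread pool as the actor system without telling the pool that they block, you can end up with deadlocks (in fact, all blocking IO operations should run on a different thread pool from "computational" operations). Having 90000(!) concurrent things happening at once (probably on the same small thread pool) is almost guaranteed to cause trouble; I'd guess you could hit problems even without actors, running the code directly on Futures. Here you are using an actor system inside a test, which AFAIR relies on blocking logic, and that only amplifies every problem of a small thread pool shared by blocking and non-blocking tasks.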
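One way to keep blocking calls from starving the threads that run non-blocking work is to give them their own executor. A minimal sketch using only the Scala standard library; the pool size, the thread name "blocking-io" and the object SeparatePools are hypothetical. In an Akka application the same idea is usually expressed with a dedicated dispatcher, and with the global ExecutionContext, wrapping a blocking section in scala.concurrent.blocking tells the pool that it may need extra threads.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Blocking work gets its own fixed pool (size and names are hypothetical);
// the global pool is left for non-blocking "compute" work.
object SeparatePools {
  private val blockingPool = Executors.newFixedThreadPool(4, (r: Runnable) => {
    val t = new Thread(r, "blocking-io")
    t.setDaemon(true) // let the JVM exit without an explicit shutdown
    t
  })
  val blockingEc: ExecutionContext = ExecutionContext.fromExecutorService(blockingPool)

  /** Simulated blocking call, explicitly run on the blocking pool. */
  def slowLookup(key: Int): Future[(Int, String)] =
    Future {
      Thread.sleep(50) // stands in for blocking IO
      (key * 2, Thread.currentThread.getName)
    }(blockingEc)

  def run(): Seq[(Int, String)] = {
    import ExecutionContext.Implicits.global // "compute" pool for combining results
    val all = Future.sequence((1 to 8).map(slowLookup)).map(_.toList)
    Await.result(all, 10.seconds)
  }
}
```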