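First of all, I'm not new to Akka (I've been using it for over 2 years). I have a high-throughput application (millions of msg/min/node) that does heavy network I/O. An initial actor (backed by a RandomRouter) receives messages and distributes them to the appropriate child actors for processing: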
private val distributionRouter = system.actorOf(Props(new DistributionActor)
  .withDispatcher("distributor-dispatcher")
  .withRouter(RandomRouter(distrib.distributionActors)), "distributionActor")
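The application is heavily tuned and performs very well. I'd like to make it more fault tolerant by putting a durable mailbox in front of the DistributionActor. Here is the relevant configuration (the only change is the addition of the file-based mailbox):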
akka.actor.mailbox.file-based {
  directory-path = "./.akka_mb"
  max-items = 2147483647
  # attempting to add an item after the queue reaches this size (in bytes) will fail.
  max-size = 2147483647 bytes
  # attempting to add an item larger than this size (in bytes) will fail.
  max-item-size = 2147483647 bytes
  # maximum expiration time for this queue (seconds).
  max-age = 3600s
  # maximum journal size before the journal should be rotated.
  max-journal-size = 16 MiB
  # maximum size of a queue before it drops into read-behind mode.
  max-memory-size = 128 MiB
  # maximum overflow (multiplier) of a journal file before we re-create it.
  max-journal-overflow = 10
  # absolute maximum size of a journal file until we rebuild it, no matter what.
  max-journal-size-absolute = 9223372036854775807 bytes
  # whether to drop older items (instead of newer) when the queue is full
  discard-old-when-full = on
  # whether to keep a journal file at all
  keep-journal = on
  # whether to sync the journal after each transaction
  sync-journal = off
  # circuit breaker configuration
  circuit-breaker {
    # maximum number of failures before opening breaker
    max-failures = 3
    # duration of time beyond which a call is assumed to be timed out and considered a failure
    call-timeout = 3 seconds
    # duration of time to wait until attempting to reset the breaker during which all calls fail-fast
    reset-timeout = 30 seconds
  }
}
distributor-dispatcher {
  executor = "thread-pool-executor"
  type = Dispatcher
  thread-pool-executor {
    core-pool-size-min = 20
    core-pool-size-max = 20
    max-pool-size-min = 20
  }
  throughput = 100
  mailbox-type = akka.actor.mailbox.FileBasedMailboxType
}
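As soon as I introduced this, I noticed a lot of lost messages. When I profiled the application with the Typesafe Console, I saw a large number of dead letters (roughly 100k per 1M messages). My mailbox files are only about 12MB per actor, so they are nowhere near the configured limits. To rule out an instrumentation problem, I also set up a dead letter listener that counts dead letters, so I could run without the profiler. Same result.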
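For reference, a dead letter listener along the lines described above might look like the following minimal sketch. It assumes the Akka 2.0.x API (`akka.actor.DeadLetter` events published on `system.eventStream`); the names `DeadLetterStats`, `DeadLetterCounter`, `install` and `reportEvery` are hypothetical, and this is not the listener actually used here. Tallying by recipient path makes it easy to see whether the dead letters cluster on particular child actors.

```scala
import akka.actor.{Actor, ActorSystem, DeadLetter, Props}
import scala.collection.mutable

// Hypothetical helper: tallies dead letters per recipient path.
// Kept as a plain class (no actor machinery) so the counting logic is easy to check.
class DeadLetterStats {
  private val byRecipient = mutable.Map.empty[String, Long]
  private var total = 0L

  def record(recipientPath: String): Unit = {
    total += 1
    byRecipient(recipientPath) = byRecipient.getOrElse(recipientPath, 0L) + 1
  }

  def totalCount: Long = total

  // Recipients sorted by number of dead letters, highest first
  def top(n: Int): Seq[(String, Long)] =
    byRecipient.toSeq.sortBy(-_._2).take(n)
}

// Hypothetical listener actor; it only sees DeadLetter events once subscribed.
class DeadLetterCounter(reportEvery: Long) extends Actor {
  require(reportEvery > 0, "reportEvery must be positive")
  private val stats = new DeadLetterStats

  def receive = {
    case d: DeadLetter =>
      stats.record(d.recipient.path.toString)
      if (stats.totalCount % reportEvery == 0)
        println("dead letters: " + stats.totalCount + ", top recipients: " + stats.top(5))
  }
}

object DeadLetterCounter {
  // Creates the listener and subscribes it to DeadLetter on the system's event stream
  def install(system: ActorSystem, reportEvery: Long = 10000L) = {
    val counter = system.actorOf(Props(new DeadLetterCounter(reportEvery)), "deadLetterCounter")
    system.eventStream.subscribe(counter, classOf[DeadLetter])
    counter
  }
}
```

Calling `DeadLetterCounter.install(system)` right after the `ActorSystem` is created is enough; the listener then prints a running total and the five most frequent recipients every `reportEvery` dead letters.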
Any idea what might be causing the dead letters?
I'm using Akka 2.0.4 on Scala 2.9.2.
Update
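I've noticed that the dead letters all seem to be addressed to a few of the child actors owned by the DistributionActor. I don't understand why changing the parent's mailbox would have any effect on this, but that is definitely the behavior.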