Scala Spark: NotSerializableException with Guava RateLimiter

Date: 2018-11-06 21:20:34

Tags: scala apache-spark serialization

I've been trying to introduce a rate limiter into a Spark writer to cap the number of HTTP requests made to a downstream application, and I keep running into a Spark serialization error.

Example snippet:

import java.util.concurrent.TimeUnit

import scala.concurrent.{ExecutionContext, Future}

import cats.data.EitherT // assuming cats' EitherT; the original import is not shown
import org.slf4j.LoggerFactory
import org.spark_project.guava.util.concurrent.RateLimiter

// FooClient, Foo, wrapEither and createRateLimiter are my own types/helpers
// and are omitted here for brevity.
@transient
object Baz {
    private val logger = LoggerFactory.getLogger(getClass) // logger definition not shown in the original

    // @transient reflects my attempt to keep the limiter out of serialization
    @transient var maybeRateLimiter: Option[RateLimiter] = createRateLimiter()
    final val DEFAULT_RATELIMITER_ACQUIRE_WAIT_TIME_IN_MS = 1000

    def rateLimitedFetch(someKey: String,
                         fooClient: FooClient)(implicit executionContext: ExecutionContext): EitherT[Future, String, Foo] = {
        maybeRateLimiter.fold {
          // No limiter configured: pass the request straight through
          logger.info("No rate limiter, not gating requests")
          EitherT(
            fooClient.fetchFoo(someKey)
              .wrapEither(t => s"Error fetching $someKey due to ${t.getMessage}")
          )
        } { rateLimiter =>
          // Spin until a permit is granted, waiting up to 1 s per attempt
          while (!rateLimiter.tryAcquire(DEFAULT_RATELIMITER_ACQUIRE_WAIT_TIME_IN_MS, TimeUnit.MILLISECONDS)) {
            logger.info("Not enough permits, requested: 1, current rate: {}", rateLimiter.getRate)
          }

          EitherT(
            fooClient.fetchFoo(someKey)
              .wrapEither(t => s"Error fetching $someKey due to ${t.getMessage}")
          )
        }
    }
}

Baz.rateLimitedFetch(someKey, fooClient)

Stack trace:

Caused by: java.io.NotSerializableException: org.spark_project.guava.util.concurrent.RateLimiter$Bursty
Serialization stack:
    - object not serializable (class: org.spark_project.guava.util.concurrent.RateLimiter$Bursty, value: RateLimiter[stableRate=500.0qps])
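
The driver-side code that actually triggers the serialization is not shown above; for context, here is a hypothetical sketch of the kind of call site that produces this error. keysRdd is an assumed RDD of keys: anything the task closure captures from the driver must be serialized to ship the task, and Guava's RateLimiter$Bursty does not implement Serializable.

import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical call site, not from the original question.
keysRdd.foreachPartition { keys =>
  keys.foreach { someKey =>
    // If the limiter (or any object holding it) is captured by this
    // closure, Spark tries to serialize it and the task submission fails.
    Baz.rateLimitedFetch(someKey, fooClient) // fooClient is captured from the driver
  }
}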

I'm not sure whether Guava's RateLimiter is usable in this situation at all. Is there a better way to rate-limit downstream requests from a Spark application?
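
For reference, a commonly suggested workaround (a sketch under assumptions, not tested against this code) is to construct the limiter lazily inside a plain Scala object: each executor JVM then builds its own instance on first use, and the limiter itself is never serialized. The rate of 500 permits/second below is an assumption taken from the stableRate in the stack trace.

import org.spark_project.guava.util.concurrent.RateLimiter

// Hypothetical per-executor singleton: a Scala object is initialized
// independently in each JVM, so this RateLimiter is created on the
// executor at first use and never needs to cross the wire.
object ExecutorRateLimiter {
  lazy val limiter: RateLimiter = RateLimiter.create(500.0) // assumed rate
}

// Usage inside a partition-level operation:
// rdd.foreachPartition { keys =>
//   keys.foreach { key =>
//     ExecutorRateLimiter.limiter.acquire() // blocks until a permit is available
//     // ... issue the HTTP request for `key` ...
//   }
// }

Note that this gates each executor independently, so the aggregate downstream rate is roughly the per-executor rate multiplied by the number of executors.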

0 Answers