I've been trying to introduce a rate limiter into our Spark writer to cap the number of HTTP requests made to a downstream application, but I keep running into Spark serialization errors.
Example snippet:
import java.util.concurrent.TimeUnit

import scala.concurrent.{ExecutionContext, Future}

import cats.data.EitherT
import org.slf4j.LoggerFactory
import org.spark_project.guava.util.concurrent.RateLimiter

// @transient on the object and the var was an attempt to keep the
// limiter out of serialization, but the error below still occurs.
@transient
object Baz {
  // Logging via slf4j here; the original used an unspecified logger.
  private val logger = LoggerFactory.getLogger(getClass)

  final val DEFAULT_RATELIMITER_ACQUIRE_WAIT_TIME_IN_MS = 1000

  // createRateLimiter() is defined elsewhere; it returns Some(limiter) or None.
  @transient var maybeRateLimiter: Option[RateLimiter] = createRateLimiter()

  def rateLimitedFetch(someKey: String, fooClient: FooClient)(
      implicit executionContext: ExecutionContext): EitherT[Future, String, Foo] = {
    maybeRateLimiter.fold {
      // No limiter configured: issue the request immediately.
      logger.info("No rate limiter, not gating requests")
      EitherT(
        fooClient
          .fetchFoo(someKey)
          // wrapEither is a custom helper that turns the Future into
          // a Future[Either[String, Foo]], capturing failures as messages.
          .wrapEither(t => s"Error fetching $someKey due to ${t.getMessage}")
      )
    } { rateLimiter =>
      // Block until a permit becomes available, logging once per second.
      while (!rateLimiter.tryAcquire(DEFAULT_RATELIMITER_ACQUIRE_WAIT_TIME_IN_MS, TimeUnit.MILLISECONDS)) {
        logger.info(s"Not enough permits, requested: 1, current rate: ${rateLimiter.getRate}")
      }
      EitherT(
        fooClient
          .fetchFoo(someKey)
          .wrapEither(t => s"Error fetching $someKey due to ${t.getMessage}")
      )
    }
  }
}
Baz.rateLimitedFetch(someKey, fooClient)
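For context, this is invoked from inside a Spark transformation, roughly like the sketch below (the keys RDD and the way fooClient reaches the executors are simplified placeholders, not my exact code). Spark serializes the task closure to ship it to executors, so anything the closure drags along, directly or indirectly, must be serializable:

// Hypothetical call site: keys is an RDD[String], fooClient is built on the driver.
keys.mapPartitions { iter =>
  iter.map(someKey => Baz.rateLimitedFetch(someKey, fooClient))
}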
Stack trace:
Caused by: java.io.NotSerializableException: org.spark_project.guava.util.concurrent.RateLimiter$Bursty
Serialization stack:
- object not serializable (class: org.spark_project.guava.util.concurrent.RateLimiter$Bursty, value: RateLimiter[stableRate=500.0qps])
I'm not sure whether Guava's RateLimiter can be used in this situation at all. Is there a better way to rate-limit requests to a downstream service from a Spark application?
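For what it's worth, one direction I'm considering (a minimal sketch, not a verified fix) is to hold the limiter as a lazy val inside an object, so each executor JVM creates its own instance on first use and the non-serializable RateLimiter never travels with a task closure; the 500.0 permits/second rate is just a placeholder:

import org.spark_project.guava.util.concurrent.RateLimiter

object RateLimiterHolder {
  // Initialized lazily within each executor JVM, so the RateLimiter
  // itself is never part of a serialized closure.
  @transient lazy val limiter: RateLimiter = RateLimiter.create(500.0)
}

The caveat is that each executor would then throttle independently, so the effective cluster-wide rate would be roughly the per-executor rate times the number of executors, which may or may not be acceptable for the downstream application.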