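We have a few Spring Boot applications that use spring-boot-starter-data-cassandra-reactive to connect to Cassandra. After an application has been up and running in production for 6 to 7 hours, we start seeing OperationTimedOutException. The cluster has 5 nodes. The problem clearly seems to be on the client side, in the DataStax driver and its underlying Netty connections: when the timeouts occur, there is no corresponding activity in the Cassandra logs and no network activity on the Cassandra side. The queries themselves do not ultimately fail; they are retried after the default 12 seconds and then work fine. The issue occurs at random, on different queries. The DataStax driver is 3.7.2 and io.netty is 4.1.45.Final. We are not sure what could cause this behavior on the client, where the log clearly shows the host and query details, while we cannot find any trace of the request or the connection on the Cassandra side. Any help is highly appreciated.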
Caused by: org.springframework.data.cassandra.CassandraConnectionFailureException: Query; CQL [update table SET xx = ? where xx = ? and xx = ? and xx = ? and zz = ?]; [host] Timed out waiting for server response; nested exception is com.datastax.driver.core.exceptions.OperationTimedOutException: [host] Timed out waiting for server response
    at org.springframework.data.cassandra.core.cql.CassandraExceptionTranslator.translate(CassandraExceptionTranslator.java:170) ~[spring-data-cassandra-2.2.5.RELEASE.jar!/:2.2.5.RELEASE]
    at org.springframework.data.cassandra.core.cql.ReactiveCassandraAccessor.translate(ReactiveCassandraAccessor.java:149) ~[spring-data-cassandra-2.2.5.RELEASE.jar!/:2.2.5.RELEASE]
    at org.springframework.data.cassandra.core.cql.ReactiveCqlTemplate.lambda$translateException$18(ReactiveCqlTemplate.java:752) ~[spring-data-cassandra-2.2.5.RELEASE.jar!/:2.2.5.RELEASE]
    at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:6442) ~[reactor-core-3.3.3.RELEASE.jar!/:3.3.3.RELEASE]
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88) ~[reactor-core-3.3.3.RELEASE.jar!/:3.3.3.RELEASE]
    at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:97) ~[spring-cloud-sleuth-core-2.2.2.RELEASE.jar!/:2.2.2.RELEASE]
    at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:97) ~[spring-cloud-sleuth-core-2.2.2.RELEASE.jar!/:2.2.2.RELEASE]
    at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:197) ~[reactor-core-3.3.3.RELEASE.jar!/:3.3.3.RELEASE]
    at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:97) ~[spring-cloud-sleuth-core-2.2.2.RELEASE.jar!/:2.2.2.RELEASE]
    at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:183) ~[reactor-core-3.3.3.RELEASE.jar!/:3.3.3.RELEASE]
    at org.springframework.data.cassandra.core.cql.session.DefaultBridgedReactiveSession.lambda$adaptFuture$2(DefaultBridgedReactiveSession.java:237) ~[spring-data-cassandra-2.2.5.RELEASE.jar!/:2.2.5.RELEASE]
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1165) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:749) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.AbstractTransformFuture.run(AbstractTransformFuture.java:100) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.MoreExecutors$5$1.run(MoreExecutors.java:986) [guava-28.2-android.jar!/:na]
    at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
    at com.google.common.util.concurrent.MoreExecutors$5.execute(MoreExecutors.java:981) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1165) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958) [guava-28.2-android.jar!/:na]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:749) [guava-28.2-android.jar!/:na]
    at com.datastax.driver.core.DefaultResultSetFuture.onException(DefaultResultSetFuture.java:248) [cassandra-driver-core-3.7.2.jar!/:na]
    at com.datastax.driver.core.RequestHandler.setFinalException(RequestHandler.java:271) [cassandra-driver-core-3.7.2.jar!/:na]
    at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:62) [cassandra-driver-core-3.7.2.jar!/:na]
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalException(RequestHandler.java:1001) [cassandra-driver-core-3.7.2.jar!/:na]
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.processRetryDecision(RequestHandler.java:543) [cassandra-driver-core-3.7.2.jar!/:na]
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:981) [cassandra-driver-core-3.7.2.jar!/:na]
    at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:1582) [cassandra-driver-core-3.7.2.jar!/:na]
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
    at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.45.Final.jar!/:4.1.45.Final]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [host] Timed out waiting for server response
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onTimeout(RequestHandler.java:973) [cassandra-driver-core-3.7.2.jar!/:na]
    ... 6 common frames omitted
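
For context, the 12-second figure mentioned above matches the default read timeout of DataStax driver 3.x (`SocketOptions.DEFAULT_READ_TIMEOUT_MILLIS`). The following is only a configuration sketch of where that timeout, TCP keep-alive, and the connection heartbeat are set in driver 3.x. It assumes Spring Boot 2.2.x auto-configuration (suggested by the spring-data-cassandra 2.2.5 jar in the trace, but not confirmed) and its `ClusterBuilderCustomizer` hook. The class name, bean name, and values are hypothetical and are not the settings used in the application described here.

```java
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.SocketOptions;
import org.springframework.boot.autoconfigure.cassandra.ClusterBuilderCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration class, for illustration only.
@Configuration
public class CassandraDriverTuning {

    @Bean
    public ClusterBuilderCustomizer driverTimeouts() {
        SocketOptions socket = new SocketOptions()
                // Driver 3.x default (12000 ms): how long the client waits for a
                // response on a connection before raising OperationTimedOutException.
                .setReadTimeoutMillis(SocketOptions.DEFAULT_READ_TIMEOUT_MILLIS)
                // Ask the OS to send TCP keep-alive probes on idle connections.
                .setKeepAlive(true);

        PoolingOptions pooling = new PoolingOptions()
                // Interval after which the driver sends a heartbeat on an idle
                // connection; 30 seconds is the driver 3.x default.
                .setHeartbeatIntervalSeconds(30);

        // Applied to the Cluster.Builder that Spring Boot auto-configuration creates.
        return builder -> builder
                .withSocketOptions(socket)
                .withPoolingOptions(pooling);
    }
}
```

Whether any of these settings is related to the timeouts above is not established; the sketch only shows which driver options govern the 12-second wait and the liveness of idle connections.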