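Summary
I am currently trying to build an authentication application using Keycloak deployed as a Docker service. My infrastructure is as follows:
While building the cluster, I ran into a cache problem. With a 2-node cluster I get no errors, but when I scale to 5 nodes I get many warnings like this one: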
WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-3) JGRP000041: bd3eeb23695b: message d8896fbba960::14 not found in retransmission table
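Once these messages start to appear, the containers stop responding correctly, and eventually some of them shut down their Keycloak instance. The error occurs in several situations:
Symptoms
When the application crashes, I see:
1) Many log entries like the one shown above, which appear to repeat (for example, messages from one node that are never found):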
2018-08-22 09:59:33,346 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::15 not found in retransmission table
2018-08-22 09:59:33,346 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::16 not found in retransmission table
2018-08-22 09:59:33,346 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::17 not found in retransmission table
2018-08-22 09:59:33,346 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::18 not found in retransmission table
...
2018-08-22 09:59:33,040 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::15 not found in retransmission table
2018-08-22 09:59:33,040 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::16 not found in retransmission table
2018-08-22 09:59:33,040 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::17 not found in retransmission table
2018-08-22 09:59:33,040 WARN [org.jboss.as.clustering.jgroups.protocol.NAKACK2] (thread-2) JGRP000041: bd3eeb23695b: message d8896fbba960::18 not found in retransmission table
...
2) The node that sent those messages shows various cache errors:
2018-08-22 09:58:37,130 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (ServerService Thread Pool -- 61) ISPN000136: Error executing command PutKeyValueCommand, writing keys [cluster-start-time]: org.infinispan.util.concurrent.TimeoutException: Replication timeout
2018-08-22 09:58:37,149 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 61) MSC000001: Failed to start service jboss.undertow.deployment.default-server.default-host./odino-stif-keycloak-int/auth: org.jboss.msc.service.StartException in service jboss.undertow.deployment.default-server.default-host./odino-stif-keycloak-int/auth: java.lang.RuntimeException: RESTEASY003325: Failed to construct public org.keycloak.services.resources.KeycloakApplication(javax.servlet.ServletContext,org.jboss.resteasy.core.Dispatcher)
2018-08-22 09:58:37,178 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "keycloak-server.war")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.undertow.deployment.default-server.default-host./odino-stif-keycloak-int/auth" => "java.lang.RuntimeException: RESTEASY003325: Failed to construct public org.keycloak.services.resources.KeycloakApplication(javax.servlet.ServletContext,org.jboss.resteasy.core.Dispatcher)
Caused by: java.lang.RuntimeException: RESTEASY003325: Failed to construct public org.keycloak.services.resources.KeycloakApplication(javax.servlet.ServletContext,org.jboss.resteasy.core.Dispatcher)
Caused by: org.infinispan.util.concurrent.TimeoutException: Replication timeout"}}
2018-08-22 09:58:37,409 WARN [org.infinispan.topology.CacheTopologyControlCommand] (ServerService Thread Pool -- 60) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=actionTokens, type=LEAVE, sender=d8896fbba960, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, actualMembers=null, throwable=null, viewId=3}: java.lang.IllegalArgumentException: A cache topology's pending consistent hash must contain all the current consistent hash's members
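After that, the node usually stops all of its caches and Keycloak.
Configuration and attempted solutions
I have tried the following, without success:
The configuration I am currently using is: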
<subsystem xmlns="urn:jboss:domain:infinispan:4.0">
    <cache-container name="keycloak" jndi-name="infinispan/Keycloak">
        <transport lock-timeout="500000"/>
        <local-cache name="realms">
            <eviction max-entries="10000" strategy="LRU"/>
        </local-cache>
        <local-cache name="users">
            <eviction max-entries="10000" strategy="LRU"/>
        </local-cache>
        <distributed-cache name="sessions" mode="SYNC" owners="3"/>
        <distributed-cache name="authenticationSessions" mode="SYNC" owners="3"/>
        <distributed-cache name="offlineSessions" mode="SYNC" owners="1"/>
        <distributed-cache name="loginFailures" mode="SYNC" owners="1"/>
        <local-cache name="authorization">
            <eviction max-entries="10000" strategy="LRU"/>
        </local-cache>
        <replicated-cache name="work" mode="SYNC"/>
        <local-cache name="keys">
            <eviction max-entries="1000" strategy="LRU"/>
            <expiration max-idle="3600000"/>
        </local-cache>
        <distributed-cache name="actionTokens" mode="SYNC" owners="2">
            <eviction max-entries="-1" strategy="NONE"/>
            <expiration max-idle="-1" interval="300000"/>
        </distributed-cache>
    </cache-container>
    ...
    <cache-container name="ejb" aliases="sfsb" default-cache="dist" module="org.wildfly.clustering.ejb.infinispan">
        <transport lock-timeout="300000"/>
        <distributed-cache name="dist">
            <locking isolation="REPEATABLE_READ"/>
            <transaction mode="BATCH"/>
            <file-store/>
        </distributed-cache>
    </cache-container>
</subsystem>
...
<protocol type="pbcast.NAKACK2">
    <property name="use_mcast_xmit">false</property>
    <property name="xmit_table_num_rows">200</property>
</protocol>
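For reference, NAKACK2 has a few other retransmission-related properties that I have not tuned yet. The fragment below is only a sketch of what such a change might look like in the same block. The added values are untested placeholders I picked for illustration, not recommendations, and I do not know whether any of them addresses the warnings above.

```xml
<!-- Hypothetical variant of the NAKACK2 block above; the three added values are untested placeholders -->
<protocol type="pbcast.NAKACK2">
    <property name="use_mcast_xmit">false</property>
    <property name="xmit_table_num_rows">200</property>
    <!-- Number of messages held in each row of the retransmission table -->
    <property name="xmit_table_msgs_per_row">1024</property>
    <!-- Interval in milliseconds between retransmission requests for missing messages -->
    <property name="xmit_interval">1000</property>
    <!-- Only silences the "not found in retransmission table" warning; it does not recover the lost messages -->
    <property name="log_not_found_msgs">false</property>
</protocol>
```

Turning off log_not_found_msgs would only hide the JGRP000041 lines, so I would treat it as a way to reduce log noise while investigating rather than as a fix.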
Does anyone know why this happens, and how I should change my configuration to fix it?