I am trying to run a Hazelcast ReplicatedMap spread across several nodes as a cache. The map will hold roughly 60,000 entries, and loading/creating an entry is very expensive.
Entries occasionally become invalid and need to be updated or removed, or new entries have to be inserted. For this I came up with a scheduled service that refreshes the map periodically.
To avoid concurrent, expensive re-creation of entries, there should be only one reload service running at a time.
To test this, I put together a small test case:
import java.util.Random;
import java.util.concurrent.TimeUnit;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ReplicatedMap;
import com.hazelcast.scheduledexecutor.DuplicateTaskException;
import com.hazelcast.scheduledexecutor.IScheduledExecutorService;

public class HazelTest {

    private ReplicatedMap<Long, Bean> hzMap;
    private HazelcastInstance instance;

    public HazelTest() {
        instance = Hazelcast.newHazelcastInstance();
        hzMap = instance.getReplicatedMap("UniqueName");
        IScheduledExecutorService scheduler = instance.getScheduledExecutorService("ExecutorService");
        try {
            scheduler.scheduleAtFixedRate(new BeanReloader(), 5, 10, TimeUnit.SECONDS);
        } catch (DuplicateTaskException ex) {
            System.out.println("reloader already running");
        }
    }

    public static void main(String[] args) throws Exception {
        Random random = new Random(System.currentTimeMillis());
        HazelTest test = new HazelTest();
        System.out.println("Start ...");
        long i = 0;
        try {
            while (true) {
                i++;
                Bean bean = test.hzMap.get((long) random.nextInt(1000));
                if (bean != null) {
                    if (i % 100000 == 0) {
                        System.out.println("Bean: " + bean.toString());
                    }
                    bean.setName("NewName");
                }
            }
        } finally {
            test.close();
            System.out.println("End.");
        }
    }

    public void close() {
        instance.getPartitionService().forceLocalMemberToBeSafe(5, TimeUnit.SECONDS);
        if (instance.getPartitionService().isLocalMemberSafe()) {
            instance.shutdown();
        } else {
            System.out.println("Error!!!!!");
        }
    }
}
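I am not sure whether the executor's durability plays a role here, but for reference this is roughly how I would configure the scheduled executor programmatically if I wanted to raise it. This is only a minimal sketch; the class and method names are mine, my test above uses the default configuration, and the durability value 2 is just an assumption for illustration:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ConfiguredHazelTest {

    // Sketch only: create an instance whose scheduled executor "ExecutorService"
    // keeps more backups of its task state. The default durability is 1; the value 2
    // below is an assumption for illustration, not what my test run used.
    public static HazelcastInstance newConfiguredInstance() {
        Config config = new Config();
        config.getScheduledExecutorConfig("ExecutorService")
              .setDurability(2);
        return Hazelcast.newHazelcastInstance(config);
    }
}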
The reloader:
import java.io.Serializable;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.ReplicatedMap;
import com.hazelcast.scheduledexecutor.NamedTask;

public class BeanReloader implements NamedTask, Runnable, HazelcastInstanceAware, Serializable {

    private transient HazelcastInstance hazelcastInstance;

    @Override
    public void run() {
        System.out.println("Bean Reload ....");
        for (long i = 0; i < 200; i++) {
            Bean bean = new Bean(i, "Bean " + i);
            ReplicatedMap<Object, Object> map = hazelcastInstance.getReplicatedMap("UniqueName");
            map.put(i, bean);
        }
        System.out.println("Reload end.");
    }

    @Override
    public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
        this.hazelcastInstance = hazelcastInstance;
    }

    @Override
    public String getName() {
        return "BeanReloader";
    }
}
The bean has only two fields, for testing purposes:
public class Bean implements Serializable {

    private long id;
    private String name;

    // getter and setter
}
Now, when I run the nodes in different terminals on my machine (or across the network), they show the desired behaviour: the service runs on only one node at a time, and when I start another node I get a DuplicateTaskException.
But when the node currently executing the task goes down, the service does not always fail over to another node. In roughly two out of three cases the service is lost entirely and is no longer running on any of the remaining nodes.
Now my questions: Is this behaviour normal? Do I have to check myself whether the service is still running, and if so, how, via the API?
Or am I doing something wrong, and is there another way to achieve my goal? Is a Hazelcast ReplicatedMap the right approach here in the first place?
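If I do have to check it myself, something like the following is what I have in mind. This is only a sketch of the idea; the method name reloaderStillScheduled is mine, and I am not sure this is the intended way to use the scheduled executor API:

import java.util.List;
import java.util.Map;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;
import com.hazelcast.scheduledexecutor.IScheduledExecutorService;
import com.hazelcast.scheduledexecutor.IScheduledFuture;

public class ReloaderCheck {

    // Returns true if any member still holds a scheduled future for the named reloader task.
    static boolean reloaderStillScheduled(HazelcastInstance instance) {
        IScheduledExecutorService scheduler = instance.getScheduledExecutorService("ExecutorService");
        Map<Member, List<IScheduledFuture<Object>>> futures = scheduler.getAllScheduledFutures();
        for (List<IScheduledFuture<Object>> memberFutures : futures.values()) {
            for (IScheduledFuture<Object> future : memberFutures) {
                // "BeanReloader" is the name returned by BeanReloader.getName()
                if ("BeanReloader".equals(future.getHandler().getTaskName())) {
                    return true;
                }
            }
        }
        return false;
    }
}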
Edit: I have edited the code above, because the simplification I originally made for this post no longer reproduced the error.
When the node leaves and the service is not recovered, there is no error in the logs. The only thing logged, on one node, is this:
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.TcpIpConnection
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connection[id=8, /X.X.X.X:5702->/X.X.X.X:54226, endpoint=[X.X.X.X]:5703, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Bean: Bean{id=893, name='NewName'}
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.TcpIpConnectionMonitor
WARNUNG: [X.X.X.X]:5702 [dev] [3.8] Removing connection to endpoint [X.X.X.X]:5703 Cause => java.net.SocketException {Verbindungsaufbau abgelehnt to address /X.X.X.X:5703}, Error-Count: 5
Mär 06, 2017 9:54:08 AM com.hazelcast.internal.cluster.ClusterService
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Removing Member [X.X.X.X]:5703 - 875ccc3a-dc10-4c21-815a-4b57ae41a6ff
Mär 06, 2017 9:54:08 AM com.hazelcast.internal.cluster.ClusterService
INFORMATION: [X.X.X.X]:5702 [dev] [3.8]
Members [3] {
Member [X.X.X.X]:5702 - 0ec92bb9-6330-4a3d-90ac-0ed374fd266c this
Member [X.X.X.X]:5701 - 0d1a832e-e1e8-4cad-8546-5a97c8c052c3
Member [X.X.X.X]:5704 - bd18c7f9-892e-430a-b1ab-da740cc7a6c5
}
Mär 06, 2017 9:54:08 AM com.hazelcast.transaction.TransactionManagerService
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Committing/rolling-back alive transactions of Member [X.X.X.X]:5703 - 875ccc3a-dc10-4c21-815a-4b57ae41a6ff, UUID: 875ccc3a-dc10-4c21-815a-4b57ae41a6ff
Mär 06, 2017 9:54:08 AM com.hazelcast.internal.partition.impl.MigrationManager
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Re-partitioning cluster data... Migration queue size: 204
Apart from the last line, the other nodes log essentially the same:
INFORMATION: [X.X.X.X]:5701 [dev] [3.8] Committing/rolling-back alive transactions of Member [X.X.X.X]:5703 - 875ccc3a-dc10-4c21-815a-4b57ae41a6ff, UUID: 875ccc3a-dc10-4c21-815a-4b57ae41a6ff
Here is a test run with debug logging enabled on 4 nodes (hz1-4.log). Node 4 picked up the schedule. After one run I killed that node (log timestamp 15:28:50,955 in hz4.log). Below are the four logs of that run (note: because of the sheer amount of logging I only pasted the entries around timestamp 15:28:50,955 ...).
With that setup I can reliably reproduce the failure. I simply start 4 nodes with
mvn exec:java -Dexec.mainClass="de.tle.products.HazelTest" -Dnumber=X
where X is the number of the node (just a variable used in the log4j config file). Once all four nodes are running, I kill the node that is currently executing the schedule, and afterwards the schedule is not running on any of the remaining nodes.
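If it turns out that I have to take care of the failover myself, my current idea is a membership listener that simply tries to re-submit the named task whenever a member leaves. This is only a sketch of that idea (the ReloaderWatchdog class is mine, not an existing Hazelcast API), and I would prefer not to need it:

import java.util.concurrent.TimeUnit;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MemberAttributeEvent;
import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MembershipListener;
import com.hazelcast.scheduledexecutor.DuplicateTaskException;
import com.hazelcast.scheduledexecutor.IScheduledExecutorService;

public class ReloaderWatchdog implements MembershipListener {

    private final HazelcastInstance instance;

    public ReloaderWatchdog(HazelcastInstance instance) {
        this.instance = instance;
    }

    // Register with: instance.getCluster().addMembershipListener(new ReloaderWatchdog(instance));
    @Override
    public void memberRemoved(MembershipEvent event) {
        IScheduledExecutorService scheduler = instance.getScheduledExecutorService("ExecutorService");
        try {
            // Re-submit the named task; if it is still scheduled somewhere this throws.
            scheduler.scheduleAtFixedRate(new BeanReloader(), 5, 10, TimeUnit.SECONDS);
        } catch (DuplicateTaskException ex) {
            // The reloader survived the member loss, nothing to do.
        }
    }

    @Override
    public void memberAdded(MembershipEvent event) {
        // nothing to do
    }

    @Override
    public void memberAttributeChanged(MemberAttributeEvent event) {
        // nothing to do
    }
}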