Hazelcast unique ScheduledExecutorService on a replicated map lost on node shutdown

Date: 2017-03-03 15:12:32

Tags: java hazelcast scheduledexecutorservice

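I'm trying to run a Hazelcast ReplicatedMap spanning multiple nodes as a cache. The map will hold about 60,000 entries, and loading/creating an entry is very expensive.

Entries in the map occasionally become invalid and must be updated or removed, or new entries must be inserted. For this I thought of a scheduled service that refreshes the map periodically.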
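A refresh like this has to sort each key into one of three cases: changed, gone, or new. A minimal sketch of that classification, assuming the fresh data can be loaded into a plain `Map` (the `ReloadPlan` class and its `plan` method are hypothetical names, and `java.util` maps stand in for the `ReplicatedMap`):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper: decides which cache entries a reload pass must write
// and which it must remove. Plain maps stand in for the ReplicatedMap and
// for the (expensive) data source.
public class ReloadPlan<K, V> {

  public final Map<K, V> toPut = new HashMap<>();  // new or changed entries
  public final Set<K> toRemove = new HashSet<>();  // entries no longer valid

  public static <K, V> ReloadPlan<K, V> plan(Map<K, V> cached, Map<K, V> fresh) {
    ReloadPlan<K, V> p = new ReloadPlan<>();
    for (Map.Entry<K, V> e : fresh.entrySet()) {
      // Insert keys missing from the cache and overwrite keys whose value changed.
      if (!Objects.equals(cached.get(e.getKey()), e.getValue())) {
        p.toPut.put(e.getKey(), e.getValue());
      }
    }
    for (K key : cached.keySet()) {
      // Keys that disappeared from the source are stale.
      if (!fresh.containsKey(key)) {
        p.toRemove.add(key);
      }
    }
    return p;
  }

  public static void main(String[] args) {
    Map<Long, String> cached = new HashMap<>();
    cached.put(1L, "Bean 1");
    cached.put(2L, "old");
    cached.put(3L, "Bean 3");
    Map<Long, String> fresh = new HashMap<>();
    fresh.put(1L, "Bean 1");
    fresh.put(2L, "Bean 2");
    fresh.put(4L, "Bean 4");
    ReloadPlan<Long, String> p = plan(cached, fresh);
    System.out.println("put: " + p.toPut + ", remove: " + p.toRemove);
  }
}
```

This only decides what to write; it still assumes the fresh values are available, so it reduces replicated writes rather than the cost of loading.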

To prevent concurrent, expensive duplicate creation of new entries, there should be only one reload service.

To test this, I wrote a small test case:

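import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ReplicatedMap;
import com.hazelcast.scheduledexecutor.DuplicateTaskException;
import com.hazelcast.scheduledexecutor.IScheduledExecutorService;
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class HazelTest {

  private ReplicatedMap<Long, Bean> hzMap;
  private HazelcastInstance instance;

  public HazelTest() {
    instance = Hazelcast.newHazelcastInstance();

    hzMap = instance.getReplicatedMap("UniqueName");
    IScheduledExecutorService scheduler = instance.getScheduledExecutorService("ExecutorService");
    try {
      // First run after 5 s, then every 10 s. BeanReloader is a NamedTask, so only
      // the first member to schedule it succeeds; later members get DuplicateTaskException.
      scheduler.scheduleAtFixedRate(new BeanReloader(), 5, 10, TimeUnit.SECONDS);
    } catch (DuplicateTaskException ex) {
      System.out.println("reloader already running");
    }
  }

  public static void main(String[] args) throws Exception {
    Random random = new Random(System.currentTimeMillis());

    HazelTest test = new HazelTest();
    System.out.println("Start ...");
    long i = 0;
    try {
      // Keep reading random keys so the node stays busy until it is killed.
      while (true) {
        i++;
        Bean bean = test.hzMap.get((long) random.nextInt(1000));
        if (bean != null) {
          if (i % 100000 == 0) {
            System.out.println("Bean: " + bean.toString());
          }
          bean.setName("NewName");
        }
      }
    } finally {
      test.close();
      System.out.println("End.");
    }
  }

  public void close() {
    // Try (for up to 5 s) to sync this member's partitions with their backups before shutting down.
    instance.getPartitionService().forceLocalMemberToBeSafe(5, TimeUnit.SECONDS);

    if (instance.getPartitionService().isLocalMemberSafe()) {
      instance.shutdown();
    } else {
      System.out.println("Error!!!!!");
    }
  }
}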

The reloader:

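import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.ReplicatedMap;
import com.hazelcast.scheduledexecutor.NamedTask;
import java.io.Serializable;

public class BeanReloader implements NamedTask, Runnable, HazelcastInstanceAware, Serializable {

  // Set through setHazelcastInstance() on the member that runs the task; not serialized.
  private transient HazelcastInstance hazelcastInstance;

  @Override
  public void run() {
    System.out.println("Bean Reload ....");
    ReplicatedMap<Long, Bean> map = hazelcastInstance.getReplicatedMap("UniqueName");
    for (long i = 0; i < 200; i++) {
      map.put(i, new Bean(i, "Bean " + i));
    }
    System.out.println("Reload end.");
  }

  @Override
  public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
    this.hazelcastInstance = hazelcastInstance;
  }

  // The task name; scheduling a second task with the same name throws DuplicateTaskException.
  @Override
  public String getName() {
    return "BeanReloader";
  }

}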

For test purposes, the bean has only two fields:

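import java.io.Serializable;

public class Bean implements Serializable {
  private long id;
  private String name;
  public Bean(long id, String name) { this.id = id; this.name = name; }
  public long getId() { return id; }
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  @Override
  public String toString() { return "Bean{id=" + id + ", name='" + name + "'}"; }
}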

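Now, when I run it in separate terminals on my machine (or across the network), the nodes show the desired behavior: the service runs on only one node at a time, and when I start another node I get a DuplicateTaskException.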
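The behavior described above amounts to "at most one task per name". As a single-JVM illustration only (this is not Hazelcast's implementation: a `ConcurrentHashMap` stands in for the cluster-wide task registry, and `NamedTaskRegistry`, `schedule` and `registeredBy` are hypothetical names), the check-and-register step can be modeled as an atomic `putIfAbsent`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-JVM model of a registry of named tasks.
public class NamedTaskRegistry {

  // Plays the role of DuplicateTaskException in this model.
  public static class DuplicateNameException extends RuntimeException {
    public DuplicateNameException(String name) {
      super("Task already scheduled: " + name);
    }
  }

  // task name -> node that registered it
  private final ConcurrentMap<String, String> tasksByName = new ConcurrentHashMap<>();

  // Atomically registers the task; only the first caller for a given name wins.
  public void schedule(String taskName, String nodeName) {
    String existing = tasksByName.putIfAbsent(taskName, nodeName);
    if (existing != null) {
      throw new DuplicateNameException(taskName);
    }
  }

  // Returns the node that registered the named task, or null if nobody did.
  public String registeredBy(String taskName) {
    return tasksByName.get(taskName);
  }

  public static void main(String[] args) {
    NamedTaskRegistry registry = new NamedTaskRegistry();
    for (String node : new String[] {"node1", "node2", "node3"}) {
      try {
        registry.schedule("BeanReloader", node);
        System.out.println(node + ": scheduled");
      } catch (DuplicateNameException ex) {
        System.out.println(node + ": reloader already running");
      }
    }
  }
}
```

In this model, the name check alone does nothing to restart a task whose runner has disappeared; that has to come from failover or from an explicit check.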

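But when the node currently executing the task goes down, the service does not always move to another node. In roughly 2 out of 3 cases the service is lost entirely and does not run on any of the remaining nodes.

Now my question: is this behavior expected? Do I have to check for the running service myself, and if so, how do I do that through the API?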
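One common way to detect a lost periodic task yourself, independent of Hazelcast's own failover, is a heartbeat: the reloader records when it last ran, and a watchdog on every node treats the task as lost once that timestamp is older than a few periods. A minimal sketch of the staleness check, assuming the timestamp is kept in some shared structure (here it is just a `long`; `HeartbeatCheck`, `isStale` and the tolerance of 3 missed runs are hypothetical choices):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog logic: decides whether a periodic task should be
// considered lost, based on the time of its last recorded run.
public class HeartbeatCheck {

  private final long periodMillis;
  private final int toleratedMissedRuns;

  public HeartbeatCheck(long period, TimeUnit unit, int toleratedMissedRuns) {
    this.periodMillis = unit.toMillis(period);
    this.toleratedMissedRuns = toleratedMissedRuns;
  }

  // lastRunMillis: when the reloader last wrote its heartbeat;
  // nowMillis: the watchdog's current time (same clock, epoch millis).
  public boolean isStale(long lastRunMillis, long nowMillis) {
    return nowMillis - lastRunMillis > periodMillis * toleratedMissedRuns;
  }

  public static void main(String[] args) {
    // Same period as scheduleAtFixedRate(..., 5, 10, TimeUnit.SECONDS) in the test case.
    HeartbeatCheck check = new HeartbeatCheck(10, TimeUnit.SECONDS, 3);
    long lastRun = 1_000_000L;
    System.out.println(check.isStale(lastRun, lastRun + 25_000)); // within 3 periods
    System.out.println(check.isStale(lastRun, lastRun + 31_000)); // more than 3 periods
  }
}
```

A node that sees a stale heartbeat could then try to schedule the named task again. The tolerance should be large compared with clock differences between the machines, since the heartbeat is written on one node and read on another.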

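Or am I doing something wrong, and is there another way to achieve my goal? Is a replicated Hazelcast map the right approach in the first place?

Edit: I edited the code, because the simplifications I had made for posting no longer reproduced the error.

When a node leaves and the service is not restored, there are no errors in the log. The only output, on one of the nodes, is: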

Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.TcpIpConnection
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connection[id=8, /X.X.X.X:5702->/X.X.X.X:54226, endpoint=[X.X.X.X]:5703, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Bean: Bean{id=893, name='NewName'}
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Connecting to /X.X.X.X:5703, timeout: 0, bind-any: true
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.InitConnectionTask
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Could not connect to: /X.X.X.X:5703. Reason: SocketException[Verbindungsaufbau abgelehnt to address /X.X.X.X:5703]
Mär 06, 2017 9:54:08 AM com.hazelcast.nio.tcp.TcpIpConnectionMonitor
WARNUNG: [X.X.X.X]:5702 [dev] [3.8] Removing connection to endpoint [X.X.X.X]:5703 Cause => java.net.SocketException {Verbindungsaufbau abgelehnt to address /X.X.X.X:5703}, Error-Count: 5
Mär 06, 2017 9:54:08 AM com.hazelcast.internal.cluster.ClusterService
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Removing Member [X.X.X.X]:5703 - 875ccc3a-dc10-4c21-815a-4b57ae41a6ff
Mär 06, 2017 9:54:08 AM com.hazelcast.internal.cluster.ClusterService
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] 

Members [3] {
        Member [X.X.X.X]:5702 - 0ec92bb9-6330-4a3d-90ac-0ed374fd266c this
        Member [X.X.X.X]:5701 - 0d1a832e-e1e8-4cad-8546-5a97c8c052c3
        Member [X.X.X.X]:5704 - bd18c7f9-892e-430a-b1ab-da740cc7a6c5
}

Mär 06, 2017 9:54:08 AM com.hazelcast.transaction.TransactionManagerService
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Committing/rolling-back alive transactions of Member [X.X.X.X]:5703 - 875ccc3a-dc10-4c21-815a-4b57ae41a6ff, UUID: 875ccc3a-dc10-4c21-815a-4b57ae41a6ff
Mär 06, 2017 9:54:08 AM com.hazelcast.internal.partition.impl.MigrationManager
INFORMATION: [X.X.X.X]:5702 [dev] [3.8] Re-partitioning cluster data... Migration queue size: 204

The logs on the other nodes are basically the same, except for the last line:

INFORMATION: [X.X.X.X]:5701 [dev] [3.8] Committing/rolling-back alive transactions of Member [X.X.X.X]:5703 - 875ccc3a-dc10-4c21-815a-4b57ae41a6ff, UUID: 875ccc3a-dc10-4c21-815a-4b57ae41a6ff

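I did a test run with debug logging on 4 nodes (hz1-4.log). Node 4 picked up the schedule. After one run, I killed that node (log timestamp 15:28:50,955 in hz4.log). Below are the four logs of that run (note: because of the large amount of logging, I only pasted the logs starting at timestamp 15:28:50,955...)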

hz3.log

hz2.log

hz1.log

hz4.log

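With this setup I can reproduce the failure reliably. I simply start 4 nodes with

mvn exec:java -Dexec.mainClass="de.tle.products.HazelTest" -Dnumber=X

where X is the node's number (just a variable in the log4j config file). Once all four nodes are running, I kill the node that is currently running the schedule, and afterwards the schedule does not run on any of the remaining nodes.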

0 answers