Question

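I am using the Apache Curator library to do leader election on Zookeeper. My application code is deployed on several machines, but it must run on only one of them. That is why I do leader election on Zookeeper: I can check whether I am the leader and only then execute the code.

Below is my LeaderElectionExecutor class, which makes sure there is one Curator instance per application.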

public class LeaderElectionExecutor {

    private ZookeeperClient zookClient;

    private static final String LEADER_NODE = "/testleader";

    private static class Holder {
        static final LeaderElectionExecutor INSTANCE = new LeaderElectionExecutor();
    }

    public static LeaderElectionExecutor getInstance() {
        return Holder.INSTANCE;
    }

    private LeaderElectionExecutor() {
        try {
            String hostname = Utils.getHostName();

            String nodes = "host1:2181,host2:2181";

            zookClient = new ZookeeperClient(nodes, LEADER_NODE, hostname);
            zookClient.start();

            // added sleep specifically for the leader to get selected
            // since I cannot call isLeader method immediately after starting the latch
            TimeUnit.MINUTES.sleep(1);
        } catch (Exception ex) {
            // logging error
            System.exit(1);
        }
    }

    public ZookeeperClient getZookClient() {
        return zookClient;
    }
}

And below is my ZookeeperClient code:

// can this class be improved in any ways?
public class ZookeeperClient {

    private CuratorFramework client;
    private String latchPath;
    private String id;
    private LeaderLatch leaderLatch;

    public ZookeeperClient(String connString, String latchPath, String id) {
        client = CuratorFrameworkFactory.newClient(connString, new ExponentialBackoffRetry(1000, Integer.MAX_VALUE));
        this.id = id;
        this.latchPath = latchPath;
    }

    public void start() throws Exception {
        client.start();
        leaderLatch = new LeaderLatch(client, latchPath, id);
        leaderLatch.start();
    }

    public boolean isLeader() {
        return leaderLatch.hasLeadership();
    }

    public Participant currentLeader() throws Exception {
        return leaderLatch.getLeader();
    }

    public void close() throws IOException {
        leaderLatch.close();
        client.close();
    }

    public CuratorFramework getClient() {
        return client;
    }

    public String getLatchPath() {
        return latchPath;
    }

    public String getId() {
        return id;
    }

    public LeaderLatch getLeaderLatch() {
        return leaderLatch;
    }
}

Now in my application, I use the code like this:

public void method01() {
    ZookeeperClient zookClient = LeaderElectionExecutor.getInstance().getZookClient();
    if (zookClient.isLeader()) {
        // do something
    }
}

public void method02() {
    ZookeeperClient zookClient = LeaderElectionExecutor.getInstance().getZookClient();
    if (zookClient.isLeader()) {
        // do something
    }
}

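Problem statement:

In the Curator library, calling isLeader() immediately after starting the latch does not work, because it takes time for a leader to be elected. For that reason I added a 1 minute sleep in my LeaderElectionExecutor code, but I feel that is not the right approach.

Is there a better way to do this? Keep in mind that I need a way to check whether I am the leader and then execute a piece of code. I cannot do everything in a single method, so I need to call isLeader from different classes and methods and run the code only if I am the leader.

I am using Zookeeper 3.4.5 and Curator 1.7.1.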

Answer 1

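I once solved a problem very similar to yours. This is how I did it.

First, I had my objects managed by Spring, so I had a LeaderLatch that could be injected through the container. One of the components using the LeaderLatch was a LeadershipWatcher, an implementation of the Runnable interface that dispatches leadership events to other components. Those components are implementations of an interface I named LeadershipObserver. The implementation of LeadershipWatcher was roughly like the following code: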

@Component
public class LeadershipWatcher implements Runnable {
  private final LeaderLatch leaderLatch;
  private final Collection<LeadershipObserver> leadershipObservers;

  /* constructor with @Inject */

  @Override
  public void run() {
    try {
      leaderLatch.await();

      for (LeadershipObserver observer : leadershipObservers) {
        observer.granted();
      }
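    } catch (InterruptedException e) {
      // restore the interrupt flag so code further up the stack can still see it
      Thread.currentThread().interrupt();
      for (LeadershipObserver observer : leadershipObservers) {
        observer.interrupted();
      }
    }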
  }
}

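Since this is just a sketch, I recommend that you enhance this code, perhaps by applying the command pattern to call the observers, or even by submitting the observers to a thread pool if their work blocks or consists of long-running, CPU-intensive tasks.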
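The thread pool suggestion could look roughly like the sketch below. It uses only the JDK. ObserverDispatcher is a hypothetical helper name, and the LeadershipObserver interface is reconstructed from the two method names used in the sketch above, so treat both as assumptions rather than the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Observer interface as described in the answer; only the two method
// names come from the sketch above.
interface LeadershipObserver {
    void granted();
    void interrupted();
}

// Hypothetical helper (not part of Curator or Spring): hands every callback
// to an Executor so that a slow observer cannot hold up the watcher thread.
class ObserverDispatcher {
    private final List<LeadershipObserver> observers;
    private final Executor executor;

    ObserverDispatcher(List<LeadershipObserver> observers, Executor executor) {
        this.observers = new ArrayList<>(observers); // defensive copy
        this.executor = executor;
    }

    // Schedules granted() for every observer on the executor.
    void notifyGranted() {
        for (LeadershipObserver observer : observers) {
            executor.execute(observer::granted);
        }
    }

    // Schedules interrupted() for every observer on the executor.
    void notifyInterrupted() {
        for (LeadershipObserver observer : observers) {
            executor.execute(observer::interrupted);
        }
    }
}
```

In LeadershipWatcher.run(), the two for loops would then become calls to notifyGranted() and notifyInterrupted(). Passing something like Executors.newFixedThreadPool(4) gives pooled execution, while Runnable::run keeps everything on the calling thread, which is convenient in tests.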

Answer 2

leaderLatch = new LeaderLatch(curatorClient, zkPath, String.valueOf(new Random().nextInt()));
leaderLatch.start();
Participant participant;
while(true) {
  participant = leaderLatch.getLeader();
  // Leader election happens asynchronously after calling start, this is a hack to wait until election happens
  if (!(participant.getId().isEmpty() || participant.getId().equalsIgnoreCase(""))) {
    break;
  }
}
if(leaderLatch.hasLeadership()) {
...
}

Note that getLeader returns a dummy participant whose id is "" until a leader has been elected.
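The loop above spins without pausing and has no upper bound. As a hedged alternative, here is a minimal JDK-only sketch of polling with a deadline and a pause between attempts. BoundedPoll and its parameters are hypothetical names introduced for illustration; nothing here is part of the Curator API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition until it holds or a deadline passes.
final class BoundedPoll {
    private BoundedPoll() {}

    // Returns true as soon as the condition holds. Returns false once
    // timeoutMillis has elapsed or the calling thread is interrupted.
    static boolean until(BooleanSupplier condition, long timeoutMillis, long pauseMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true in time
            }
            try {
                Thread.sleep(pauseMillis); // yield the CPU instead of spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

The condition passed in would wrap the same check as the loop above (a non-empty id from getLeader()), with the checked exception handled inside the lambda. A caller that gets false back can log and fail fast instead of hanging forever.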

Answer 3

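Reviving an old question here...

This is similar to the answer srav gave, but I would warn against using that code: it busy-waits, which can prevent certain in-thread callbacks from ever being called and may block forever. It can also retry forever if there is a real problem.

This is my solution, which uses the CuratorClient's retry policy to wait for leader election where necessary.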

    RetryPolicy retryPolicy = _client.getZookeeperClient().getRetryPolicy();
    RetrySleeper awaitLeadership = _leaderLatch::await;

    final long start = System.currentTimeMillis();
    int count = 0;

    do {
        try {
            // curator will return a dummy leader in the case when a leader has
            // not yet actually been elected. This dummy leader will have isLeader
            // set to false, so we need to check that we got a true leader
            if (_leaderLatch.getLeader().isLeader()) {
                return;
            }
        } catch (KeeperException.NoNodeException e) {
            // this is the case when the leader node has not yet been created
            // by any client - this is fine because we are still waiting for
            // the algorithm to start up so we ignore the error
        }
    } while (retryPolicy.allowRetry(count++, System.currentTimeMillis() - start, awaitLeadership));

    // we have exhausted the retry policy and still have not elected a leader
    throw new IOException("No leader was elected within the specified retry policy!");

Looking at your CuratorFramework initialization, though, I would be careful about using Integer.MAX_VALUE when specifying the retry policy...

I hope this helps!

Answer 4

I have not worked with Zookeeper or Curator before, so take my answer with a grain of salt.

Set a flag.

// volatile so that a thread waiting on the flag sees the update made by another thread
volatile boolean isLeaderSelected = false;

When the latch starts, set the flag to false. Once a leader has been selected, set the flag to true.

In the isLeader() function:

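public boolean isLeader() {
    while (!isLeaderSelected) {} // busy-waits until a leader is selected

    // do the rest of the function
    return leaderLatch.hasLeadership(); // as in the question's ZookeeperClient
}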

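This is also a relatively hacky workaround, but it should let the isLeader method run as soon as it can. If the callers are in different classes, a getter should be able to expose isLeaderSelected.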
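The same flag idea can be expressed without spinning by using a java.util.concurrent.CountDownLatch and a timeout. The sketch below is JDK-only. LeaderSelectedGate and its method names are hypothetical, and the code that calls markSelected() when the election finishes is assumed to exist elsewhere.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical replacement for the isLeaderSelected flag: waiting threads
// block instead of spinning, and the wait is bounded by a timeout.
class LeaderSelectedGate {
    private final CountDownLatch selected = new CountDownLatch(1);

    // Call once, from whatever code learns that the election has finished.
    void markSelected() {
        selected.countDown();
    }

    // Returns true if a leader was selected within the timeout, false if the
    // timeout elapsed or the waiting thread was interrupted.
    boolean awaitSelected(long timeout, TimeUnit unit) {
        try {
            return selected.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

isLeader() would then call awaitSelected(...) first and only consult the latch's leadership state when it returns true. After the gate has opened, later calls return immediately.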

How do I use the LeaderElection recipe efficiently with Curator for Zookeeper?
