Neo4j cluster fails to start

Time: 2017-05-30 09:45:35

Tags: neo4j

I am building a Neo4j HA cluster, following the steps at https://neo4j.com/docs/operations-manual/current/tutorial/highly-available-cluster/

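I have 3 machines, each running Neo4j in a single-instance Docker container.

Neo4j runs fine as a standalone server on each machine, but as soon as I add the HA configuration it fails to start.

Each machine was created from the same pre-built image, so their configurations are identical except for ha.server_id, which is unique to each machine (values 1, 2, 3).

The firewall is turned off, and I have verified connectivity between the machines.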
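The connectivity check mentioned above can be made specific to the ports the cluster actually uses. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the hosts; `split_hosts`, `probe`, and `check_cluster` are hypothetical helper names, not part of Neo4j or Docker.

```shell
#!/usr/bin/env bash
# Sketch: probe the cluster ports on every member from one of the Docker hosts.
# All function names here are hypothetical helpers, not Neo4j tooling.

# Turn "h1:p1,h2:p2" into one "host port" pair per line.
split_hosts() {
  printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | sed 's/:/ /'
}

# Try to open a TCP connection via bash's /dev/tcp; give up after 3 seconds.
probe() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report every host:port pair in the list as open or closed.
check_cluster() {
  split_hosts "$1" | while read -r host port; do
    if probe "$host" "$port"; then
      echo "open   $host:$port"
    else
      echo "closed $host:$port"
    fi
  done
}
```

Running `check_cluster "172.0.30.110:5001,172.0.31.39:5001,172.0.32.249:5001"` on each machine gives one line per member. A port reports open only while something is listening on it, so the check is meaningful only while the containers are up.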

The Neo4j HA configuration is:

    ha.server_id=1
    ha.initial_hosts=172.0.30.110:5001,172.0.31.39:5001,172.0.32.249:5001
    dbms.mode=HA
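Because the three machines differ only in `ha.server_id`, the fragment can be generated rather than edited by hand, which rules out typos between hosts. This is a small sketch; `ha_conf` is a hypothetical helper name, and it prints only the three keys shown above.

```shell
# Sketch: print the HA fragment for one machine.
# ha_conf is a hypothetical helper; the keys are the three used in the question.
ha_conf() {
  server_id="$1"
  initial_hosts="$2"
  # Reject an empty or non-numeric server id before printing anything.
  case "$server_id" in
    ''|*[!0-9]*) echo "server id must be a whole number" >&2; return 1 ;;
  esac
  printf 'ha.server_id=%s\n' "$server_id"
  printf 'ha.initial_hosts=%s\n' "$initial_hosts"
  printf 'dbms.mode=HA\n'
}
```

For example, `ha_conf 2 "172.0.30.110:5001,172.0.31.39:5001,172.0.32.249:5001"` prints the fragment for the second machine. Writing the output into the configuration file is left out on purpose, since appending twice would duplicate the keys.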

To start Neo4j, I run the following command on each machine:

    /usr/bin/docker run \
    --publish=7474:7474 --publish=7687:7687 --publish=5001:5001 \
    --volume=/var/lib/neo4j/data:/data \
    --volume=/var/lib/neo4j/logs:/logs \
    --volume=/var/lib/neo4j/conf:/conf \
    --name=neo4j \
    neo4j:3.0-enterprise

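When I run the start command on machines 2 and 3, and sometimes on machine 1, I get an error after "Attempting to join cluster". Occasionally (but not every time), when I run the command on machine 1, it realizes it is the first machine and goes on to initialize as master.

The full failure and success messages are below. I cannot work out why only one machine (the first one, as it happens) does what it should, and even then not every time. I also need to work out how to get the other two machines to register with the cluster.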

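  1. Why does one node sometimes start Neo4j with the cluster and sometimes not?
  2. Why do the second and third nodes fail to register with the cluster?
  3. Where should I look?

Error (seen on all 3 machines):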

    2017-05-30 09:18:35.502+0000 INFO  Starting...
    2017-05-30 09:18:35.833+0000 INFO  Write transactions to database disabled
    2017-05-30 09:18:36.116+0000 INFO  Bolt enabled on 0.0.0.0:7687.
    2017-05-30 09:18:36.131+0000 INFO  Initiating metrics...
    2017-05-30 09:18:36.937+0000 INFO  Attempting to join cluster of [172.0.30.110:5001, 172.0.31.39:5001, 172.0.32.249:5001]
    2017-05-30 09:19:06.996+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception.
    org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception.
        at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
        at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:215)
        at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
        at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:67)
        at org.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:32)
    Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
        at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
        ... 3 more
    Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory, /var/lib/neo4j/data/databases/fraud.db
        at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:144)
        at org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory.newFacade(HighlyAvailableFacadeFactory.java:42)
        at org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:41)
        at org.neo4j.server.enterprise.EnterpriseNeoServer.lambda$static$0(EnterpriseNeoServer.java:80)
        at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:89)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
        ... 5 more
    Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.cluster.client.ClusterJoin@5d2e6054' was successfully initialized, but failed to start. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
        at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:140)
        ... 10 more
    Caused by: java.util.concurrent.TimeoutException: Conversation-response mapping:
    {3/13#=ResponseFuture{conversationId='3/13#', initiatedByMessageType=join, response=null}}
        at org.neo4j.cluster.statemachine.StateMachineProxyFactory$ResponseFuture.get(StateMachineProxyFactory.java:314)
        at org.neo4j.cluster.client.ClusterJoin.joinByConfig(ClusterJoin.java:143)
        at org.neo4j.cluster.client.ClusterJoin.start(ClusterJoin.java:82)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
        ... 14 more
    ubuntu@ip-172-0-32-249:~$ sudo /usr/bin/docker rm neo4j
    neo4j
    ubuntu@ip-172-0-32-249:~$ sudo /usr/bin/docker run --publish=7474:7474 --publish=7687:7687 --publish=5001:5001 --volume=/var/lib/neo4j/data:/data --volume=/var/lib/neo4j/logs:/logs --volume=/var/lib/neo4j/conf:/conf --name=neo4j neo4j:3.0-enterprise
    Starting Neo4j.
    2017-05-30 09:20:17.877+0000 INFO  No SSL certificate found, generating a self-signed certificate..
    2017-05-30 09:20:18.387+0000 INFO  Starting...
    2017-05-30 09:20:18.676+0000 INFO  Write transactions to database disabled
    2017-05-30 09:20:19.036+0000 INFO  Bolt enabled on 0.0.0.0:7687.
    2017-05-30 09:20:19.046+0000 INFO  Initiating metrics...
    2017-05-30 09:20:19.779+0000 INFO  Attempting to join cluster of [172.0.30.110:5001, 172.0.31.39:5001, 172.0.32.249:5001]
    2017-05-30 09:20:49.851+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception.
    org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception.
        at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
        at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:215)
        at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
        at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:67)
        at org.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:32)
    Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
        at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
        ... 3 more
    Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory, /var/lib/neo4j/data/databases/fraud.db
        at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:144)
        at org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory.newFacade(HighlyAvailableFacadeFactory.java:42)
        at org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:41)
        at org.neo4j.server.enterprise.EnterpriseNeoServer.lambda$static$0(EnterpriseNeoServer.java:80)
        at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:89)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
        ... 5 more
    Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.cluster.client.ClusterJoin@6c1c89e2' was successfully initialized, but failed to start. Please see attached cause exception.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
        at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:140)
        ... 10 more
    Caused by: java.util.concurrent.TimeoutException: Conversation-response mapping:
    {3/13#=ResponseFuture{conversationId='3/13#', initiatedByMessageType=join, response=null}}
        at org.neo4j.cluster.statemachine.StateMachineProxyFactory$ResponseFuture.get(StateMachineProxyFactory.java:314)
        at org.neo4j.cluster.client.ClusterJoin.joinByConfig(ClusterJoin.java:143)
        at org.neo4j.cluster.client.ClusterJoin.start(ClusterJoin.java:82)
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
        ... 14 more
    

Success (seen on one machine only):

    2017-05-30 09:20:22.963+0000 INFO  Starting...
    2017-05-30 09:20:23.319+0000 INFO  Write transactions to database disabled
    2017-05-30 09:20:23.619+0000 INFO  Bolt enabled on 0.0.0.0:7687.
    2017-05-30 09:20:23.628+0000 INFO  Initiating metrics...
    2017-05-30 09:20:24.352+0000 INFO  Attempting to join cluster of [172.0.30.110:5001, 172.0.31.39:5001, 172.0.32.249:5001]
    2017-05-30 09:20:31.382+0000 INFO  Could not join cluster of [172.0.30.110:5001,172.0.31.39:5001, 172.0.32.249:5001]
    2017-05-30 09:20:31.382+0000 INFO  Creating new cluster with name [neo4j.ha]...
    2017-05-30 09:20:31.396+0000 INFO  Instance 1 (this server)  entered the cluster
    2017-05-30 09:20:31.401+0000 INFO  Instance 1 (this server)  was elected as coordinator
    2017-05-30 09:20:31.413+0000 INFO  I am 1, moving to master
    2017-05-30 09:20:31.465+0000 INFO  Instance 1 (this server)  was elected as coordinator
    2017-05-30 09:20:31.492+0000 INFO  I am 1, successfully moved to master
    2017-05-30 09:20:31.510+0000 INFO  Instance 1 (this server)  is available as master at ha://172.17.0.2:6001?serverId=1 with StoreId{creationTime=1496122128110,randomId=-2708986986476371425, storeVersion=15531981201765894, upgradeTime=1496122128110, upgradeId=1}
    2017-05-30 09:20:31.612+0000 INFO  Instance 1 (this server)  is available as backup at backup://127.0.0.1:6362 with StoreId{creationTime=1496122128110, randomId=-2708986986476371425, storeVersion=15531981201765894, upgradeTime=1496122128110, upgradeId=1}
    2017-05-30 09:20:31.709+0000 INFO  Database available for write transactions
    2017-05-30 09:20:32.929+0000 INFO  Started.
    2017-05-30 09:20:33.087+0000 INFO  Mounted REST API at: /db/manage
    2017-05-30 09:20:33.828+0000 INFO  Remote interface available at http://0.0.0.0:7474/
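
One detail in the success log may be worth comparing against the configuration: the master announces itself at `ha://172.17.0.2:6001`, an address in Docker's default bridge range (172.17.0.0/16) rather than one of the three addresses in `ha.initial_hosts`, and port 6001 is not among the ports published by the `docker run` command. Whether this explains the join timeout is not established here. The sketch below only automates the comparison; `advertised_host` and `in_initial_hosts` are hypothetical helper names.

```shell
# Sketch: extract the address a node advertises for HA traffic from its log
# and check whether that host is one of the configured initial hosts.
# Both function names are hypothetical helpers, not Neo4j tooling.

# Read a log on stdin; print the host of the first
# "is available as master at ha://HOST:PORT" entry.
advertised_host() {
  sed -n 's|.*is available as master at ha://\([^:/?]*\):.*|\1|p' | head -n 1
}

# Succeed if host $1 appears in the comma-separated host:port list $2.
in_initial_hosts() {
  printf '%s\n' "$2" | tr ',' '\n' | tr -d ' ' | cut -d: -f1 | grep -qxF "$1"
}
```

A possible use is `docker logs neo4j 2>&1 | advertised_host`, followed by `in_initial_hosts` with the result and the value of `ha.initial_hosts`. A non-zero exit status means the node advertises an address that the other members were not told about.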
    

0 answers:

No answers