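I am building a Neo4j HA cluster by following the steps at https://neo4j.com/docs/operations-manual/current/tutorial/highly-available-cluster/.
I have 3 machines, each running Neo4j in a single-instance Docker container.
Neo4j runs fine as a standalone server on each machine, but as soon as I add the HA configuration it fails to start.
Every machine was created from the same pre-built image, so their configuration is identical except for ha.server_id, which is unique per machine (values 1, 2, 3).
The firewalls are off, and I have verified connectivity between the machines.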
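For reference, a connectivity check along these lines can be run from each machine. This is only a sketch: `split_hosts` and `check_hosts` are hypothetical helper names, not part of Neo4j, and the check assumes `nc` (netcat) is installed. It only tells you whether something accepts TCP connections on the given port, so it is meaningful only while a Neo4j container is already listening on the target machine.

```shell
# Hypothetical helpers (not part of Neo4j). Assumes nc (netcat) is installed.

# Print one host:port entry per line from a comma-separated list,
# i.e. the same format used by ha.initial_hosts.
split_hosts() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Attempt a TCP connection to every entry, with a 3 second timeout each.
check_hosts() {
  for entry in $(split_hosts "$1"); do
    host=${entry%:*}    # text before the last colon
    port=${entry##*:}   # text after the last colon
    if nc -z -w 3 "$host" "$port" 2>/dev/null; then
      echo "$entry reachable"
    else
      echo "$entry NOT reachable"
    fi
  done
}
```

Called as `check_hosts '172.0.30.110:5001,172.0.31.39:5001,172.0.32.249:5001'`, it prints one line per cluster member.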
The Neo4j HA configuration is:
ha.server_id=1
ha.initial_hosts=172.0.30.110:5001,172.0.31.39:5001,172.0.32.249:5001
dbms.mode=HA
To start Neo4j, I run the following command on each machine:
/usr/bin/docker run \
--publish=7474:7474 --publish=7687:7687 --publish=5001:5001 \
--volume=/var/lib/neo4j/data:/data \
--volume=/var/lib/neo4j/logs:/logs \
--volume=/var/lib/neo4j/conf:/conf \
--name=neo4j \
neo4j:3.0-enterprise
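Because the command mounts /var/lib/neo4j/conf as /conf, the HA settings each container picks up can be compared across machines with something like the sketch below. `show_ha_settings` is a hypothetical helper name, and the file name neo4j.conf inside the mounted directory is an assumption about where the settings above were placed.

```shell
# Hypothetical helper: print only the HA-related settings from a config file,
# i.e. every line starting with "ha." plus the dbms.mode line.
show_ha_settings() {
  grep -E '^(ha\.|dbms\.mode=)' "$1"
}
```

Running `show_ha_settings /var/lib/neo4j/conf/neo4j.conf` on each host should show the same output everywhere apart from ha.server_id.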
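When I run the start command on machines 2 and 3, and sometimes on machine 1, it fails with an error after "Attempting to join cluster". Occasionally (but not every time), when I run the command on machine 1, it recognizes that it is the first machine and goes on to initialize as master.
The full failure and success messages are below. I can't figure out why only one machine (the first one, incidentally) does what it is supposed to do, and even then not every time. I also need to figure out how to get the other two machines to register in the cluster.
Error (seen on all 3 machines):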
> 2017-05-30 09:18:35.502+0000 INFO Starting...
> 2017-05-30 09:18:35.833+0000 INFO Write transactions to database disabled
> 2017-05-30 09:18:36.116+0000 INFO Bolt enabled on 0.0.0.0:7687.
> 2017-05-30 09:18:36.131+0000 INFO Initiating metrics...
> 2017-05-30 09:18:36.937+0000 INFO Attempting to join cluster of [172.0.30.110:5001, 172.0.31.39:5001, 172.0.32.249:5001]
> 2017-05-30 09:19:06.996+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception.
> org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception.
> at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
> at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:215)
> at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
> at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:67)
> at org.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:32)
> Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@626a4cfa' was successfully initialized, but failed to start. Please see attached cause exception.
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
> at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
> ... 3 more
> Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory, /var/lib/neo4j/data/databases/fraud.db
> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:144)
> at org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory.newFacade(HighlyAvailableFacadeFactory.java:42)
> at org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:41)
> at org.neo4j.server.enterprise.EnterpriseNeoServer.lambda$static$0(EnterpriseNeoServer.java:80)
> at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:89)
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
> ... 5 more
> Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.cluster.client.ClusterJoin@5d2e6054' was successfully initialized, but failed to start. Please see attached cause exception.
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:140)
> ... 10 more
> Caused by: java.util.concurrent.TimeoutException: Conversation-response mapping: {3/13#=ResponseFuture{conversationId='3/13#', initiatedByMessageType=join, response=null}}
> at org.neo4j.cluster.statemachine.StateMachineProxyFactory$ResponseFuture.get(StateMachineProxyFactory.java:314)
> at org.neo4j.cluster.client.ClusterJoin.joinByConfig(ClusterJoin.java:143)
> at org.neo4j.cluster.client.ClusterJoin.start(ClusterJoin.java:82)
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
> ... 14 more
> ubuntu@ip-172-0-32-249:~$ sudo /usr/bin/docker rm neo4j
> neo4j
> ubuntu@ip-172-0-32-249:~$ sudo /usr/bin/docker run --publish=7474:7474 --publish=7687:7687 --publish=5001:5001 --volume=/var/lib/neo4j/data:/data --volume=/var/lib/neo4j/logs:/logs --volume=/var/lib/neo4j/conf:/conf --name=neo4j neo4j:3.0-enterprise
> Starting Neo4j.
> 2017-05-30 09:20:17.877+0000 INFO No SSL certificate found, generating a self-signed certificate..
> 2017-05-30 09:20:18.387+0000 INFO Starting...
> 2017-05-30 09:20:18.676+0000 INFO Write transactions to database disabled
> 2017-05-30 09:20:19.036+0000 INFO Bolt enabled on 0.0.0.0:7687.
> 2017-05-30 09:20:19.046+0000 INFO Initiating metrics...
> 2017-05-30 09:20:19.779+0000 INFO Attempting to join cluster of [172.0.30.110:5001, 172.0.31.39:5001, 172.0.32.249:5001]
> 2017-05-30 09:20:49.851+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception. Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception.
> org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception.
> at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
> at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:215)
> at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
> at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:67)
> at org.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:32)
> Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1b826a22' was successfully initialized, but failed to start. Please see attached cause exception.
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
> at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
> ... 3 more
> Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory, /var/lib/neo4j/data/databases/fraud.db
> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:144)
> at org.neo4j.kernel.ha.factory.HighlyAvailableFacadeFactory.newFacade(HighlyAvailableFacadeFactory.java:42)
> at org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.<init>(HighlyAvailableGraphDatabase.java:41)
> at org.neo4j.server.enterprise.EnterpriseNeoServer.lambda$static$0(EnterpriseNeoServer.java:80)
> at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:89)
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
> ... 5 more
> Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.cluster.client.ClusterJoin@6c1c89e2' was successfully initialized, but failed to start. Please see attached cause exception.
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
> at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
> at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:140)
> ... 10 more
> Caused by: java.util.concurrent.TimeoutException: Conversation-response mapping: {3/13#=ResponseFuture{conversationId='3/13#', initiatedByMessageType=join, response=null}}
> at org.neo4j.cluster.statemachine.StateMachineProxyFactory$ResponseFuture.get(StateMachineProxyFactory.java:314)
> at org.neo4j.cluster.client.ClusterJoin.joinByConfig(ClusterJoin.java:143)
> at org.neo4j.cluster.client.ClusterJoin.start(ClusterJoin.java:82)
> at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
> ... 14 more
Success (seen on only one machine):
2017-05-30 09:20:22.963+0000 INFO Starting...
2017-05-30 09:20:23.319+0000 INFO Write transactions to database disabled
2017-05-30 09:20:23.619+0000 INFO Bolt enabled on 0.0.0.0:7687.
2017-05-30 09:20:23.628+0000 INFO Initiating metrics...
2017-05-30 09:20:24.352+0000 INFO Attempting to join cluster of [172.0.30.110:5001, 172.0.31.39:5001, 172.0.32.249:5001]
2017-05-30 09:20:31.382+0000 INFO Could not join cluster of [172.0.30.110:5001,172.0.31.39:5001, 172.0.32.249:5001]
2017-05-30 09:20:31.382+0000 INFO Creating new cluster with name [neo4j.ha]...
2017-05-30 09:20:31.396+0000 INFO Instance 1 (this server) entered the cluster
2017-05-30 09:20:31.401+0000 INFO Instance 1 (this server) was elected as coordinator
2017-05-30 09:20:31.413+0000 INFO I am 1, moving to master
2017-05-30 09:20:31.465+0000 INFO Instance 1 (this server) was elected as coordinator
2017-05-30 09:20:31.492+0000 INFO I am 1, successfully moved to master
2017-05-30 09:20:31.510+0000 INFO Instance 1 (this server) is available as master at ha://172.17.0.2:6001?serverId=1 with StoreId{creationTime=1496122128110,randomId=-2708986986476371425, storeVersion=15531981201765894, upgradeTime=1496122128110, upgradeId=1}
2017-05-30 09:20:31.612+0000 INFO Instance 1 (this server) is available as backup at backup://127.0.0.1:6362 with StoreId{creationTime=1496122128110, randomId=-2708986986476371425, storeVersion=15531981201765894, upgradeTime=1496122128110, upgradeId=1}
2017-05-30 09:20:31.709+0000 INFO Database available for write transactions
2017-05-30 09:20:32.929+0000 INFO Started.
2017-05-30 09:20:33.087+0000 INFO Mounted REST API at: /db/manage
2017-05-30 09:20:33.828+0000 INFO Remote interface available at http://0.0.0.0:7474/