Ignite启动时间太长,经常在应用程序运行时失败

时间:2017-10-26 18:12:51

标签: java spring tcp distributed ignite

我使用Spring Boot应用程序创建了两个节点的集群。没有任何外部资源 - 只有内存中的DB和Apache Ignite作为分布式缓存。当一个节点已经启动而第二个节点已经提交启动时,我做了两个节点的tcp转储和JFR。

我在尝试启动的节点上看到了这个: JFR for second node which is in starting state

只获得字节十秒钟。与此同时,该节点上的另一个Ignite SPI线程“tcp-disco-msg-worker”尝试从同一个远程IP和端口读取,也花了大约10秒钟。

JFR SPI long read

同时第一个已经启动的节点试图做同样的事情 - 读取一个字节大约十秒钟 First node

所以看起来他们试图同时从对方读取消息并互相阻塞 - 当这个SPI线程试图读取它时无法写回答。

但我没有在另一个群集上看到这样的行为,其中两个服务器具有完全相同的应用及其配置 - 没有任何长读取。

有没有人知道我的第一个集群有什么问题?为什么Ignite会尝试在两个节点上同时读取有关拓扑的元信息?

主要问题是我在第一个集群的一个节点上经常出现Ignite失败,因为SPI线程决定集群处于SEGMENTED状态。并且SPI线程进入此状态是因为它无法在常规通信期间从其中一个节点获取元信息。

似乎这个群集存在一些网络问题,但所有低级别测试都表明一切正常。

我知道他们说过SEGMENTED状态很长的GC,网络问题等等。但我没有看到这样的事情。在SPI线程中的套接字通信期间只读取很长时间。它只在一个集群上。具有相同配置的另一个群集没有这样的问题。

有人可以给我一个建议我应该检查或测试什么?

UPD: 我的配置:

    @Bean
public IgniteCache<String, WaveFile> getCache() {
    log.info("Ignite addresses: " + Arrays.asList(igniteAddresses));
    IgniteConfiguration conf = new IgniteConfiguration();
    Slf4jLogger gridLog = new Slf4jLogger(log);
    conf.setGridLogger(gridLog);
    conf.setClientMode(false);
    conf.setNetworkTimeout(networkTimeout);
    conf.setClassLoader(WaveFile.class.getClassLoader());

    TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
    TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
    ipFinder.setAddresses(Arrays.asList(igniteAddresses));
    tcpDiscoverySpi.setIpFinder(ipFinder);
    tcpDiscoverySpi.setAckTimeout(failureDetectionTimeout);
    conf.setDiscoverySpi(tcpDiscoverySpi);
    conf.setSqlConnectorConfiguration(null);

    MemoryConfiguration memoryConfiguration = new MemoryConfiguration();
    MemoryPolicyConfiguration memoryPolicyConfiguration = new MemoryPolicyConfiguration();
    memoryPolicyConfiguration.setInitialSize(5L * 1024 * 1024 * 1024);
    memoryConfiguration.setMemoryPolicies(memoryPolicyConfiguration);
    conf.setMemoryConfiguration(memoryConfiguration);

    Ignite ignite = Ignition.start(conf);
    // Cache Config
    CacheConfiguration<String, WaveFile> cacheConfig = new CacheConfiguration<>();
    cacheConfig.setName(cacheName);
    cacheConfig.setCacheMode(CacheMode.PARTITIONED);
    cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
    cacheConfig.setCopyOnRead(false);
    cacheConfig.setEagerTtl(eagerTtl);
    cacheConfig.setExpiryPolicyFactory(FactoryBuilder.factoryOf(
            new TouchedExpiryPolicy(new Duration(TimeUnit.SECONDS, expirationPeriod))));
    return ignite.getOrCreateCache(cacheConfig);
}

日志。你可以看到有4个节点,但实际上我只在其中两个节点上进行了测试。

2017-10-27 14:01:53.084 [main] ERROR  - Failed to resolve default logging config file: config/java.util.logging.properties
2017-10-27 14:02:35.175 [main] INFO  com.voice.Application - Ignite addresses: [10.200.43.28, 10.200.43.32, 10.200.43.33, 10.200.43.34]
2017-10-27 14:02:35.369 [main] INFO  o.a.ignite.internal.IgniteKernal -

>>>    __________  ________________
>>>   /  _/ ___/ |/ /  _/_  __/ __/
>>>  _/ // (7 7    // /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.1.0#20170720-sha1:a6ca5c8a
>>> 2017 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org

[14:02:35]    __________  ________________
[14:02:35]   /  _/ ___/ |/ /  _/_  __/ __/
[14:02:35]  _/ // (7 7    // /  / / / _/
[14:02:35] /___/\___/_/|_/___/ /_/ /___/
[14:02:35]
[14:02:35] ver. 2.1.0#20170720-sha1:a6ca5c8a
[14:02:35] 2017 Copyright(C) Apache Software Foundation
[14:02:35]
[14:02:35] Ignite documentation: http://ignite.apache.org
[14:02:35]
[14:02:35] Quiet mode.
[14:02:35]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[14:02:35]
2017-10-27 14:02:35.370 [main] INFO  o.a.ignite.internal.IgniteKernal - Config URL: n/a
2017-10-27 14:02:35.370 [main] INFO  o.a.ignite.internal.IgniteKernal - Daemon mode: off
[14:02:35] OS: Linux 3.10.0-327.el7.x86_64 amd64
2017-10-27 14:02:35.370 [main] INFO  o.a.ignite.internal.IgniteKernal - OS: Linux 3.10.0-327.el7.x86_64 amd64
2017-10-27 14:02:35.371 [main] INFO  o.a.ignite.internal.IgniteKernal - OS user: mopopov
2017-10-27 14:02:35.371 [main] INFO  o.a.ignite.internal.IgniteKernal - PID: 31516
[14:02:35] VM information: Java(TM) SE Runtime Environment 1.8.0_121-b13 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13
2017-10-27 14:02:35.372 [main] INFO  o.a.ignite.internal.IgniteKernal - Language runtime: Java Platform API Specification ver. 1.8
2017-10-27 14:02:35.372 [main] INFO  o.a.ignite.internal.IgniteKernal - VM information: Java(TM) SE Runtime Environment 1.8.0_121-b13 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13
2017-10-27 14:02:35.373 [main] INFO  o.a.ignite.internal.IgniteKernal - VM total memory: 10.0GB
2017-10-27 14:02:35.374 [main] INFO  o.a.ignite.internal.IgniteKernal - Remote Management [restart: off, REST: on, JMX (remote: off)]
2017-10-27 14:02:35.374 [main] INFO  o.a.ignite.internal.IgniteKernal - IGNITE_HOME=null
2017-10-27 14:02:35.374 [main] INFO  o.a.ignite.internal.IgniteKernal - VM arguments: [-Xmx12000m, -XX:+UnlockCommercialFeatures, -XX:+HeapDumpOnOutOfMemoryError, -Dspring.profiles.active=prod]
2017-10-27 14:02:35.374 [main] INFO  o.a.ignite.internal.IgniteKernal - System cache's MemoryPolicy size is configured to 40 MB. Use MemoryConfiguration.systemCacheMemorySize property to change the setting.
2017-10-27 14:02:35.388 [main] INFO  o.a.ignite.internal.IgniteKernal - Configured caches [in 'sysMemPlc' memoryPolicy: ['ignite-sys-cache']]
2017-10-27 14:02:35.395 [pub-#14%null%] WARN  o.a.ignite.internal.GridDiagnostic - Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m).
[14:02:35] Initial heap size is 250MB (should be no less than 512MB, use -Xms512m -Xmx512m).
[14:02:35] Configured plugins:
2017-10-27 14:02:35.477 [main] INFO  o.a.i.i.p.p.IgnitePluginProcessor - Configured plugins:
[14:02:35]   ^-- None
2017-10-27 14:02:35.478 [main] INFO  o.a.i.i.p.p.IgnitePluginProcessor -   ^-- None
[14:02:35]
2017-10-27 14:02:35.478 [main] INFO  o.a.i.i.p.p.IgnitePluginProcessor -
2017-10-27 14:02:35.555 [main] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
2017-10-27 14:02:35.560 [main] WARN  o.a.i.s.c.tcp.TcpCommunicationSpi - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[14:02:35] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
2017-10-27 14:02:35.586 [main] WARN  o.a.i.s.c.noop.NoopCheckpointSpi - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
2017-10-27 14:02:35.627 [main] WARN  o.a.i.i.m.c.GridCollisionManager - Collision resolution is disabled (all jobs will be activated upon arrival).
[14:02:35] Security status [authentication=off, tls/ssl=off]
2017-10-27 14:02:35.628 [main] INFO  o.a.ignite.internal.IgniteKernal - Security status [authentication=off, tls/ssl=off]
2017-10-27 14:02:36.072 [main] INFO  o.a.i.i.p.r.p.t.GridTcpRestProtocol - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
2017-10-27 14:02:36.133 [main] INFO  o.a.ignite.internal.IgniteKernal - Non-loopback local IPs: 10.200.43.34, fe80:0:0:0:250:56ff:fe93:3f60%eno16777984
2017-10-27 14:02:36.133 [main] INFO  o.a.ignite.internal.IgniteKernal - Enabled local MACs: 005056933F60
2017-10-27 14:02:36.188 [main] INFO  o.a.i.s.d.tcp.TcpDiscoverySpi - Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=64941c34-2912-4934-8c92-c0c089391207]
2017-10-27 14:02:51.267 [main] WARN  o.a.i.s.d.tcp.TcpDiscoverySpi - Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
2017-10-27 14:03:46.397 [exchange-worker-#29%null%] INFO  o.a.ignite.internal.exchange.time - Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=false, evt=10, node=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=null]
2017-10-27 14:03:47.022 [exchange-worker-#29%null%] INFO  o.a.i.i.p.cache.GridCacheProcessor - Started cache [name=ignite-sys-cache, memoryPolicyName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL]
2017-10-27 14:03:47.062 [exchange-worker-#29%null%] INFO  o.a.i.i.p.cache.GridCacheProcessor - Started cache [name=SHARED_AUDIO_CACHE, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC]
2017-10-27 14:03:57.115 [grid-nio-worker-tcp-comm-0-#17%null%] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Established outgoing communication connection [locAddr=/10.200.43.34:53456, rmtAddr=/10.200.43.28:47100]
2017-10-27 14:03:57.172 [exchange-worker-#29%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], waitTime=0ms]
2017-10-27 14:03:57.252 [exchange-worker-#29%null%] INFO  o.a.ignite.internal.exchange.time - Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=false]
2017-10-27 14:03:57.293 [sys-#32%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=0ms]
2017-10-27 14:03:57.344 [exchange-worker-#29%null%] INFO  o.a.i.i.p.c.GridCachePartitionExchangeManager - Rebalancing scheduled [order=[ignite-sys-cache, SHARED_AUDIO_CACHE]]
2017-10-27 14:03:57.344 [exchange-worker-#29%null%] INFO  o.a.i.i.p.c.GridCachePartitionExchangeManager - Rebalancing started [top=AffinityTopologyVersion [topVer=13, minorTopVer=0], evt=NODE_JOINED, node=64941c34-2912-4934-8c92-c0c089391207]
2017-10-27 14:03:57.345 [exchange-worker-#29%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Starting rebalancing [mode=SYNC, fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, partitionsCount=100, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], updateSeq=1]
2017-10-27 14:03:57.391 [utility-#34%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Completed (final) rebalancing [fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=51 ms]
2017-10-27 14:03:57.395 [utility-#34%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Starting rebalancing [mode=ASYNC, fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, partitionsCount=518, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], updateSeq=1]
2017-10-27 14:03:57.449 [sys-#36%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionDemander - Completed (final) rebalancing [fromNode=110e4e93-2954-4b76-8649-a23f4a83990a, topology=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=51 ms]
[14:03:57] Performance suggestions for grid  (fix if possible)
2017-10-27 14:03:57.463 [main] INFO  o.a.ignite.internal.IgniteKernal - Performance suggestions for grid  (fix if possible)
[14:03:57] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
2017-10-27 14:03:57.463 [main] INFO  o.a.ignite.internal.IgniteKernal - To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[14:03:57]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
2017-10-27 14:03:57.463 [main] INFO  o.a.ignite.internal.IgniteKernal -   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
[14:03:57]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
2017-10-27 14:03:57.464 [main] INFO  o.a.ignite.internal.IgniteKernal -   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[14:03:57]   ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
2017-10-27 14:03:57.464 [main] INFO  o.a.ignite.internal.IgniteKernal -   ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
[14:03:57]   ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500)
2017-10-27 14:03:57.464 [main] INFO  o.a.ignite.internal.IgniteKernal -   ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500)
[14:03:57]   ^-- Reduce pages swapping ratio (set vm.swappiness=10)
2017-10-27 14:03:57.464 [main] INFO  o.a.ignite.internal.IgniteKernal -   ^-- Reduce pages swapping ratio (set vm.swappiness=10)
[14:03:57]   ^-- Disable fully synchronous writes (set 'writeSynchronizationMode' to PRIMARY_SYNC or FULL_ASYNC)
2017-10-27 14:03:57.464 [main] INFO  o.a.ignite.internal.IgniteKernal -   ^-- Disable fully synchronous writes (set 'writeSynchronizationMode' to PRIMARY_SYNC or FULL_ASYNC)
[14:03:57] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
2017-10-27 14:03:57.464 [main] INFO  o.a.ignite.internal.IgniteKernal - Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
[14:03:57]
2017-10-27 14:03:57.465 [main] INFO  o.a.ignite.internal.IgniteKernal -
[14:03:57] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
2017-10-27 14:03:57.465 [main] INFO  o.a.ignite.internal.IgniteKernal - To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[14:03:57]
[14:03:57] Ignite node started OK (id=64941c34)
2017-10-27 14:03:57.465 [main] INFO  o.a.ignite.internal.IgniteKernal -
2017-10-27 14:03:57.466 [main] INFO  o.a.ignite.internal.IgniteKernal -

>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.1.0#20170720-sha1:a6ca5c8a97e9a4c9d73d40ce76d1504c14ba1940
>>> +----------------------------------------------------------------------+
>>> OS name: Linux 3.10.0-327.el7.x86_64 amd64
>>> CPU(s): 6
>>> Heap: 10.0GB
>>> VM name: 31516@4212-bio-lanta-01
>>> Local node [ID=64941C34-2912-4934-8C92-C0C089391207, order=13, clientMode=false]
>>> Local node addresses: [4212-bio-lanta-01/0:0:0:0:0:0:0:1%lo, /10.200.43.34, /127.0.0.1]
>>> Local ports: TCP:11211 TCP:47100 TCP:47500

[14:03:57] Topology snapshot [ver=13, servers=2, clients=0, CPUs=12, heap=22.0GB]
2017-10-27 14:03:57.468 [main] INFO  o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=13, servers=2, clients=0, CPUs=12, heap=22.0GB]
2017-10-27 14:03:58.944 [exchange-worker-#29%null%] INFO  o.a.ignite.internal.exchange.time - Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=false, evt=18, node=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 4212-bio-lanta-01/10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102156183, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=CacheAffinityChangeMessage [id=729ac7d5f51-0fc83046-7af0-474f-8487-f526629190a8, topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
2017-10-27 14:03:58.952 [exchange-worker-#29%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], waitTime=0ms]
2017-10-27 14:03:58.975 [exchange-worker-#29%null%] INFO  o.a.ignite.internal.exchange.time - Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=false]
2017-10-27 14:03:58.991 [sys-#41%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], time=0ms]
2017-10-27 14:03:58.994 [exchange-worker-#29%null%] INFO  o.a.i.i.p.c.GridCachePartitionExchangeManager - Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=110e4e93-2954-4b76-8649-a23f4a83990a]
2017-10-27 14:04:01.144 [main] INFO  com.voice.Application - Run config context -> ok
2017-10-27 14:04:01.145 [main] INFO  com.voice.Application - StartUp date: Fri Oct 27 14:01:48 MSK 2017
节点失败时

UPD2:日志:

    2017-10-28 07:48:56.320 [grid-nio-worker-tcp-comm-2-#19%null%] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:52329]
2017-10-28 07:48:56.323 [grid-nio-worker-tcp-comm-3-#20%null%] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:55692]
2017-10-28 07:48:56.323 [grid-nio-worker-tcp-comm-0-#17%null%] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:52335]
2017-10-28 07:48:59.535 [RTP-948] INFO  com.voice.rtp.RtpReceiverImpl - Start callId 887B4B84-C47F-461E-35B0-97D0D7F86A51-7092 RTP processing with local port 36496
2017-10-28 07:49:00.116 [grid-nio-worker-tcp-comm-1-#18%null%] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:55698]
2017-10-28 07:49:00.117 [grid-nio-worker-tcp-comm-2-#19%null%] INFO  o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:52341]
2017-10-28 07:49:00.372 [tcp-disco-msg-worker-#2%null%] WARN  o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to short-time network problems).
2017-10-28 07:49:00.373 [disco-event-worker-#28%null%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Local node SEGMENTED: TcpDiscoveryNode [id=c1886afe-b28f-45b5-a91b-cfdd9e635374, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.33, 127.0.0.1], sockAddrs=[4212-bio-lanta-03/10.200.43.33:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=15, intOrder=10, lastExchangeTime=1509102474635, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false]
2017-10-28 07:49:00.399 [disco-event-worker-#28%null%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Stopping local node according to configured segmentation policy.
2017-10-28 07:49:00.400 [disco-event-worker-#28%null%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Node FAILED: TcpDiscoveryNode [id=110e4e93-2954-4b76-8649-a23f4a83990a, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.28, 127.0.0.1], sockAddrs=[/10.200.43.28:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=11, intOrder=7, lastExchangeTime=1509102530512, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false]
2017-10-28 07:49:00.413 [disco-event-worker-#28%null%] INFO  o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=16, servers=3, clients=0, CPUs=18, heap=31.0GB]
2017-10-28 07:49:00.450 [Thread-52685] INFO  o.a.i.i.p.r.p.t.GridTcpRestProtocol - Command protocol successfully stopped: TCP binary
2017-10-28 07:49:00.452 [disco-event-worker-#28%null%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=16, minorTopVer=0], time=0ms]
2017-10-28 07:49:00.470 [disco-event-worker-#28%null%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Node FAILED: TcpDiscoveryNode [id=64941c34-2912-4934-8c92-c0c089391207, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.34, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /10.200.43.34:47500], discPort=47500, order=13, intOrder=8, lastExchangeTime=1509102540526, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false]
2017-10-28 07:49:00.479 [disco-event-worker-#28%null%] INFO  o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=17, servers=2, clients=0, CPUs=12, heap=21.0GB]
2017-10-28 07:49:00.480 [disco-event-worker-#28%null%] WARN  o.a.i.i.m.d.GridDiscoveryManager - Node FAILED: TcpDiscoveryNode [id=f6678eb1-f2f6-4da1-8d4d-a259c435b5a6, addrs=[0:0:0:0:0:0:0:1%lo, 10.200.43.32, 127.0.0.1], sockAddrs=[/10.200.43.32:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=14, intOrder=9, lastExchangeTime=1509102550535, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false]
2017-10-28 07:49:00.481 [disco-event-worker-#28%null%] INFO  o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=18, servers=1, clients=0, CPUs=6, heap=10.0GB]
2017-10-28 07:49:01.125 [RTP-948] ERROR com.voice.rtp.RtpReceiverImpl - During RTP socket communications
javax.cache.CacheException: class org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled (node is stopping).
        at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1312)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.cacheException(IgniteCacheProxy.java:2630)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.get(IgniteCacheProxy.java:1188)

登录DEBUG模式位于:https://drive.google.com/drive/folders/0B00-0mjd5sy1MjNnSkdyR1M4VFk?usp=sharing

2 个答案:

答案 0 :(得分:2)

问题在于DNS中没有群集主机名。在纠正之后,没有十秒钟的读取时间,群集仍然可以正常工作至少半天。

答案 1 :(得分:1)

在.net核心上使用c#客户端时,我遇到了同样的问题,在应用程序启动之前有10秒的延迟。

调查问题之后,日志中的以下提示很有帮助:

Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]"

我发现这意味着我需要设置  IgniteConfiguration.Localhost到本地回送地址127.0.0.1。我不知道为什么默认值0.0.0.0

进行更改后,日志现在确认已使用该值,并且

Successfully bound communication NIO server to TCP port [port=47100, locHost=/127.0.0.1, selectorsCnt=4, selectorSpins=0, pairedConn=false]

点燃会在几秒钟内启动。希望这对其他人有所帮助。