Ignite compute: job stealing with new nodes

Date: 2018-11-28 16:45:07

Tags: java ignite grid-computing

I am trying to compute a batch of tasks on an Ignite cluster whose nodes use a job stealing strategy.
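The question does not show the task that makes up a batch. As a minimal sketch, assuming the batch is submitted as a single split task, it could look like the following. `BatchTask` and its squaring `SquareJob` are hypothetical placeholders, not the poster's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.ignite.IgniteException;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskSplitAdapter;

/**
 * Hypothetical batch task: one ComputeJob per input item. The jobs returned by
 * split() are assigned to nodes by the load balancer; JobStealingCollisionSpi
 * can then move jobs that are still waiting from busy nodes to idle ones.
 */
public class BatchTask extends ComputeTaskSplitAdapter<List<Integer>, Integer> {
    /** Placeholder job: squares its item (stands in for real, slow work). */
    static class SquareJob extends ComputeJobAdapter {
        private final int item;

        SquareJob(int item) {
            this.item = item;
        }

        @Override
        public Object execute() throws IgniteException {
            return item * item;
        }
    }

    @Override
    protected Collection<? extends ComputeJob> split(int gridSize, List<Integer> batch) throws IgniteException {
        // Deliberately ignore gridSize: with more jobs than nodes (and activeJobsThreshold = 1),
        // surplus jobs wait in each node's queue, and only waiting jobs can be stolen.
        List<ComputeJob> jobs = new ArrayList<>(batch.size());
        for (int item : batch)
            jobs.add(new SquareJob(item));
        return jobs;
    }

    @Override
    public Integer reduce(List<ComputeJobResult> results) throws IgniteException {
        // Sum the per-job results once every job (stolen or not) has finished.
        int sum = 0;
        for (ComputeJobResult res : results)
            sum += res.<Integer>getData();
        return sum;
    }
}
```

Such a task would be submitted with `ignite.compute().execute(new BatchTask(), items)`. Splitting into one job per item, rather than one per node, is what leaves jobs queued for an idle node to steal.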

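Everything works fine, except when a new node joins the cluster while a batch is already running: that node does not seem to be able to steal any tasks from the running batch. I get the following message: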

'SEVERE: Failed to send job stealing message to node: TcpDiscoveryNode [...]'

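I think there is already an issue for this: https://issues.apache.org/jira/browse/IGNITE-1267

The issue appears to have been resolved in that thread, but the problem still occurs in Ignite 2.6.0.

Here is my compute configuration: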

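    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi;
    import org.apache.ignite.spi.failover.jobstealing.JobStealingFailoverSpi;

    public class JobStealingNode {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();
            // Collision SPI: lets idle nodes steal waiting jobs from busy ones
            JobStealingCollisionSpi spi = new JobStealingCollisionSpi();
            spi.setWaitJobsThreshold(1);
            spi.setMessageExpireTime(1000);
            spi.setMaximumStealingAttempts(10);
            spi.setActiveJobsThreshold(1);
            spi.setStealingEnabled(true);
            // Failover SPI: must be paired with JobStealingCollisionSpi so stolen jobs are re-routed to the stealing node
            JobStealingFailoverSpi failoverSpi = new JobStealingFailoverSpi();
            cfg.setCollisionSpi(spi);
            cfg.setFailoverSpi(failoverSpi);
            Ignite ignite = Ignition.start(cfg);
        }
    }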

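Am I doing something wrong?

EDIT: I tried to reproduce it, but now it seems to work as expected. Very strange behavior!

EDIT2: I managed to reproduce the problem, though only intermittently. Here is the stack trace: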

    class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=f54e6f43-620c-418d-a840-bce51ad1f5f5, addrs=[0:0:0:0:0:0:0:1%lo, 10.36.3.4, 127.0.0.1], sockAddrs=[/10.36.3.4:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1543917557221, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2718)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2651)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
        at org.apache.ignite.internal.managers.communication.GridIoManager.sendToCustomTopic(GridIoManager.java:1703)
        at org.apache.ignite.internal.managers.GridManagerAdapter$1.send(GridManagerAdapter.java:422)
        at org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi.checkIdle(JobStealingCollisionSpi.java:1074)
        at org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi.onCollision(JobStealingCollisionSpi.java:722)
        at org.apache.ignite.internal.managers.collision.GridCollisionManager.onCollision(GridCollisionManager.java:119)
        at org.apache.ignite.internal.processors.job.GridJobProcessor.handleCollisions(GridJobProcessor.java:712)
        at org.apache.ignite.internal.processors.job.GridJobProcessor.access$3000(GridJobProcessor.java:111)
        at org.apache.ignite.internal.processors.job.GridJobProcessor$JobDiscoveryListener.onEvent(GridJobProcessor.java:2008)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$LocalListenerWrapper.onEvent(GridEventStorageManager.java:1384)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:873)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:858)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
        at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2703)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2920)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2732)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=f54e6f43-620c-418d-a840-bce51ad1f5f5, addrs=[/10.36.3.4:47100, /0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3422)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2958)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2841)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2692)
        ... 20 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/10.36.3.4:47100, err=Failed to read remote node recovery handshake (connection closed).]
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
            ... 23 more
        Caused by: class org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: Failed to read remote node recovery handshake (connection closed).
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3737)
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3276)
            ... 23 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/10.36.3.4:47100, err=Failed to read remote node recovery handshake (connection closed).]
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
            ... 23 more
        Caused by: class org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: Failed to read remote node recovery handshake (connection closed).
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3737)
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3276)
            ... 23 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/10.36.3.4:47100, err=Failed to read remote node recovery handshake (connection closed).]
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
            ... 23 more
        Caused by: class org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: Failed to read remote node recovery handshake (connection closed).
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3737)
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3276)
            ... 23 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/10.36.3.4:47100, err=Failed to read remote node recovery handshake (connection closed).]
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
            ... 23 more
        Caused by: class org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: Failed to read remote node recovery handshake (connection closed).
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3737)
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3276)
            ... 23 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address [addr=/10.36.3.4:47100, err=Failed to read remote node recovery handshake (connection closed).]
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3425)
            ... 23 more
        Caused by: class org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: Failed to read remote node recovery handshake (connection closed).
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3737)
            at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3276)
            ... 23 more
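The innermost `IgniteCheckedException` above advises giving every `ComputeTask` a timeout, so that no party waits forever when a peer node cannot be reached. This does not explain why the steal message fails; it is only a minimal sketch of that advice, assuming tasks are started through `IgniteCompute.execute`. `TimedTasks` and `executeWithTimeout` are hypothetical names.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.compute.ComputeTask;

/** Hypothetical helper: always attach a finite timeout before starting a task. */
public final class TimedTasks {
    private TimedTasks() {
    }

    public static <T, R> R executeWithTimeout(Ignite ignite, ComputeTask<T, R> task, T arg, long timeoutMs) {
        // Reject non-positive values: Ignite rejects negative timeouts and treats 0 as "no timeout",
        // which is exactly the situation the log message warns about.
        if (timeoutMs <= 0)
            throw new IllegalArgumentException("timeoutMs must be positive: " + timeoutMs);

        // withTimeout() applies to the next task started from this thread; if the task runs longer,
        // execute() fails with a ComputeTaskTimeoutException (an IgniteException) instead of hanging.
        return ignite.compute().withTimeout(timeoutMs).execute(task, arg);
    }
}
```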

0 answers