Some SSH engines fail to start with IPython parallel 2.3.0

Time: 2015-05-21 06:16:59

Tags: ssh ipython-parallel

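I am trying to set up a small IPython cluster over ssh in a private network (no security needed); all of this worked neatly with IPython 0.10.0 [sic!]. There are 4 nodes, alice, bob, carol and dan, each with 4 CPU cores. The controller runs on carol, and all PCs have Ubuntu 14.10 and IPython 2.3.0 installed. All PCs share ~/.ipython/profile_default via NFS. For internal reasons, I cannot use MPI.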

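Now, when the cluster starts, I only see 4 engines. I have already increased SSHEngineSetLauncher.delay, but that did not help.

I tried to track the problem down and ended up using only carol (the host), trying to start four engines locally via SSH, but only one is actually running.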

My ipcluster_config.py looks like this:

c = get_config()
c.IPClusterStart.engine_launcher_class = 'SSHEngineSetLauncher'
c.SSHEngineSetLauncher.delay = 10
c.SSHEngineSetLauncher.engines = { 'carol' : 4}#, 'dan' : 4, 'alice' : 4, 'bob' : 4 }

engine.json:

{
    "next_id": 4,
    "engines": {
        "0": "80d135a7-b8f6-435c-930a-0cde15a6feb2",
        "1": "b69916c3-87c2-4e09-9284-aefe665ba616",
        "2": "f3df3951-5e0b-4694-aa67-7ae66a181551",
        "3": "4311705d-03d4-4e48-a7a9-7be47467c439"}}

For reference, here are the log files. => ipcontroller.log:

2015-05-21 07:28:24.442 [IPControllerApp] Hub listening on tcp://127.0.0.1:57360 for registration.
2015-05-21 07:28:24.443 [IPControllerApp] Hub using DB backend: 'NoDB'
2015-05-21 07:28:24.695 [IPControllerApp] hub::created hub
2015-05-21 07:28:24.695 [IPControllerApp] writing connection info to /home/lst3si/.ipython/profile_default/security/ipcontroller-client.json
2015-05-21 07:28:24.695 [IPControllerApp] writing connection info to /home/lst3si/.ipython/profile_default/security/ipcontroller-engine.json
2015-05-21 07:28:24.696 [IPControllerApp] task::using Python leastload Task scheduler
2015-05-21 07:28:24.696 [IPControllerApp] Heartmonitor started
2015-05-21 07:28:24.700 [IPControllerApp] Creating pid file: /home/lst3si/.ipython/profile_default/pid/ipcontroller.pid
2015-05-21 07:28:24.707 [IPControllerApp] client::client '\x00\x91y`\x0c' requested u'connection_request'
2015-05-21 07:28:24.707 [IPControllerApp] client::client ['\x00\x91y`\x0c'] connected
2015-05-21 07:28:26.071 [IPControllerApp] client::client '80d135a7-b8f6-435c-930a-0cde15a6feb2' requested u'registration_request'
2015-05-21 07:28:26.103 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'230d5aa1-c395-4b82-a964-a3062e5550a9', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 26, 102954), u'username': u'lst3si', u'session': u'80d135a7-b8f6-435c-930a-0cde15a6feb2', u'msg_id': u'230d5aa1-c395-4b82-a964-a3062e5550a9', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:28:30.699 [IPControllerApp] registration::finished registering engine 0:80d135a7-b8f6-435c-930a-0cde15a6feb2
2015-05-21 07:28:30.699 [IPControllerApp] engine::Engine Connected: 0
2015-05-21 07:28:36.071 [IPControllerApp] client::client 'b69916c3-87c2-4e09-9284-aefe665ba616' requested u'registration_request'
2015-05-21 07:28:36.102 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'f74a1f38-f3fb-422f-b4ad-0d1724745c64', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 36, 102052), u'username': u'lst3si', u'session': u'b69916c3-87c2-4e09-9284-aefe665ba616', u'msg_id': u'f74a1f38-f3fb-422f-b4ad-0d1724745c64', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:28:36.285 [IPControllerApp] client::client '\x00\x91y`\r' requested u'connection_request'
2015-05-21 07:28:36.285 [IPControllerApp] client::client ['\x00\x91y`\r'] connected
2015-05-21 07:28:39.699 [IPControllerApp] registration::finished registering engine 1:b69916c3-87c2-4e09-9284-aefe665ba616
2015-05-21 07:28:39.699 [IPControllerApp] engine::Engine Connected: 1
2015-05-21 07:28:46.143 [IPControllerApp] client::client 'f3df3951-5e0b-4694-aa67-7ae66a181551' requested u'registration_request'
2015-05-21 07:28:46.175 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'a3aa09af-6958-4362-a1f4-5df01da8941b', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 46, 174675), u'username': u'lst3si', u'session': u'f3df3951-5e0b-4694-aa67-7ae66a181551', u'msg_id': u'a3aa09af-6958-4362-a1f4-5df01da8941b', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:28:51.699 [IPControllerApp] registration::finished registering engine 2:f3df3951-5e0b-4694-aa67-7ae66a181551
2015-05-21 07:28:51.700 [IPControllerApp] engine::Engine Connected: 2
2015-05-21 07:28:56.113 [IPControllerApp] client::client '4311705d-03d4-4e48-a7a9-7be47467c439' requested u'registration_request'
2015-05-21 07:28:56.145 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'671288cf-32ea-4a41-8e17-9be4ba1216dd', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 56, 144586), u'username': u'lst3si', u'session': u'4311705d-03d4-4e48-a7a9-7be47467c439', u'msg_id': u'671288cf-32ea-4a41-8e17-9be4ba1216dd', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:29:00.698 [IPControllerApp] registration::finished registering engine 3:4311705d-03d4-4e48-a7a9-7be47467c439
2015-05-21 07:29:00.700 [IPControllerApp] engine::Engine Connected: 3

=> ipengine.log (they all look the same except for the line "Completed registration with id x", where x runs from 0 to 3 across the engines):

2015-05-21 07:28:26.065 [IPEngineApp] Loading url_file u'.ipython/profile_default/security/ipcontroller-engine.json'
2015-05-21 07:28:26.070 [IPEngineApp] Registering with controller at tcp://127.0.0.1:57360
2015-05-21 07:28:26.101 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2015-05-21 07:28:26.102 [IPEngineApp] Using existing profile dir: u'.ipython/profile_default'
2015-05-21 07:28:26.103 [IPEngineApp] Completed registration with id 0

1 answer:

Answer 0 (score: 0)

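I solved the problem myself. A bug in IPython.utils.localinterfaces.public_ips (which I have reported) prevented the engines from starting: the function does not account for the localized output of ifconfig, which here reads "Adresse:127.0.0.1" (I changed the IP value).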
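To illustrate the kind of failure described here, the following is a minimal sketch of locale-sensitive parsing. It is not IPython's actual code: the sample lines, the documentation-range address 192.0.2.10, and the helper names `naive_parse` and `tolerant_parse` are all assumptions made for illustration, and real ifconfig output varies by version and locale.

```python
import re

# Hypothetical sample lines in the old net-tools style; real output may differ.
ENGLISH = "inet addr:192.0.2.10  Bcast:192.0.2.255  Mask:255.255.255.0"
GERMAN = "inet Adresse:192.0.2.10  Bcast:192.0.2.255  Maske:255.255.255.0"


def naive_parse(line):
    """Locale-dependent: only recognises the English 'inet addr:' label."""
    m = re.search(r"inet addr:(\d+(?:\.\d+){3})", line)
    return m.group(1) if m else None


def tolerant_parse(line):
    """Accept any single label word after 'inet', e.g. 'addr:' or 'Adresse:'."""
    m = re.search(r"inet\s+(?:\w+:)?(\d+(?:\.\d+){3})", line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for name, line in (("English", ENGLISH), ("German", GERMAN)):
        print(name, naive_parse(line), tolerant_parse(line))
```

A parser like `naive_parse` silently finds no address on a German system, whereas `tolerant_parse` does not depend on the label text. Another common approach, also only a suggestion here, is to run the external tool with a fixed locale such as LC_ALL=C.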

As a workaround, I now use the following ipcluster_config.py (note the --location option in controller_args):

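c = get_config()
c.IPClusterEngines.engine_launcher_class = 'SSH'
# --location tells engines the controller's external address; --ip=* listens on all interfaces
c.LocalControllerLauncher.controller_args = ['--location=<engine_ip1>', '--ip=*']
# Maps each host (name or IP) to the number of engines to start there
c.SSHEngineSetLauncher.engines = {'<engine_ip1>': 4, '<engine_ip>': 4}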

In this example, the controller runs locally on <engine_ip1>.
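One way to check whether a setup is affected is to look at the address in log lines such as "Hub listening on tcp://..." or "Registering with controller at tcp://...". The sketch below, assuming Python 3 and the log format shown in the question, flags loopback addresses; the helper names `registration_host` and `is_loopback` are hypothetical and not part of IPython.

```python
import ipaddress
import re


def registration_host(log_line):
    """Extract the host part of the first tcp://host:port URL in a log line."""
    m = re.search(r"tcp://([^:\s]+):(\d+)", log_line)
    return m.group(1) if m else None


def is_loopback(host):
    """True if host is a literal loopback IP; hostnames and '*' yield False."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


if __name__ == "__main__":
    line = ("2015-05-21 07:28:26.070 [IPEngineApp] "
            "Registering with controller at tcp://127.0.0.1:57360")
    host = registration_host(line)
    if host is not None and is_loopback(host):
        print("Loopback address %s: engines on other hosts cannot use it." % host)
```

A loopback address here only works for engines running on the same machine as the controller, which matches the symptom of remote engines not showing up.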