I am using Accelio over Soft-RoCE.
IB devices configured:
# ibv_devices
    device          node GUID
    ------          ----------------
    rxe1            821f02fffef91598
    rxe0            d6bed9fffebe94af
Error while running the Accelio client:
# xio_ow_client
=============================================
Server Address : 127.0.0.1
Server Port : 2061
Transport : rdma
Header Length : 32
Data Length : 32
Connection Index : 0
CPU Affinity : 0
Finite run : 0
=============================================
**** starting ...
session event: connection error. reason: No such device
# rping -c
rdma_resolve_route: No such device
So I checked the opensm status:
# /etc/init.d/opensmd status
opensm is stopped
# /etc/init.d/opensmd start
opensm start [FAILED]
# tail -f /var/log/opensm.log
Jul 09 15:04:45 655213 [AA4F3700] 0x03 -> OpenSM 3.3.7
Jul 09 15:04:45 692960 [AA4F3700] 0x80 -> OpenSM 3.3.7
Jul 09 15:04:45 693149 [AA4F3700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Jul 09 15:04:45 797977 [AA4F3700] 0x80 -> Entering DISCOVERING state
Jul 09 15:04:45 799152 [AA4F3700] 0x02 -> osm_vendor_bind: Binding to port 0xd6bed9fffebe94af
Jul 09 15:04:45 800414 [AA4F3700] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Jul 09 15:04:45 800422 [AA4F3700] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jul 09 15:04:45 800425 [AA4F3700] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jul 09 15:04:45 800430 [AA4F3700] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Jul 09 15:04:45 829702 [AA4F3700] 0x80 -> Exiting SM
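I would appreciate some pointers so that I can understand my mistake.
Answer 0 (score: 0)
RoCE devices do not need OpenSM. That is why OpenSM fails to start when you only have RoCE devices.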
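To see why no subnet manager applies here, one option is to check the link layer the kernel reports for each RDMA port. The sketch below is only an illustration: it assumes the standard Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/link_layer, and the function names port_link_layers and needs_subnet_manager are hypothetical helpers, not part of Accelio, rping, or OpenSM. RoCE ports (including Soft-RoCE rxe devices) should report "Ethernet"; only "InfiniBand" ports rely on a subnet manager.

```python
from pathlib import Path


def port_link_layers(sysfs_root="/sys/class/infiniband"):
    """Map 'device/port' to the link layer string found in sysfs.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; the default is the usual location on Linux.
    """
    layers = {}
    for path in sorted(Path(sysfs_root).glob("*/ports/*/link_layer")):
        device = path.parts[-4]  # e.g. "rxe0"
        port = path.parts[-2]    # e.g. "1"
        layers[f"{device}/{port}"] = path.read_text().strip()
    return layers


def needs_subnet_manager(layers):
    """Return True if at least one port uses the InfiniBand link layer."""
    return any(layer == "InfiniBand" for layer in layers.values())


if __name__ == "__main__":
    layers = port_link_layers()
    for name, layer in layers.items():
        print(f"{name}: {layer}")
    print("subnet manager needed:", needs_subnet_manager(layers))
```

If every port reports Ethernet, a failed OpenSM start is expected and is not the cause of the connection error.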
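rping fails because you did not specify the address of the server to connect to. Assuming the IP addresses of your machines' RoCE-capable interfaces are 192.168.1.2 (server) and 192.168.1.3 (client), run the commands as follows:
server$ rping -s -a 192.168.1.2
client$ rping -c -a 192.168.1.2
Thanks,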
- Shachar