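I have a problem with ANSYS. When I start it, it complains about some partitions. We are using Slurm. Is it complaining about the Slurm partitions in which the jobs run? RDMA sounds more like hard disk partitions, though. I am a bit confused about the cause of the problem: is it file system access, or the different queues (partitions) in Slurm? And how can it be solved? Has anyone run into this bug before and might know the solution?
It runs on a Slurm cluster with NFS /home, NFS /opt (the ANSYS installation), and a BeeGFS work directory (for models etc.).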
cfx5remote: Rank 0:35: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:25: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed
cfx5remote: Rank 0:21: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: Can't initialize RDMA device
Answer 0 (score: 2)
For the tcsh shell:
setenv MPI_IB_PKEY "0xffff"
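This forces the application to use the "broadcast" "VLAN" (0xffff is the default InfiniBand partition key with full membership). I'm not sure why there are multiple partitions to choose from.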
For the bash shell:
export MPI_IB_PKEY="0xffff"
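
Since the question mentions Slurm, the same setting could also go at the top of the batch script, before the solver is started. The following is only a minimal sketch: the job name and node count are hypothetical placeholders, and the solver launch line is left out because it is site-specific. Whether the value reaches ranks started on other nodes depends on how the solver launches its remote processes, which this sketch does not address.

```shell
#!/bin/sh
#SBATCH --job-name=cfx-pkey   # hypothetical job name
#SBATCH --nodes=2             # hypothetical node count

# Select the partition key before the solver starts, so MPI_Init no longer
# has to choose between the several entries in the pkey table.
MPI_IB_PKEY="0xffff"
export MPI_IB_PKEY

# Record the choice in the job output for later debugging.
echo "MPI_IB_PKEY=$MPI_IB_PKEY"

# Start the solver here (command intentionally omitted: site-specific).
```

If the message still appears on some ranks, that would suggest the variable is not being passed on to the remote processes.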
Answer 1 (score: 0)
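cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed
-> This is InfiniBand/RDMA, and most likely completely unrelated to your file system.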
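
To confirm that the table in the log belongs to the InfiniBand adapter and not to a disk or to Slurm, the pkey entries can be read directly on a compute node. The sketch below assumes the Linux kernel exposes them under /sys/class/infiniband/<device>/ports/<port>/pkeys/<index>, and that unused slots read as 0x0000. The function name list_pkeys is made up for this example.

```shell
#!/bin/sh
# Print every non-empty InfiniBand partition key visible on this node.
# Assumed sysfs layout: <root>/<device>/ports/<port>/pkeys/<index>
list_pkeys() {
    root="${1:-/sys/class/infiniband}"
    for f in "$root"/*/ports/*/pkeys/*; do
        [ -r "$f" ] || continue             # glob did not match: no IB device
        key=$(cat "$f")
        [ "$key" = "0x0000" ] && continue   # assumed marker for an unused slot
        idx=${f##*/}                        # last path component: table index
        rest=${f%/pkeys/*}
        port=${rest##*/}                    # port number
        dev=${rest%/ports/*}
        dev=${dev##*/}                      # device name
        echo "$dev port $port index $idx: $key"
    done
}

# With no argument the function reads the real sysfs tree.
list_pkeys
```

Each output line names the device, port, and table index followed by the key. A node that prints more than one key per port is in the situation the error message describes, and one of the printed values is what MPI_IB_PKEY should be set to.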