Ansys MPI_Init_thread: multiple pkey found in partition key table / MPI_IB_PKEY

Date: 2017-12-05 15:48:38

Tags: cluster-computing slurm ansys

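I have a problem with Ansys. When I launch it, it complains about some partitions. We are using Slurm. Is it complaining about the Slurm partition in which the job runs? RDMA, however, sounds more like a hard-disk partition. I am somewhat confused about the cause of the problem: is it file-system access, or the different queues (partitions) in Slurm? And how can it be solved? Has anyone run into this bug before and perhaps knows a solution?

It runs on a Slurm cluster with NFS /home, NFS /opt (the Ansys installation), and a BeeGFS working directory (for models etc.).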

cfx5remote: Rank 0:35: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:

cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001

cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff

cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:25: MPI_Init_thread: pkey table:

cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff

cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001

cfx5remote: Rank 0:25: MPI_Init_thread: 0x7fff

cfx5remote: Rank 0:25: MPI_Init_thread: 0xffff

cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed

cfx5remote: Rank 0:21: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:25: MPI_Init_thread: Can't initialize RDMA device

2 Answers:

Answer 0 (score: 2)

For the tcsh shell:

setenv MPI_IB_PKEY "0xffff"

This forces the application to use the "broadcast" "VLAN". I am not sure why there are multiple partitions to choose from.

For the bash shell:

export MPI_IB_PKEY="0xffff"
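In a batch job the same setting has to be in place before the solver starts. The fragment below is a minimal sketch of how that might look in a bash job script. It assumes that 0xffff (the value from this answer) is the right key for the fabric, and that the MPI launcher forwards the environment to the remote ranks, neither of which is confirmed by the question.

```shell
#!/usr/bin/env bash
# Sketch of a job-script fragment: choose the InfiniBand partition key
# before the solver is launched.
# Assumption: 0xffff is the key to use, as suggested in the answer above.

# Keep a value the caller already set; otherwise fall back to 0xffff.
: "${MPI_IB_PKEY:=0xffff}"
export MPI_IB_PKEY

echo "Using MPI_IB_PKEY=${MPI_IB_PKEY}"

# The actual solver command line would follow here.
```

Using a default instead of a hard-coded assignment lets a different key be tried from the submitting shell without editing the script.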

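Answer 1 (score: 0)

cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed

-> This is InfiniBand/RDMA and most likely has nothing to do with your file system.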
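The "partitions" in the message are InfiniBand partition keys (P_Keys), not Slurm partitions or disk partitions. In a P_Key the top bit marks full membership, so 0xffff and 0x7fff in the log are the full and limited variants of the same default partition, while 0x8001 is a separate one. To see which keys a node's adapter carries, the table can be read from sysfs. The following is a rough sketch: the function name `list_pkeys` is made up for illustration, and the optional root argument exists only so it can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# List the non-empty partition keys of every InfiniBand device and port.
# Linux exposes them as /sys/class/infiniband/<device>/ports/<port>/pkeys/<index>.
# list_pkeys is a hypothetical helper name, not part of any Ansys or MPI tool.
list_pkeys() {
    local root="${1:-/sys/class/infiniband}"
    local f value
    for f in "$root"/*/ports/*/pkeys/*; do
        [ -r "$f" ] || continue               # no device present, or unreadable entry
        value=$(cat "$f")
        [ "$value" = "0x0000" ] && continue   # unused table slot
        # Print the device/port/index path next to the key value
        printf '%s %s\n' "${f#"$root"/}" "$value"
    done
}

list_pkeys "$@"
```

On a node without InfiniBand hardware the function prints nothing. If it prints more than one key for a port, that matches the situation the error message describes, and one of the listed values is what MPI_IB_PKEY has to be set to.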