Question

I'm trying to troubleshoot an error I get when running the simpleP2P sample program included with the CUDA samples. The output on my machine, which has two Tesla K20c GPUs, is the following:

$ ./simpleP2P 
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "     Tesla K20c" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "     Tesla K20c" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer-to-Peer (P2P) access from Tesla K20c (GPU0) -> Tesla K20c (GPU1) : No
> Peer-to-Peer (P2P) access from Tesla K20c (GPU1) -> Tesla K20c (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available between GPU0 <-> GPU1, waiving test.

The devices I'm using, as listed by lspci:

$ lspci | grep NVIDIA
03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)

Additional information about how the GPUs are connected, from nvidia-smi:

$ nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity
GPU0     X      SOC     0-5,12-17
GPU1    SOC      X      6-11,18-23

Legend:

  X   = Self
  SOC = Path traverses a socket-level link (e.g. QPI)
  PHB = Path traverses a PCIe host bridge
  PXB = Path traverses multiple PCIe internal switches
  PIX = Path traverses a PCIe internal switch

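Does anyone have information that could help me troubleshoot this, or at least better understand where the problem lies? Thanks a lot for reading/helping. - Omar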

Answer 1

When the GPUs are interconnected through a socket-level link (QPI on Intel-based systems):

GPU0     X      SOC     0-5,12-17
GPU1    SOC      X      6-11,18-23
        ^^^

then P2P transactions are not possible between those two GPUs.

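GPUs participating in P2P must meet a number of requirements. One of them is that they generally must be on the same PCIe root complex. GPUs connected via a socket-level link (e.g. QPI) are attached to two different sockets, i.e. two different CPUs, and therefore belong to two different PCIe root complexes.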
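
As a rough illustration of the check described above, the following Python sketch reads a topology matrix in the layout shown in the question and lists the GPU pairs whose path is labeled SOC. The function names and the SAMPLE constant are hypothetical. The parser assumes whitespace-separated columns under a header row of GPU names, and other nvidia-smi versions may print different labels. A pair that is not flagged is not thereby guaranteed to support P2P.

```python
def parse_topo_matrix(text):
    """Return {(row_gpu, col_gpu): link_label} from `nvidia-smi topo -m` text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header row: keep only the GPU column names, dropping "CPU Affinity".
    gpus = [tok for tok in lines[0].split() if tok.startswith("GPU")]
    links = {}
    for ln in lines[1:]:
        cells = ln.split()
        if cells[0] not in gpus:
            break  # reached the legend or other trailing text
        # Pair each GPU column with its cell; the affinity cell is ignored.
        for col, label in zip(gpus, cells[1:]):
            links[(cells[0], col)] = label
    return links


def socket_separated_pairs(links):
    """List GPU pairs whose path crosses a socket-level link (label SOC)."""
    pairs = {tuple(sorted(pair)) for pair, label in links.items() if label == "SOC"}
    return sorted(pairs)


# Matrix copied from the question (assumed layout).
SAMPLE = """\
    GPU0    GPU1    CPU Affinity
GPU0     X  SOC 0-5,12-17
GPU1    SOC  X  6-11,18-23
"""

if __name__ == "__main__":
    for a, b in socket_separated_pairs(parse_topo_matrix(SAMPLE)):
        print(f"{a} <-> {b}: path crosses a socket-level link")
```

For the matrix in the question, the only pair reported is GPU0 and GPU1, which matches the "No" lines printed by simpleP2P.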

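Note that, in general, P2P support may vary by GPU or GPU family. The ability to run P2P on one GPU type or GPU family does not necessarily indicate that it will work on another GPU type or family, even in the same system/setup. The final determinant of GPU P2P support is what the runtime reports when queried via cudaDeviceCanAccessPeer. P2P support can also vary by system and other factors. No statement made here is a guarantee of P2P support for any particular GPU in any particular setup.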
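
To ask the runtime directly, a minimal sketch along the following lines could call cudaGetDeviceCount and cudaDeviceCanAccessPeer for every ordered pair of devices. It is written in Python with ctypes purely for illustration. The helper name peer_access_matrix is hypothetical, and how the CUDA runtime library is loaded (for example under the name libcudart.so) is an assumption that depends on the platform and CUDA version.

```python
import ctypes


def peer_access_matrix(cudart):
    """Query cudaDeviceCanAccessPeer for every ordered pair of devices.

    `cudart` is the CUDA runtime library loaded with ctypes (or any object
    exposing the same two functions). Returns {(device, peer): bool}.
    """
    count = ctypes.c_int(0)
    status = cudart.cudaGetDeviceCount(ctypes.pointer(count))
    if status != 0:  # 0 is cudaSuccess
        raise RuntimeError(f"cudaGetDeviceCount failed with error {status}")

    matrix = {}
    for device in range(count.value):
        for peer in range(count.value):
            if device == peer:
                continue  # a device is not its own peer
            can_access = ctypes.c_int(0)
            status = cudart.cudaDeviceCanAccessPeer(
                ctypes.pointer(can_access), device, peer
            )
            if status != 0:
                raise RuntimeError(
                    f"cudaDeviceCanAccessPeer({device}, {peer}) failed with error {status}"
                )
            matrix[(device, peer)] = bool(can_access.value)
    return matrix
```

On a machine with the CUDA runtime installed, something like peer_access_matrix(ctypes.CDLL("libcudart.so")) would be the intended call. For the system in the question, one would expect both entries to be False, in line with the simpleP2P output.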

P2P memory access failing when running the multi-GPU CUDA sample (simpleP2P)
