PyCUDA DeviceMemoryPool and gpuarray.to_gpu_async not behaving as expected

Date: 2018-12-23 19:28:07

Tags: concurrency stream pycuda

I can't seem to achieve concurrency with the following code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import pandas as pd
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
from pycuda.tools import DeviceMemoryPool as DMP
from pycuda.compiler import SourceModule

data_DevMemPool = DMP()

# some_long_running_kernel_SRC is the CUDA source string, defined elsewhere
mod = SourceModule(some_long_running_kernel_SRC, no_extern_c=True)
some_long_running_kernel = mod.get_function("some_long_running_kernel")

stream = []

for k in range(10):
    stream.append(drv.Stream())

numpy_data = np.zeros((2048, 4000), dtype=np.float32)


# Why won't this parallelize?:
for i in range(10):
    gpu_data = gpuarray.to_gpu_async(numpy_data, allocator=data_DevMemPool.allocate, stream=stream[i])

    some_long_running_kernel(
        gpu_data,
        block=(1024, 1, 1), grid=(2, 1, 1), stream=stream[i])

Afterwards, running:

    data_DevMemPool.held_blocks
    data_DevMemPool.active_blocks

shows values of 1 and 1 respectively. This suggests the device memory pool never holds more than one block at a time, which it would have to if the iterations actually ran concurrently. Yet both GPU operations (`gpuarray.to_gpu_async()` and `some_long_running_kernel()`) were given a stream.

0 Answers:

There are no answers yet.