How do I create page-locked memory from an existing numpy array in PyCUDA?

Posted: 2011-10-04 17:01:40

Tags: pycuda

The PyCUDA documentation explains how to create an empty or zeroed page-locked array, but not how to move(?) an existing numpy array into page-locked memory. Do I need to get a pointer to the numpy array and pass it to pycuda.driver.PagelockedHostAllocation? How would I do that?

Update

< - snipped - >

Update 2

Thanks to talonmies for the help. The memory transfer is now page-locked, but the program ends with the following error:

PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: invalid context

Here is the updated code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import numpy as np
import ctypes
from pycuda import driver, compiler, gpuarray
from pycuda.tools import PageLockedMemoryPool
import pycuda.autoinit

# Pool that hands out page-locked (pinned) host allocations
memorypool = PageLockedMemoryPool()

indata = np.random.randn(5).astype(np.float32)
outdata = gpuarray.zeros(5, dtype=np.float32)

# Pinned host buffer with the same shape/dtype as indata
pinnedinput = memorypool.allocate(indata.shape, np.float32)

# Copy the existing numpy array into the pinned buffer via raw pointers
source = indata.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
dest = pinnedinput.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
sz = indata.size * ctypes.sizeof(ctypes.c_float)
ctypes.memmove(dest, source, sz)


kernel_code = """
 __global__ void kernel(float *indata, float *outdata) {
 int globalid = blockIdx.x * blockDim.x + threadIdx.x ;
 outdata[globalid] = indata[globalid]+1.0f;

 }
 """

mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")

kernel(
 driver.In(pinnedinput), outdata,
 grid = (5,1),
 block = (1, 1, 1),
)
print indata
print outdata.get()
memorypool.free_held()

3 Answers:

Answer 0 (score: 3):

You need to copy the data from your source array into the array holding the page-locked allocation returned by PyCUDA. The most straightforward way to do that is via ctypes:

import numpy
import ctypes

x=numpy.array([1,2,3,4],dtype=numpy.double)
y=numpy.zeros_like(x)

# Raw pointers to the buffers backing the two arrays
source = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
dest = y.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
sz = x.size * ctypes.sizeof(ctypes.c_double)

# Byte-for-byte copy from x into y
ctypes.memmove(dest, source, sz)

print y

The numpy.ctypes interface can be used to obtain a pointer to the memory holding an array's data, and ctypes.memmove can then copy between two different ndarrays. All the usual caveats of working with bare C pointers apply, so some care is required, but it is straightforward to use.
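
For reference, the same copy generalizes to any dtype by using ndarray.nbytes instead of a hard-coded element size. Below is only a sketch built on the question's PageLockedMemoryPool; the helper name copy_into_pool is hypothetical, not part of PyCUDA:

import ctypes
import numpy as np
import pycuda.autoinit  # creates a CUDA context, required for pinned allocations
from pycuda.tools import PageLockedMemoryPool

def copy_into_pool(pool, ary):
    # Allocate a pinned buffer with the same shape/dtype, then copy the bytes over.
    pinned = pool.allocate(ary.shape, ary.dtype)
    ctypes.memmove(pinned.ctypes.data, ary.ctypes.data, ary.nbytes)
    return pinned

pool = PageLockedMemoryPool()
pinned = copy_into_pool(pool, np.random.randn(5).astype(np.float32))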

Answer 1 (score: 1):

The memory block is still held when the context is torn down, which is what triggers the cuMemFreeHost error. Explicitly free the pinned array before the program exits:

print memorypool.active_blocks
pinnedinput.base.free()  # return the pooled page-locked allocation explicitly
print memorypool.active_blocks
memorypool.free_held()

Answer 2 (score: 0):

I have been doing this in a simpler way:

# assumes: import pycuda.driver as cuda
locked_ary = cuda.pagelocked_empty_like(ary, mem_flags=cuda.host_alloc_flags.DEVICEMAP)
locked_ary[:] = ary  # ordinary numpy copy into the page-locked buffer

The result has the correct AlignedHostAllocation base, and the timings are the same as those I get with ctypes.memmove.
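
Putting that together with the kernel from the question, a self-contained sketch might look like the following (variable names are illustrative, and it assumes import pycuda.driver as cuda as in the snippet above):

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda import compiler, gpuarray

ary = np.random.randn(5).astype(np.float32)

# Pinned host buffer with the same shape/dtype as the source array,
# filled with an ordinary numpy slice assignment.
locked_ary = cuda.pagelocked_empty_like(ary, mem_flags=cuda.host_alloc_flags.DEVICEMAP)
locked_ary[:] = ary

outdata = gpuarray.zeros(5, dtype=np.float32)

mod = compiler.SourceModule("""
__global__ void kernel(float *indata, float *outdata) {
    int globalid = blockIdx.x * blockDim.x + threadIdx.x;
    outdata[globalid] = indata[globalid] + 1.0f;
}
""")
kernel = mod.get_function("kernel")

# One thread per element, as in the question's launch configuration
kernel(cuda.In(locked_ary), outdata, grid=(5, 1), block=(1, 1, 1))
print(locked_ary)
print(outdata.get())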