Question

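In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has "warmed up". I also use custom kernels for basic array filling, arithmetic between arrays, and so on, and I would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is currently created and accessed through raw pointers. I would like to use device_vector and Thrust's methods on these objects, but I find myself converting between raw pointers and device_ptr<> all the time, which clutters my code.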

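My rather vague question: how would you organize the use of custom memory management, Thrust's array methods, and calls to custom kernels in the most readable way?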

Answer 1

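As with all standard C++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.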
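The standard-library side of this analogy can be sketched in plain C++ with std::vector, which needs no GPU to run. The names logging_allocator, g_allocs, g_frees, and allocations_balanced are hypothetical and exist only for this illustration; the allocator simply counts and logs each call before deferring to operator new and operator delete.

```cpp
#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

// Hypothetical counters, used only to observe allocator activity.
static int g_allocs = 0;
static int g_frees  = 0;

// Minimal C++11 allocator: logs and counts each call, then defers to
// the global operator new / operator delete.
template <typename T>
struct logging_allocator
{
  typedef T value_type;

  logging_allocator() = default;

  // converting constructor, needed so containers can rebind the allocator
  template <typename U>
  logging_allocator(const logging_allocator<U>&) {}

  T* allocate(std::size_t n)
  {
    ++g_allocs;
    std::cout << "logging_allocator::allocate(" << n << ")" << std::endl;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t n)
  {
    ++g_frees;
    std::cout << "logging_allocator::deallocate(" << n << ")" << std::endl;
    ::operator delete(p);
  }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const logging_allocator<T>&, const logging_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const logging_allocator<T>&, const logging_allocator<U>&) { return false; }

// Resizes a vector inside a scope and reports whether every allocation
// was matched by a deallocation once the vector was destroyed.
bool allocations_balanced()
{
  g_allocs = g_frees = 0;
  {
    // the allocator is the second template parameter, as with device_vector
    std::vector<int, logging_allocator<int> > vec;
    vec.resize(10, 13);  // storage is requested through logging_allocator
  }                      // vec is destroyed here and returns its storage
  return g_allocs > 0 && g_allocs == g_frees;
}
```

Calling allocations_balanced() from main should print at least one allocate line and one deallocate line. The exact number of calls and the sizes requested are left to the standard library implementation.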

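Occasionally, it is desirable to customize the way device_vector allocates memory, as in the OP's case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This avoids the overhead that many individual calls to the underlying allocation scheme, in this case cudaMalloc, may incur.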
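The bookkeeping behind "one large allocation, many sub-allocations" can be sketched on the host in plain C++. This is only an illustration under stated assumptions, not the OP's allocator: bump_arena and arena_demo are hypothetical names, std::malloc stands in for the single cudaMalloc call so that the sketch runs without a GPU, and the 256-byte default alignment is an assumption chosen to mirror the alignment CUDA documents for cudaMalloc results. Alignment is computed relative to the base address, and individual blocks cannot be freed, only the whole arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: hands out sub-ranges of one block reserved up front.
// It never dereferences the memory, so `base` could equally be a device
// address obtained from a single cudaMalloc call at start-up.
class bump_arena
{
public:
  bump_arena(void* base, std::size_t capacity)
    : base_(static_cast<char*>(base)), capacity_(capacity), offset_(0) {}

  // Returns `bytes` bytes whose offset from the base is a multiple of
  // `alignment` (a power of two), or a null pointer if they do not fit.
  void* allocate(std::size_t bytes, std::size_t alignment = 256)
  {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t start = (offset_ + alignment - 1) & ~(alignment - 1);
    if (start > capacity_ || bytes > capacity_ - start) return nullptr;
    offset_ = start + bytes;
    return base_ + start;
  }

  // Individual frees are not tracked; reset() releases everything at once.
  void reset() { offset_ = 0; }

  std::size_t used() const { return offset_; }

private:
  char*       base_;
  std::size_t capacity_;
  std::size_t offset_;
};

// Exercises the arena on host memory and reports whether it behaved as expected.
bool arena_demo()
{
  const std::size_t capacity = 1024;
  void* block = std::malloc(capacity);  // stand-in for one cudaMalloc at start-up
  if (!block) return false;

  bump_arena arena(block, capacity);
  char* a = static_cast<char*>(arena.allocate(100));  // placed at offset 0
  char* b = static_cast<char*>(arena.allocate(100));  // placed at offset 256
  void* c = arena.allocate(1024);                     // does not fit any more

  bool ok = (a == block) && (b == a + 256) && (c == nullptr) && (arena.used() == 356);
  std::free(block);
  return ok;
}
```

In a Thrust allocator, allocate(n) could presumably request n * sizeof(T) bytes from such an arena and wrap the resulting T* in a thrust::device_ptr<T>, while deallocate would either do nothing or hand the block back to a more elaborate free-list scheme.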

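A simple way to provide device_vector with a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with the inheritance approach only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be passed to device_vector as its second template parameter.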

This example code demonstrates how to provide a custom allocator that prints a message upon allocation and deallocation:

#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
  struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the name of the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // get access to some of the base class's typedefs

  // note that because we inherited from device_malloc_allocator,
  // pointer is actually thrust::device_ptr<T>
  typedef typename super_t::pointer   pointer;

  typedef typename super_t::size_type size_type;

  // customize allocate
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

    // defer to the base class to allocate storage for n elements of type T
    // in practice, you'd do something more interesting here
    return super_t::allocate(n);
  }

  // customize deallocate
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

    // defer to the base class to deallocate n elements of type T at address p
    // in practice, you'd do something more interesting here
    super_t::deallocate(p,n);
  }
};

int main()
{
  // create a device_vector which uses my_allocator
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints
  vec.resize(10, 13);

  return 0;
}

Here is the output:

$ nvcc my_allocator_test.cu -arch=sm_20 -run
my_allocator::allocate(): Hello, world!
my_allocator::deallocate(): Hello, world!

In this example, note that we hear from my_allocator::allocate() once, upon vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope, as it destroys its elements.

Mixing custom memory management and Thrust in CUDA
