
Date: 2014-12-25 17:09:05

Tags: c# cuda cudafy.net


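I have a video frame grabber card that captures byte[1024 x 1024] image data at 30 FPS. Every 33.3 ms it fills one slot in a circular buffer and returns a System.IntPtr pointing to an unmanaged 1D vector of bytes (a byte*); the circular buffer has 15 slots.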

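On the GPU device (Tesla K40) I want a global 2D array organized as a dense 2D array. In other words, I want something like a circular queue, but laid out on the GPU as a dense 2D array.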

byte[,] rawdata; // intended shape: [15, 1024*1024]
// if CUDAfy.NET supported jagged arrays I could use byte[15][1024*1024], but it does not

gpu.CopyToDevice<byte>(inputPtr, 0, rawdata, offset, length); // length = 1024*1024
// offset is computed as rowID*(1024*1024), where rowID wraps to 0 via modulo 15.
// inputPtr is the System.IntPtr that points to the buffer in the circular queue (unmanaged)?
// rawdata is a device buffer allocated with gpu.Allocate<byte>(1024*1024);
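The slot/offset arithmetic described in the comments above can be checked in plain C#, independent of CUDAfy.NET. This is a minimal sketch assuming 15 slots of 1024 x 1024 bytes; the class RingLayout and its methods SlotForFrame and SlotOffset are hypothetical names. A byte[15, FrameSize] array is stored row-major, so element [slot, i] sits at linear position slot * FrameSize + i, which is the offset the copy into that slot needs.

```csharp
using System;

// Hypothetical helper illustrating the ring-buffer indexing from the question.
public static class RingLayout
{
    public const int Slots = 15;               // slots in the frame grabber's circular buffer
    public const int FrameSize = 1024 * 1024;  // bytes per 1024 x 1024 frame

    // Maps a running frame counter to a slot: 0, 1, ..., 14, 0, 1, ...
    public static int SlotForFrame(long frameNumber)
    {
        if (frameNumber < 0)
            throw new ArgumentOutOfRangeException(nameof(frameNumber));
        return (int)(frameNumber % Slots);
    }

    // Linear element offset of [slot, 0] in a row-major byte[Slots, FrameSize] array.
    // The largest value, 14 * 1048576, fits comfortably in an int.
    public static int SlotOffset(int slot)
    {
        if (slot < 0 || slot >= Slots)
            throw new ArgumentOutOfRangeException(nameof(slot));
        return slot * FrameSize;
    }
}
```

At 30 FPS the ring wraps every 15 / 30 = 0.5 s, so each slot is overwritten about half a second after it was filled; the GPU copy of a slot has to finish within that window.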


public static void filter(GThread thread, byte[,] rawdata, int frameSize, byte[] result)


GPGPU.CopyToDevice(T) Method (IntPtr, Int32, T[,], Int32, Int32, Int32)


I tried the following code, but I get a CUDA.NET exception: ErrorLaunchFailed


FYI: when I try to use the CUDA emulator, it aborts on CopyToDevice, claiming the data is not host-allocated.

public static byte[] process(System.IntPtr data, int slot)
{
    Stopwatch watch = Stopwatch.StartNew(); // a plain new Stopwatch() never starts, so Elapsed stays zero
    byte[] output = new byte[FrameSize];
    int offset = slot * FrameSize;
    byte[] rawdata = gpu.Cast<byte>(grawdata, FrameSize); // What is the size supposed to be? Documentation lacking
    gpu.CopyToDevice<byte>(data, 0, rawdata, offset, FrameSize * frameCount);
    byte[] goutput = gpu.Allocate<byte>(output);
    gpu.Launch(height, width).filter(rawdata, FrameSize, goutput);
    runTime = watch.Elapsed.ToString();
    gpu.CopyFromDevice(goutput, output);
    totalRunTime = watch.Elapsed.ToString();
    gpu.Free(goutput); // release the per-call device buffer; otherwise it leaks on every frame
    return output;
}

3 Answers:

Answer 0 (score: 1)

For now I'm going with this "solution":
1. Run the program only in native mode (not in emulation mode).
2. Don't handle pinned memory allocation yourself.



Answer 1 (score: 1)

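If I understand your question, I think you are trying to convert the byte* you get from the circular buffer into a multidimensional byte array to send to the graphics card API.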

int slots = 15;
int rows = 1024;
int columns = 1024;

// Try this
for (int currentSlot = 0; currentSlot < slots; currentSlot++)
{
    IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);
    // use Marshal.Copy?
    byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory);

    int offset = 0;
    for (int m = 0; m < rows; m++)
    {
        for (int n = 0; n < columns; n++)
        {
            // then send this to your GPU method
            rawForGpu[m, n] = ReadByteValue(IntPtr: intPtrToUnManagedMemory, 
        }
    }
}

// or try this
for (int currentSlot = 0; currentSlot < slots; currentSlot++)
{
    IntPtr intPtrToUnManagedMemory = CopyContextFrom(currentSlot);

    // use Marshal.Copy?
    byte[] byteData = CopyIntPtrToByteArray(intPtrToUnManagedMemory);

    byte[,] rawForGpu = ConvertTo2DArray(byteData, rows, columns);
}

private static byte[,] ConvertTo2DArray(byte[] byteArr, int rows, int columns)
{
    int totalElements = rows * columns;
    if (byteArr.Length != totalElements)
        throw new ArgumentException("byteArr must contain rows * columns elements");
    byte[,] data = new byte[rows, columns];
    // Convert 1D to 2D (row-major): byte is a primitive type, so one block copy suffices
    Buffer.BlockCopy(byteArr, 0, data, 0, totalElements);
    return data;
}

private static IntPtr CopyContextFrom(int slotNumber)
{
    // code that returns the byte* for this slot from the circular buffer
    return IntPtr.Zero;
}
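The answer calls CopyIntPtrToByteArray without defining it. Below is a minimal sketch of that helper, assuming the frame length is known up front (defaulting here to one 1024 x 1024 frame); the class name FrameCopy, the constant DefaultFrameSize, and the length parameter are hypothetical. The only real API it relies on is Marshal.Copy(IntPtr, byte[], int, int), which copies a block of unmanaged memory into a managed array.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; Marshal.Copy is the only framework API used.
public static class FrameCopy
{
    public const int DefaultFrameSize = 1024 * 1024; // assumed bytes per 1024 x 1024 frame

    // Copies `length` bytes from unmanaged memory (one circular-buffer slot)
    // into a newly allocated managed array.
    public static byte[] CopyIntPtrToByteArray(IntPtr source, int length = DefaultFrameSize)
    {
        if (source == IntPtr.Zero)
            throw new ArgumentNullException(nameof(source));
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length));

        byte[] managed = new byte[length];
        Marshal.Copy(source, managed, 0, length); // single block copy, no per-byte loop
        return managed;
    }
}
```

With the default length, the one-argument call in the loop above compiles unchanged. Staging the frame in a managed array avoids handing the grabber's raw IntPtr to CUDAfy.NET, at the cost of one extra host-side copy of about 1 MB per frame (roughly 30 MB/s at 30 FPS).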

Answer 2 (score: 0)

You should consider using the built-in GPGPU async functionality to move data between host and device efficiently, together with gpuKern.LaunchAsync(...):



public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, DevicePtrEx devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[, ,] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[,] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;
public void CopyToDeviceAsync<T>(IntPtr hostArray, int hostOffset, T[] devArray,
                                 int devOffset, int count, int streamId = 0) where T : struct;