Question

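I'm trying to generate a set of random numbers that contains only 1s and 0s. The code below almost works. When I run the printing for loop, I notice that some of the numbers I generate are neither 1 nor 0. I know I'm missing something, but I'm not sure what. I think it's some kind of memory misalignment.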

#include <stdio.h>
#include <curand.h>
#include <curand_kernel.h>
#include <math.h>
#include <assert.h>
#define MIN 1
#define MAX (2048*20)

#define MOD 2 // only need one and zero for each random value.
#define THREADS_PER_BLOCK 256

__global__ void setup_kernel(curandState *state, unsigned long seed)
{
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  curand_init(seed, idx, 0, state+idx);
}

__global__ void generate_kernel(curandState *state,  unsigned int *result){

  int idx = threadIdx.x + blockDim.x*blockIdx.x;
   result[idx] = curand(state+idx) % MOD;
}

int main(){

  curandState *d_state;
  cudaMalloc(&d_state, sizeof(curandState));

  unsigned *d_result, *h_result;
  cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
  h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));

  cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));

  setup_kernel<<<MAX/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_state,time(NULL));

  generate_kernel<<<MAX/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>(d_state, d_result);

  cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);  

  printf("Bin:    Count: \n");
  for (int i = MIN; i <= MAX; i++)
    printf("%d    %d\n", i, h_result[i-MIN]);

  free(h_result);
  cudaFree(d_result);

  system("pause");
  return 0;
}
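
Rather than eyeballing the printed bins, the copied-back buffer can be scanned on the host for values other than 0 and 1. This is a minimal sketch; the helper name count_non_binary is hypothetical and not part of CUDA or cuRAND.

```cuda
#include <stddef.h>

// Hypothetical helper: returns how many of the n values are neither 0 nor 1.
// With correct generation this is always 0; anything else means the buffer
// holds corrupted or never-written data.
size_t count_non_binary(const unsigned int *values, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (values[i] > 1u)   // unsigned, so only 0 and 1 pass this test
            ++bad;
    return bad;
}
```

For example, calling count_non_binary(h_result, MAX-MIN+1) right after the cudaMemcpy in main gives a single number to watch instead of 40960 printed lines; it should be 0 once the kernels work.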

What I'm trying to do is convert the genetic algorithm from this website:

http://www.ai-junkie.com/ga/intro/gat3.html

I thought it would be a good problem for learning CUDA and having some fun at the same time.

The first part is generating my random array.
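
If the random array is meant to be the GA's initial population, one possible layout is one thread per chromosome, with each gene stored as a 0/1 byte. The sketch below assumes that layout; init_population, random_population, pop_size and chrom_len are hypothetical names, not taken from the tutorial. Because each curand() call returns 32 random bits, the kernel draws a new value only once every 32 genes and peels bits off it, instead of calling curand() and taking % 2 for every gene.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical sketch: one thread per chromosome, one byte (0 or 1) per gene.
__global__ void init_population(unsigned char *genes, int pop_size,
                                int chrom_len, unsigned long seed)
{
    int c = threadIdx.x + blockDim.x * blockIdx.x;   // chromosome index
    if (c >= pop_size)                               // grid may be rounded up
        return;

    curandState state;                 // private to this thread
    curand_init(seed, c, 0, &state);   // distinct subsequence per chromosome

    unsigned char *chrom = genes + (size_t)c * chrom_len;
    unsigned int bits = 0;
    for (int g = 0; g < chrom_len; ++g) {
        if (g % 32 == 0)
            bits = curand(&state);                // refill 32 random bits
        chrom[g] = (unsigned char)(bits & 1u);    // lowest bit: 0 or 1
        bits >>= 1;
    }
}

// Hypothetical host wrapper: fills h_genes (pop_size * chrom_len bytes).
cudaError_t random_population(unsigned char *h_genes, int pop_size,
                              int chrom_len, unsigned long seed)
{
    if (pop_size <= 0 || chrom_len <= 0)
        return cudaErrorInvalidValue;

    size_t bytes = (size_t)pop_size * chrom_len;
    unsigned char *d_genes = NULL;
    cudaError_t err = cudaMalloc(&d_genes, bytes);
    if (err != cudaSuccess)
        return err;

    const int threads = 128;
    const int blocks = (pop_size + threads - 1) / threads;   // round up
    init_population<<<blocks, threads>>>(d_genes, pop_size, chrom_len, seed);

    err = cudaGetLastError();                 // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();        // errors while the kernel ran
    if (err == cudaSuccess)
        err = cudaMemcpy(h_genes, d_genes, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_genes);
    return err;
}
```

Here the generator state lives only in a local variable and is discarded when the kernel ends. If later GA steps (crossover, mutation) need more random numbers, keeping one state per thread in global memory, as the answer below does, avoids re-running curand_init.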

Answer 1

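The problem here is that neither setup_kernel nor generate_kernel runs to completion, because of out-of-bounds memory accesses. Both kernels expect one generator state per thread, but you allocate only a single state on the device, which results in out-of-bounds memory reads and writes in both kernels. Change this: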

curandState *d_state;
cudaMalloc(&d_state, sizeof(curandState));

to something like this:

curandState *d_state;
cudaMalloc(&d_state, sizeof(curandState) * (MAX-MIN+1));

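This way every thread you launch has its own generator state, the out-of-bounds accesses go away, and things should start working. The values that are neither 0 nor 1 are a symptom of those accesses. If you had checked the return status of the runtime API calls, or run the program under cuda-memcheck, the source of the error would have been immediately apparent.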
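
Putting both suggestions together, the sketch below allocates one curandState per thread and checks the status of every runtime call and of both kernel launches. It is an illustration under assumptions, not the original poster's final code: generate_bits is a hypothetical helper, and the grid size is rounded up with an idx < n guard so that n does not have to be a multiple of the block size (in the question MAX happens to be one).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void setup_kernel(curandState *state, unsigned long seed, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n)                                   // guard the rounded-up grid
        curand_init(seed, idx, 0, &state[idx]);
}

__global__ void generate_kernel(curandState *state, unsigned int *result, int n)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n) {
        curandState local = state[idx];            // work on a register copy
        result[idx] = curand(&local) % 2;          // 0 or 1
        state[idx] = local;                        // keep state for later calls
    }
}

// Hypothetical helper: fills h_result[0..n-1] with random 0/1 values and
// returns the first CUDA error encountered, or cudaSuccess.
cudaError_t generate_bits(unsigned int *h_result, int n, unsigned long seed)
{
    if (n <= 0)
        return cudaErrorInvalidValue;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // round up
    curandState *d_state = NULL;
    unsigned int *d_result = NULL;
    cudaError_t err;

    // One generator state per thread: this is the fix from the answer.
    err = cudaMalloc(&d_state, n * sizeof(curandState));
    if (err != cudaSuccess) goto done;
    err = cudaMalloc(&d_result, n * sizeof(unsigned int));
    if (err != cudaSuccess) goto done;

    setup_kernel<<<blocks, threads>>>(d_state, seed, n);
    err = cudaGetLastError();                      // launch-configuration errors
    if (err != cudaSuccess) goto done;

    generate_kernel<<<blocks, threads>>>(d_state, d_result, n);
    err = cudaGetLastError();
    if (err != cudaSuccess) goto done;

    err = cudaDeviceSynchronize();                 // errors raised while kernels ran
    if (err != cudaSuccess) goto done;

    err = cudaMemcpy(h_result, d_result, n * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost);

done:
    cudaFree(d_result);                            // cudaFree(NULL) is a no-op
    cudaFree(d_state);
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return err;
}
```

With the question's constants, main could call generate_bits(h_result, MAX-MIN+1, time(NULL)) in place of the two launches and the cudaMemcpy. On CUDA 12 and later, cuda-memcheck is no longer shipped; compute-sanitizer replaces it.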

CUDA random numbers do not always return 0 and 1
