I'm running the following code on my 8000-series (CUDA-capable) card:
#include <stdio.h>

__global__ void testSet(int * MyBlock)
{
    unsigned int ThreadIDX = threadIdx.x + blockDim.x * blockIdx.x;
    MyBlock[ThreadIDX] = ThreadIDX;
}

int main()
{
    int * MyInts;
    int Result[1024];

    cudaMalloc( (void**) &MyInts, sizeof(int)*1024);
    testSet<<<2,512>>>(MyInts);
    cudaMemcpy(Result, MyInts, sizeof(int)*1024, cudaMemcpyDeviceToHost);

    for(unsigned int t=0; t<1024/8; t++) {
        printf("Results: %d %d %d %d %d %d %d %d\n",
               Result[t],   Result[t+1], Result[t+2],
               Result[t+3], Result[t+4], Result[t+5],
               Result[t+6], Result[t+7]);
    }
    return 0;
}
and the output I get is:
Results: 0 1 2 3 4 5 6 7
Results: 1 2 3 4 5 6 7 8
Results: 2 3 4 5 6 7 8 9
Results: 3 4 5 6 7 8 9 10
Results: 4 5 6 7 8 9 10 11
Results: 5 6 7 8 9 10 11 12
Results: 6 7 8 9 10 11 12 13
Results: 7 8 9 10 11 12 13 14
Results: 8 9 10 11 12 13 14 15
Results: 9 10 11 12 13 14 15 16
Results: 10 11 12 13 14 15 16 17
Results: 11 12 13 14 15 16 17 18
Results: 12 13 14 15 16 17 18 19
Results: 13 14 15 16 17 18 19 20
Results: 14 15 16 17 18 19 20 21
Results: 15 16 17 18 19 20 21 22
Results: 16 17 18 19 20 21 22 23
Results: 17 18 19 20 21 22 23 24
Results: 18 19 20 21 22 23 24 25
Results: 19 20 21 22 23 24 25 26
Results: 20 21 22 23 24 25 26 27
Results: 21 22 23 24 25 26 27 28
Results: 22 23 24 25 26 27 28 29
Results: 23 24 25 26 27 28 29 30
Results: 24 25 26 27 28 29 30 31
Results: 25 26 27 28 29 30 31 32
Results: 26 27 28 29 30 31 32 33
Results: 27 28 29 30 31 32 33 34
Results: 28 29 30 31 32 33 34 35
Results: 29 30 31 32 33 34 35 36
Results: 30 31 32 33 34 35 36 37
Results: 31 32 33 34 35 36 37 38
Results: 32 33 34 35 36 37 38 39
Results: 33 34 35 36 37 38 39 40
Results: 34 35 36 37 38 39 40 41
Results: 35 36 37 38 39 40 41 42
Results: 36 37 38 39 40 41 42 43
Results: 37 38 39 40 41 42 43 44
Results: 38 39 40 41 42 43 44 45
Results: 39 40 41 42 43 44 45 46
Results: 40 41 42 43 44 45 46 47
Results: 41 42 43 44 45 46 47 48
Results: 42 43 44 45 46 47 48 49
Results: 43 44 45 46 47 48 49 50
Results: 44 45 46 47 48 49 50 51
Results: 45 46 47 48 49 50 51 52
Results: 46 47 48 49 50 51 52 53
Results: 47 48 49 50 51 52 53 54
Results: 48 49 50 51 52 53 54 55
Results: 49 50 51 52 53 54 55 56
Results: 50 51 52 53 54 55 56 57
Results: 51 52 53 54 55 56 57 58
Results: 52 53 54 55 56 57 58 59
Results: 53 54 55 56 57 58 59 60
Results: 54 55 56 57 58 59 60 61
Results: 55 56 57 58 59 60 61 62
Results: 56 57 58 59 60 61 62 63
Results: 57 58 59 60 61 62 63 64
Results: 58 59 60 61 62 63 64 65
Results: 59 60 61 62 63 64 65 66
Results: 60 61 62 63 64 65 66 67
Results: 61 62 63 64 65 66 67 68
Results: 62 63 64 65 66 67 68 69
Results: 63 64 65 66 67 68 69 70
Results: 64 65 66 67 68 69 70 71
Results: 65 66 67 68 69 70 71 72
Results: 66 67 68 69 70 71 72 73
Results: 67 68 69 70 71 72 73 74
Results: 68 69 70 71 72 73 74 75
Results: 69 70 71 72 73 74 75 76
Results: 70 71 72 73 74 75 76 77
Results: 71 72 73 74 75 76 77 78
Results: 72 73 74 75 76 77 78 79
Results: 73 74 75 76 77 78 79 80
Results: 74 75 76 77 78 79 80 81
Results: 75 76 77 78 79 80 81 82
Results: 76 77 78 79 80 81 82 83
Results: 77 78 79 80 81 82 83 84
Results: 78 79 80 81 82 83 84 85
Results: 79 80 81 82 83 84 85 86
Results: 80 81 82 83 84 85 86 87
Results: 81 82 83 84 85 86 87 88
Results: 82 83 84 85 86 87 88 89
Results: 83 84 85 86 87 88 89 90
Results: 84 85 86 87 88 89 90 91
Results: 85 86 87 88 89 90 91 92
Results: 86 87 88 89 90 91 92 93
Results: 87 88 89 90 91 92 93 94
Results: 88 89 90 91 92 93 94 95
Results: 89 90 91 92 93 94 95 96
Results: 90 91 92 93 94 95 96 97
Results: 91 92 93 94 95 96 97 98
Results: 92 93 94 95 96 97 98 99
Results: 93 94 95 96 97 98 99 100
Results: 94 95 96 97 98 99 100 101
Results: 95 96 97 98 99 100 101 102
Results: 96 97 98 99 100 101 102 103
Results: 97 98 99 100 101 102 103 104
Results: 98 99 100 101 102 103 104 105
Results: 99 100 101 102 103 104 105 106
Results: 100 101 102 103 104 105 106 107
Results: 101 102 103 104 105 106 107 108
Results: 102 103 104 105 106 107 108 109
Results: 103 104 105 106 107 108 109 110
Results: 104 105 106 107 108 109 110 111
Results: 105 106 107 108 109 110 111 112
Results: 106 107 108 109 110 111 112 113
Results: 107 108 109 110 111 112 113 114
Results: 108 109 110 111 112 113 114 115
Results: 109 110 111 112 113 114 115 116
Results: 110 111 112 113 114 115 116 117
Results: 111 112 113 114 115 116 117 118
Results: 112 113 114 115 116 117 118 119
Results: 113 114 115 116 117 118 119 120
Results: 114 115 116 117 118 119 120 121
Results: 115 116 117 118 119 120 121 122
Results: 116 117 118 119 120 121 122 123
Results: 117 118 119 120 121 122 123 124
Results: 118 119 120 121 122 123 124 125
Results: 119 120 121 122 123 124 125 126
Results: 120 121 122 123 124 125 126 127
Results: 121 122 123 124 125 126 127 128
Results: 122 123 124 125 126 127 128 129
Results: 123 124 125 126 127 128 129 130
Results: 124 125 126 127 128 129 130 131
Results: 125 126 127 128 129 130 131 132
Results: 126 127 128 129 130 131 132 133
Results: 127 128 129 130 131 132 133 134
Shouldn't this print 0..1023?

Am I misunderstanding something? I've read the introductory sections of the NVIDIA CUDA Programming Guide, and I thought this is how things are supposed to work.

Granted, I've already hit a number of annoying bugs/design limitations so far (for example, the lack of double-precision support on the 8000 series, and errors from CUDA when using the iomanip manipulators (setw, setprecision) with explicit "std::..." qualification rather than a blanket "using namespace std;").

So I suppose I was expecting some rough edges...

But I'd really just like to understand what is actually going on here.
Answer (score: 4):
Change:
for(unsigned int t=0; t<1024/8;t++) {
to:
for(unsigned int t=0; t<1024; t+=8) {
You launch 2 x 512 = 1024 threads, with indices in the range 0..1023. Each thread writes its own index into the corresponding element of MyBlock, so the array you copy back really does contain 0..1023. The overlapping lines come from the print loop: it steps t by 1 (and only up to 127), so each line prints an 8-element window that overlaps the previous one by 7 elements, and most of the array is never printed at all. Stepping t by 8 prints each element exactly once.
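For illustration, here is a sketch of the corrected host-side print loop, assuming the kernel, allocation, and copy stay exactly as in the question:

    // Step t by 8 so each group of 8 elements is printed exactly once.
    for (unsigned int t = 0; t < 1024; t += 8) {
        printf("Results: %d %d %d %d %d %d %d %d\n",
               Result[t],   Result[t+1], Result[t+2],
               Result[t+3], Result[t+4], Result[t+5],
               Result[t+6], Result[t+7]);
    }

With this change the first line reads "Results: 0 1 2 3 4 5 6 7", the second "Results: 8 9 10 11 12 13 14 15", and so on through 1023.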