I have installed gcc-7, gcc-8, gcc-7-offload-nvptx, and gcc-8-offload-nvptx.
I tried compiling a simple OpenMP code with offloading using both compilers:
#include <omp.h>
#include <stdio.h>
int main(){
    #pragma omp target
    #pragma omp teams distribute parallel for
    for (int i=0; i<omp_get_num_threads(); i++)
        printf("%d in %d of %d\n",i,omp_get_thread_num(), omp_get_num_threads());
}
using the following command line (and the same with gcc-7):
gcc-8 code.c -fopenmp -foffload=nvptx-none
But it does not compile and gives the following errors:
/tmp/ccKESWcF.o: In function "main":
teste.c:(.text+0x50): undefined reference to "GOMP_target_ext"
/tmp/cc0iOH1Y.target.o: In function "init":
ccPXyu6Y.c:(.text+0x1d): undefined reference to "GOMP_offload_register_ver"
/tmp/cc0iOH1Y.target.o: In function "fini":
ccPXyu6Y.c:(.text+0x41): undefined reference to "GOMP_offload_unregister_ver"
collect2: error: ld returned 1 exit status
Any clues?
Answer 0 (score: 0)
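Your code compiles and runs for me with -foffload=disable -fno-stack-protector, using gcc-7 and gcc-7-offload-nvptx on Ubuntu 17.10.
But when targeting the GPU (without -foffload=disable) it does not compile. With this setup you cannot call printf from the GPU. Instead, you can do this: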
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
    int nthreads;
    /* Query the number of threads inside a target region and copy it back. */
    #pragma omp target teams map(tofrom:nthreads)
    #pragma omp parallel
    #pragma omp single
    nthreads = omp_get_num_threads();
    int *ithreads = malloc(sizeof *ithreads *nthreads);
    if (!ithreads) return 1;
    /* Record the thread number for each iteration on the device. */
    #pragma omp target teams distribute parallel for map(tofrom:ithreads[0:nthreads])
    for (int i=0; i<nthreads; i++) ithreads[i] = omp_get_thread_num();
    /* Print on the host, after the results have been mapped back. */
    for (int i=0; i<nthreads; i++)
        printf("%d in %d of %d\n", i, ithreads[i], nthreads);
    free(ithreads);
}
For me, this is the output:
0 in 0 of 8
1 in 0 of 8
2 in 0 of 8
3 in 0 of 8
4 in 0 of 8
5 in 0 of 8
6 in 0 of 8
7 in 0 of 8