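I set KMP_AFFINITY to scatter, but the execution time increased a lot!
That is why I suspect OpenMP is spawning all of its threads on a single core.
So I need something that returns the core a given thread is running on.
This is the pragma I use before the for loop: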
int procs = omp_get_num_procs();
#pragma omp parallel for num_threads(procs)\
shared (c, u, v, w, k, j, i, nx, ny) \
reduction(+: a, b, c, d, e, f, g, h, i)
These are the environment variables I exported:
export OMP_NUM_THREADS=5
export KMP_AFFINITY=verbose,scatter
In case it helps, here is the verbose output:
OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid instr info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #159: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #160: KMP_AFFINITY: OS proc to physical thread map ([] => level not in map):
OMP: Info #168: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 4 maps to package 0 core 1 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 1 maps to package 1 core 0 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 5 maps to package 1 core 1 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 3 maps to package 1 core 2 [thread 0]
OMP: Info #168: KMP_AFFINITY: OS proc 7 maps to package 1 core 3 [thread 0]
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7}
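Thanks in advance!
Answer 0 (score: 5)
As @user3018144 pointed out, sched_getcpu(3) can be used to get the number of the CPU that the calling thread is currently running on.
Consider the following code: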
#define _GNU_SOURCE // sched_getcpu(3) is glibc-specific (see the man page)
#include <stdio.h>
#include <sched.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        int thread_num = omp_get_thread_num();
        int cpu_num = sched_getcpu();
        printf("Thread %3d is running on CPU %3d\n", thread_num, cpu_num);
    }
    return 0;
}
This is my output without affinity:
$> OMP_NUM_THREADS=4 ./a.out | sort
Thread 0 is running on CPU 2
Thread 1 is running on CPU 0
Thread 2 is running on CPU 3
Thread 3 is running on CPU 1
This is the output with affinity:
$> GOMP_CPU_AFFINITY='0,1,2,3' OMP_NUM_THREADS=4 ./a.out | sort
Thread 0 is running on CPU 0
Thread 1 is running on CPU 1
Thread 2 is running on CPU 2
Thread 3 is running on CPU 3
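To check directly whether all threads ended up on one core, the per-thread CPU numbers can be collected and the distinct values counted. The following is only a sketch under a few assumptions: the names count_distinct_cpus, record_thread_cpus, and MAX_THREADS are hypothetical, the OpenMP part is guarded by _OPENMP so the file still builds without -fopenmp, and sched_getcpu() only gives a snapshot, so an unbound thread may migrate right after the call.

```c
#define _GNU_SOURCE  /* sched_getcpu() is a glibc extension */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 256  /* hypothetical upper bound chosen for this sketch */

/* Count how many different CPU numbers appear in cpus[0..n-1].
   Negative entries (unused slots or failed lookups) are skipped. */
int count_distinct_cpus(const int *cpus, int n) {
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        if (cpus[i] < 0) continue;
        int seen = 0;
        for (int j = 0; j < i; j++) {
            if (cpus[j] == cpus[i]) { seen = 1; break; }
        }
        if (!seen) distinct++;
    }
    return distinct;
}

/* Store the CPU each thread is currently on in cpus[thread_num].
   Returns the number of threads in the team (1 without OpenMP). */
int record_thread_cpus(int *cpus, int capacity) {
    assert(cpus != NULL && capacity > 0);
    int nthreads = 1;
    for (int i = 0; i < capacity; i++) cpus[i] = -1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        if (tid < capacity) cpus[tid] = sched_getcpu();
        #pragma omp single
        nthreads = omp_get_num_threads();
    }
#else
    cpus[0] = sched_getcpu();  /* serial fallback: only the main thread */
#endif
    return nthreads;
}
```

Calling record_thread_cpus(cpus, MAX_THREADS) and then count_distinct_cpus(cpus, MAX_THREADS) from main gives the number of CPUs in use. If that count is 1 while the team has several threads, the threads really are sharing a core. If it matches the thread count, as the bindings in the verbose output from the question suggest, the slowdown probably has another cause.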
Answer 1 (score: 1)
If you are on Linux, you can use the function sched_getcpu().
Here is a link that explains how it works, along with its declaration:
http://man7.org/linux/man-pages/man3/sched_getcpu.3.html
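As a complement to sched_getcpu(), the sketch below also asks which CPUs the calling thread is allowed to run on, using sched_getaffinity(2) and the glibc CPU_* macros. The helper names allowed_cpus and current_cpu are hypothetical. It assumes Linux with glibc and a machine with no more than CPU_SETSIZE CPUs, since the fixed-size cpu_set_t query can fail on larger systems.

```c
#define _GNU_SOURCE  /* needed for sched_getcpu() and the CPU_* macros in glibc */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Fill out[] with the CPU numbers the calling thread may run on.
   Returns how many CPUs are allowed (which may exceed capacity),
   or -1 if the affinity mask cannot be read. */
int allowed_cpus(int *out, int capacity) {
    assert(out != NULL);
    cpu_set_t set;
    CPU_ZERO(&set);
    /* A pid of 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    int n = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            if (n < capacity) out[n] = cpu;
            n++;
        }
    }
    return n;
}

/* Return the CPU the calling thread is on right now, or -1 on failure. */
int current_cpu(void) {
    int cpu = sched_getcpu();
    if (cpu < 0) perror("sched_getcpu");
    return cpu;
}
```

With an affinity setting such as scatter in effect, each thread would be expected to report a small allowed set (often a single CPU), and current_cpu() should return a member of that set.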
Hope this helps.