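I have an OpenMP performance problem in a program running on a NUMA machine.
The platform is an 8-core machine (NUMA with 2 nodes, 4 cores per node).
My program has a function A that does the following:
1. It starts with CPU mask 0xFF (8 cores).
2. In the middle, it launches a background subroutine with mask 0xF0 and sets the affinity of the current pid to 0x0F. The function then does some work in parallel with the subroutine.
3. After the background subroutine finishes, it joins it and calls CPU_ZERO to reset the binding.
All OpenMP code that does not pin its affinity becomes slower after function A. I then reset the mask to 0xFF after function A instead, and that works: the slowdown is gone. Does this mean that, for OpenMP, mask 0x0 is different from 0xFF?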
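To make the problem clear, I wrote the sample code below.
It turns out that you cannot set the CPU mask to none. Does that mean that unbinding from CPUs means setting the mask to all CPUs?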
    #define _GNU_SOURCE /* See feature_test_macros(7) */
    #include <sched.h>
    #include <sys/sysinfo.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t cpu_set;
        int i, ret;
        const int num_cores = get_nprocs_conf();

        /* Print the affinity mask the process starts with. */
        CPU_ZERO( &cpu_set );
        ret = sched_getaffinity( 0, sizeof(cpu_set), &cpu_set );
        printf(" default cpu_set: ");
        for( i=0; i< num_cores; i++ ){
            if( CPU_ISSET( i, &cpu_set )) printf( "%d ", i);
        }
        printf("\n");

        /* Bind to CPUs 0-3. */
        CPU_ZERO( &cpu_set );
        CPU_SET( 0, &cpu_set );
        CPU_SET( 1, &cpu_set );
        CPU_SET( 2, &cpu_set );
        CPU_SET( 3, &cpu_set );
        ret = sched_setaffinity( 0, sizeof(cpu_set), &cpu_set );
        printf(" set cpu_set: 0 1 2 3 %s\n", ret==0? "success":"failed" );
        printf(" new cpu_set: ");
        ret = sched_getaffinity( 0, sizeof(cpu_set), &cpu_set );
        for( i=0; i< num_cores; i++ ){
            if( CPU_ISSET( i, &cpu_set )) printf( "%d ", i);
        }
        printf("\n");

        /* Try to "unbind" by passing an empty mask. */
        CPU_ZERO( &cpu_set );
        ret = sched_setaffinity( 0, sizeof(cpu_set), &cpu_set );
        printf(" set cpu_set: none %s\n", ret==0? "success":"failed" );
        printf(" new cpu_set: ");
        ret = sched_getaffinity( 0, sizeof(cpu_set), &cpu_set );
        for( i=0; i< num_cores; i++ ){
            if( CPU_ISSET( i, &cpu_set )) printf( "%d ", i);
        }
        printf("\n");

        /* Bind to every configured CPU. */
        CPU_ZERO( &cpu_set );
        ret = sched_getaffinity( 0, sizeof(cpu_set), &cpu_set );
        for( i=0; i< num_cores; i++ ){
            CPU_SET( i, &cpu_set );
        }
        ret = sched_setaffinity( 0, sizeof(cpu_set), &cpu_set );
        printf(" set cpu_set: all %s\n", ret==0? "success":"failed" );
        printf(" new cpu_set: ");
        ret = sched_getaffinity( 0, sizeof(cpu_set), &cpu_set );
        for( i=0; i< num_cores; i++ ){
            if( CPU_ISSET( i, &cpu_set )) printf( "%d ", i);
        }
        printf("\n");
        return 0;
    }
Thank you for reading my question.