我写了一个小代码,递归地生成图形,然后释放节点(如下所示)。我正在使用带有和不带-fopenmp标志的mpicc编译代码。
随着时间运行代码(时间mpirun ./a.out)我得到4.259秒的实时时间,但是使用openmp指令(在代码中显示)我受到了巨大的惩罚 - 实时现在是10.744秒。我一直在尝试一些设置来设置线程数等,但我没有加速。
我可以设置更多可以实现更多并行性和性能提升的东西吗?或者代码是如此之小,并且在父节点和后代节点之间存在依赖关系阻止任何潜在的收益?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>
struct tree_node{
int value;
struct tree_node *left_node, *right_node;
};
typedef struct tree_node tree_node_t;
tree_node_t* new_node(int value){
tree_node_t *p = (tree_node_t*) malloc(sizeof(tree_node_t));
p->value = value;
p->left_node = p->right_node = NULL;
assert((p->value > 0 || p->value == 0) && p->value < 10);
return p;
}
void tree_grow(tree_node_t* parent, int depth){
if(depth == 0){
return;
}
#pragma omp task
{
#pragma omp task
{parent->left_node = new_node(rand() % 10);}
#pragma omp task
{parent->right_node = new_node(rand() % 10);}
#pragma omp task
{tree_grow(parent->left_node, depth-1);}
#pragma omp task
{tree_grow(parent->right_node, depth-1);}
}
}
void branch_cutting(tree_node_t* current){
if (current == NULL){
return;
}
#pragma omp task
{
#pragma omp task
{branch_cutting(current->left_node);}
#pragma omp task
{branch_cutting(current->right_node);}
}
#pragma omp task
free(current);
}
int main (int argc, char **argv)
{
//omp_set_dynamic(0);
//omp_set_num_threads(6);
tree_node_t *root = new_node(0);
tree_grow(root, 25);
printf("tree grown and now freeing the memory...");
//#pragma omp taskwait
branch_cutting(root);
printf("tree is removed\n");
return 0;
}