Question

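I have written a Python C extension, and it works correctly. To make it run more efficiently, though, I need to write a multi-threaded/parallel version of the same extension.

Please tell me how to write Python C extension code that runs on multiple cores at the same time.

I have been stuck on this for more than a day. Please help.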

Answer 1

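Maybe it is too late, but hopefully this will help others :)

The simplest way to run a C extension in parallel is to use the OpenMP API. From Wikipedia:

OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures, and operating systems.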

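For example, see this piece of code:

int i;
for (i=0;i<10;i++)
{
    printf("%d ",i);
}

Result:

0 1 2 3 4 5 6 7 8 9

We can make it parallel by placing the #pragma omp parallel for compiler directive before the for statement block:

int i;
#pragma omp parallel for
for (i=0;i<10;i++)
{
    printf("%d ",i);
}

Result (the order varies from run to run, because the iterations are distributed across threads):

0 1 5 8 9 2 6 4 3 7

To enable OpenMP in gcc, you need to specify the -fopenmp compile-time flag. For example: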

gcc -fPIC -Wall -O3 costFunction.c -o costFunction.so -shared -fopenmp

You can learn OpenMP from HERE.

There are other approaches, such as pthreads, but they are very low-level.

OpenMP vs. pthreads: an example from HERE, written in C++.

Serial C++ code:

void sum_st(int *A, int *B, int *C){
  int end = 10000000;
  for(int i = 0; i < end; i++)
    A[i] = B[i] + C[i];
}

pthread solution:

#include <pthread.h>
#include <stdlib.h>

// Arguments handed to each worker thread.
struct params {
  int *A;
  int *B;
  int *C;
  int tid;
  int size;
  int nthreads;
};

// Worker: adds one contiguous chunk of B and C into A.
void *compute_parallel(void *_p){
  params *p      = (params*) _p;
  int tid        = p->tid;
  int chunk_size = (p->size / p->nthreads);
  int start      = tid * chunk_size;
  // The last thread also takes the remainder when size is not divisible by nthreads.
  int end        = (tid == p->nthreads - 1) ? p->size : start + chunk_size;
  for(int i = start; i < end; i++)
    p->A[i] = p->B[i] + p->C[i];
  return NULL;
}

void sum_mt(int *A, int *B, int *C){
  const int nthreads = 4;
  int size = 10000000;
  pthread_t threads[nthreads]; // array to hold thread information
  params *thread_params = (params*) malloc(nthreads * sizeof(params));

  // Start one thread per chunk.
  for(int i = 0; i < nthreads; i++){
    thread_params[i].A        = A;
    thread_params[i].B        = B;
    thread_params[i].C        = C;
    thread_params[i].tid      = i;
    thread_params[i].size     = size;
    thread_params[i].nthreads = nthreads;
    pthread_create(&threads[i], NULL, compute_parallel, (void*) &thread_params[i]);
  }

  // Wait for all threads to finish before releasing their arguments.
  for(int i = 0; i < nthreads; i++){
    pthread_join(threads[i], NULL);
  }
  free(thread_params);
}

OpenMP solution:

The serial loop from sum_st stays exactly as it is, with a single #pragma omp parallel for directive placed directly before it.
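For comparison with the pthread version, here is a minimal sketch of how the same element-wise sum might look with OpenMP in plain C. The function name sum_omp and the explicit length parameter n are assumptions made for illustration; they are not taken from the linked example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: A[i] = B[i] + C[i] for i in [0, n).
 * Compile with -fopenmp; without that flag the pragma is ignored
 * and the loop runs serially, producing the same result. */
void sum_omp(int *A, const int *B, const int *C, int n)
{
    assert(A != NULL && B != NULL && C != NULL && n >= 0);

    int i;
    /* OpenMP splits the iteration range across the available threads;
     * the loop variable i is private to each thread. */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        /* Each iteration writes a distinct element, so no locking is needed. */
        A[i] = B[i] + C[i];
    }
}
```

In a Python C extension, a function like this would typically be called from the wrapper after the arguments have been converted to plain C arrays. The OpenMP worker threads must not touch Python objects. If other Python threads should keep running during the computation, the call can additionally be wrapped in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS, provided nothing inside uses the Python API.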

Achieving true parallelism in a Python C extension
