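I have a C program that runs a loop and times how long it takes. To measure the speedup, I change the number of threads in main with omp_set_num_threads and then call the functions.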
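#include <stdio.h>      /* printf */
#include <string.h>     /* memset */
#include <sys/time.h>   /* gettimeofday, struct timeval */
#include <omp.h>        /* omp_set_num_threads */

int main(){
    ...
    /* try 1, 2, 4, 8 and 16 threads */
    for(int num_threads = 1; num_threads<=16; num_threads *= 2)
    {
        omp_set_num_threads(num_threads);
        pFilterFirst ( ..., num_threads );
        checkData ( serial_array, output_array );
        /* memset counts bytes: this clears the first DATA_LEN bytes of output_array */
        memset ( output_array, 0, DATA_LEN );
        pDataFirst ( ..., num_threads );
        checkData ( serial_array, output_array );
        memset ( output_array, 0, DATA_LEN );
    }
}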
int pDataFirst(..., num_threads) {
    /* Variables for timing */
    struct timeval ta, tb, tresult;

    /* get initial time */
    gettimeofday ( &ta, NULL );

    /* for all elements in the data (iterations are split across the threads) */
    #pragma omp parallel for
    for (int x=0; x<data_len; x++) {
        /* for all elements in the filter */
        for (int y=0; y<filter_len; y++) {
            /* if the data element matches the filter */
            if (input_array[x] == filter_list[y]) {
                /* include it in the output */
                output_array[x] = input_array[x];
            }
        }
    }

    /* get final time */
    gettimeofday ( &tb, NULL );
    timeval_subtract ( &tresult, &tb, &ta );

    /* cast to long so the format specifiers match on every platform */
    printf ("Parallel data first took %ld seconds and %ld microseconds. "
            "Filter length = %d. Threads = %d\n",
            (long) tresult.tv_sec, (long) tresult.tv_usec, filter_len, num_threads );
    return 0;
}
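For comparison, here is a minimal sketch of the same data-first loop timed with a monotonic clock instead of gettimeofday, which can jump if the system clock is adjusted during a run. The names now_seconds and filter_data_first are hypothetical, and the long element type is an assumption, since the question does not show the array declarations. The early break is also a small change from the code above; it does not change the output.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Seconds from a monotonic clock, unaffected by system clock adjustments. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for pDataFirst (element type long is assumed).
 * Copies every input element that appears in the filter into the same
 * position of the output and returns the elapsed wall-clock seconds. */
static double filter_data_first(const long *input, long *output, int data_len,
                                const long *filter, int filter_len) {
    assert(data_len >= 0 && filter_len >= 0);
    double start = now_seconds();

    /* Each thread handles a disjoint range of x, so writes never collide. */
    #pragma omp parallel for
    for (int x = 0; x < data_len; x++) {
        for (int y = 0; y < filter_len; y++) {
            if (input[x] == filter[y]) {
                output[x] = input[x];
                break;  /* one match is enough; further matches write the same value */
            }
        }
    }

    return now_seconds() - start;
}
```

Built with an OpenMP flag such as gcc -fopenmp, the loop runs in parallel; without it the pragma is ignored and the same code runs serially, which gives a baseline to compare against.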
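When I look at the run time as a function of the number of threads, the run time increases as the number of threads increases, which makes no sense. I am running this on an Amazon EC2 instance.
As it turns out, when I ran the simulation on a large EC2 instance the problem appeared to be resolved. I got the expected speedup.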
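One plausible explanation, which is an assumption and not something measured in the post, is that the original instance exposed fewer cores than the thread counts being tested, so the extra threads only added scheduling overhead. A minimal sketch of how to check this before the sweep is below; available_processors and clamp_threads are hypothetical helper names.

```c
#include <unistd.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of processors visible to this process (at least 1). */
static int available_processors(void) {
#ifdef _OPENMP
    int n = omp_get_num_procs();
    return n > 0 ? n : 1;
#else
    /* Fallback when the code is built without OpenMP support. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
#endif
}

/* Clamp a requested thread count to the range [1, processors]. */
static int clamp_threads(int requested, int processors) {
    if (processors < 1) processors = 1;
    if (requested < 1) return 1;
    return requested > processors ? processors : requested;
}
```

In the loop in main, printing available_processors() once, or calling omp_set_num_threads(clamp_threads(num_threads, available_processors())), would show whether thread counts such as 8 or 16 exceed what the instance actually provides.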