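So I wrote a program that computes the average runtime of another program (whose name it takes from a command-line argument) over 20 trial runs. I used pthreads to make it faster by running several trials in parallel. However, while the single-threaded version works fine, the multithreaded version gives me very strange results. Here is the code: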
main.cpp
#include <iostream>
#include <chrono>
#include <cstdlib>   // system, malloc, exit
#include <cstring>   // strcpy
#include <string>
#include <pthread.h>
using namespace std;
using namespace std::chrono;

struct Thread_arg{ // struct to hold the arguments
    int a; // index of which element for runtime
    char *b; // name of the program to run
};

double runtime[20]; // store all 20 runtime
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;

void* run_program(void *input){
    auto *arg = (Thread_arg *) input;
    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    system(arg->b);
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>( t2 - t1 ).count();
    runtime[arg->a] = duration;
    pthread_mutex_lock(&mutex1);
    cout << " Runtime: " << duration << "\u00B5s" << endl;
    pthread_mutex_unlock(&mutex1);
    return nullptr;
}

int main(int argc, char** argv){
    if(argc != 2){
        if(argc > 2){
            cout << "Too many arguments, you only need 1" << endl;
        }else if(argc < 2){
            cout << "You need 1 argument" << endl;
        }
        exit(410);
    }
    // format name of the program
    string program = "./";
    program += argv[1];
    int n = (int) program.length();
    char char_array[n+1];
    strcpy(char_array, program.c_str());
    // start testing, get average of 20 test
    pthread_t threads[20]; // create threads
    for(int i = 0; i < 20; i++){
        struct Thread_arg *arg = (struct Thread_arg *)malloc(sizeof(struct Thread_arg)); // arguments in struct
        arg->b = char_array; // load struct
        arg->a = i;
        pthread_create(&threads[i], nullptr, run_program, (void *)arg); // launch thread
    }
    // join threads
    for (auto &thread : threads){
        pthread_join(thread, nullptr);
    }
    // calculate average
    double total_time = 0;
    for (double &i : runtime){
        total_time += i;
    }
    double runtime_ave = total_time/20;
    cout << "************************************************************" << endl;
    cout <<"||| " << "Average runtime for program " << program << " is: " << runtime_ave << "\u00B5s" << " |||" << endl;
    cout << "************************************************************" << endl;
}
Compiled with g++ -o runtime main.cpp
test.cpp
#include <unistd.h>
#include <iostream>

int main(){
    usleep(1000000);
    std::cout << "Demo Finished";
    return 0;
}
Compiled with g++ -o test test.cpp
Run with ./runtime test
The expected result is something like:
Demo Finished Runtime: 1.01159e+06µs
Demo Finished Runtime: 1.01040e+06µs
Demo Finished Runtime: 1.01208e+06µs
Demo Finished Runtime: 1.00862e+06µs
Demo Finished Runtime: 1.0065e+06µs
Demo Finished Runtime: 1.00863e+06µs
Demo Finished Runtime: 1.01288e+06µs
Demo Finished Runtime: 1.01039e+06µs
Demo Finished Runtime: 1.01221e+06µs
Demo Finished Runtime: 1.00687e+06µs
Demo Finished Runtime: 1.01136e+06µs
Demo Finished Runtime: 1.00874e+06µs
Demo Finished Runtime: 1.03106e+06µs
Demo Finished Runtime: 1.00714e+06µs
Demo Finished Runtime: 1.00679e+06µs
Demo Finished Runtime: 1.01873e+06µs
Demo Finished Runtime: 1.01086e+06µs
Demo Finished Runtime: 1.01146e+06µs
Demo Finished Runtime: 1.01179e+06µs
Demo Finished Runtime: 1.00995e+06µs
************************************************************
||| Average runtime for program ./demo is: 1.0114e+06µs |||
************************************************************
However, I get:
Demo Finished Runtime: 1011435µs
Demo Finished Runtime: 2024995µs
Demo Finished Runtime: 3033430µs
Demo Finished Runtime: 4039988µs
Demo Finished Runtime: 5046514µs
Demo Finished Runtime: 6059725µs
Demo Finished Runtime: 7071353µs
Demo Finished Runtime: 8081074µs
Demo Finished Runtime: 9088289µs
Demo Finished Runtime: 10099950µs
Demo Finished Runtime: 11108043µs
Demo Finished Runtime: 12126147µs
Demo Finished Runtime: 13134197µs
Demo Finished Runtime: 14151540µs
Demo Finished Runtime: 15161500µs
Demo Finished Runtime: 16173660µs
Demo Finished Runtime: 17186823µs
Demo Finished Runtime: 18194055µs
Demo Finished Runtime: 19206132µs
Demo Finished Runtime: 20217351µs
************************************************************
||| Average runtime for program ./test is: 1.06108e+07µs |||
************************************************************
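As you can see, the time keeps increasing. Any idea why? I don't think it is my mutex, because the only shared variable is the array, and no two threads touch the same element.
Thanks.
Update:
After trying it on an Ubuntu VM, I get the correct result with the same code. However, I still don't fully understand why the trial runtimes accumulate on macOS. I think it might have something to do with system(), or with the fact that g++ on macOS is actually clang++. I will run more experiments to find the cause. If you know what is going on, a definite answer would be much appreciated.
Many thanks to @William Miller and @alk for their help in the comments.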
Answer 0 (score: 3)
I can reproduce this on a Mac. The reason is that system() blocks if it is already running in another thread. It locks a mutex internally.
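All threads get to execute now() right away, but only one of them can run the program at a time. So the first run measures 1 second. Then the second one runs, but its measured time also includes the time it spent waiting for the first, and so on.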
You can confirm this by adding output around the system() call:
    cout << "before system()" << endl;
    system(arg->b);
    cout << "after system()" << endl;
So the simplest answer is: the system library that ships with the compiler/OS uses a mutex to make system() thread-safe.
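To see why the numbers climb in steps of about one second, here is a minimal sketch that imitates the effect. It is not Apple's implementation: fake_system, fake_system_lock and measure_serialized are hypothetical names, the child process is replaced by a sleep, and std::thread is used instead of raw pthreads to keep it short.

```cpp
#include <algorithm>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for system(): it holds one global lock for as long
// as the "child" runs. The child is simulated with a sleep.
std::mutex fake_system_lock;

void fake_system(std::chrono::milliseconds child_runtime) {
    std::lock_guard<std::mutex> guard(fake_system_lock);
    std::this_thread::sleep_for(child_runtime);
}

// Starts `runs` threads that each time one fake_system() call, the same way
// run_program() times system(). Returns the durations in microseconds,
// sorted ascending.
std::vector<long long> measure_serialized(int runs,
                                          std::chrono::milliseconds child_runtime) {
    using namespace std::chrono;
    std::vector<long long> measured(runs, 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < runs; ++i) {
        workers.emplace_back([&measured, child_runtime, i] {
            auto t1 = steady_clock::now();  // every thread reaches this at once
            fake_system(child_runtime);     // but only one gets past the lock
            auto t2 = steady_clock::now();
            measured[i] = duration_cast<microseconds>(t2 - t1).count();
        });
    }
    for (auto& w : workers) w.join();
    std::sort(measured.begin(), measured.end());
    return measured;
}
```

Calling measure_serialized(4, std::chrono::milliseconds(50)) should give values near 50, 100, 150 and 200 ms: each thread's clock starts immediately, but its turn behind the lock comes later, which is the same staircase as in the question's output.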
In the debugger, a waiting thread is visibly blocked on the mutex inside system():
* thread #4
* frame #0: 0x00007fff6c8eda46 libsystem_kernel.dylib`__psynch_mutexwait + 10
frame #1: 0x00007fff6cab5b9d libsystem_pthread.dylib`_pthread_mutex_lock_wait + 83
frame #2: 0x00007fff6cab34c8 libsystem_pthread.dylib`_pthread_mutex_lock_slow + 253
frame #3: 0x00007fff6c8688d1 libsystem_c.dylib`system + 183
frame #4: 0x0000000100000e11 runtime`run_program(input=0x00000001003000a0) at main.cpp:21
frame #5: 0x00007fff6cab5661 libsystem_pthread.dylib`_pthread_body + 340
frame #6: 0x00007fff6cab550d libsystem_pthread.dylib`_pthread_start + 377
frame #7: 0x00007fff6cab4bf9 libsystem_pthread.dylib`thread_start + 13
The thread that currently holds the mutex is waiting for the launched program to finish:
* thread #2
* frame #0: 0x00007fff6c8ee242 libsystem_kernel.dylib`__wait4_nocancel + 10
frame #1: 0x00007fff6c8689de libsystem_c.dylib`system + 452
frame #2: 0x0000000100000e11 runtime`run_program(input=0x0000000100300080) at main.cpp:21
frame #3: 0x00007fff6cab5661 libsystem_pthread.dylib`_pthread_body + 340
frame #4: 0x00007fff6cab550d libsystem_pthread.dylib`_pthread_start + 377
frame #5: 0x00007fff6cab4bf9 libsystem_pthread.dylib`thread_start + 13
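If the goal is to let the trials really overlap, one option is to skip system() and start the child directly with posix_spawn, then waitpid for it. This is not something the question or the traces above cover, so treat it as a sketch under assumptions: time_one_run and time_parallel are hypothetical names, the program is given as an argument vector instead of a shell command line, and std::thread replaces the raw pthread calls for brevity.

```cpp
#include <cerrno>
#include <chrono>
#include <spawn.h>
#include <string>
#include <sys/wait.h>
#include <thread>
#include <vector>

extern char **environ;

// Runs args[0] with the given arguments once, without going through
// system(), and returns the wall time in microseconds (-1 on failure).
long long time_one_run(const std::vector<std::string>& args) {
    using namespace std::chrono;
    if (args.empty()) return -1;
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    auto t1 = steady_clock::now();
    pid_t pid;
    if (posix_spawn(&pid, args[0].c_str(), nullptr, nullptr,
                    argv.data(), environ) != 0) {
        return -1;
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;  // retry only if interrupted by a signal
    }
    auto t2 = steady_clock::now();
    return duration_cast<microseconds>(t2 - t1).count();
}

// Times `runs` concurrent executions, one thread per run.
std::vector<long long> time_parallel(const std::vector<std::string>& args, int runs) {
    std::vector<long long> result(runs, -1);
    std::vector<std::thread> workers;
    for (int i = 0; i < runs; ++i) {
        workers.emplace_back([&result, &args, i] { result[i] = time_one_run(args); });
    }
    for (auto& w : workers) w.join();
    return result;
}
```

For the question's setup the call would be time_parallel({"./test"}, 20). Keep in mind that running trials side by side only gives clean numbers for programs that mostly sleep or wait; CPU-bound programs will compete for cores and skew each other's timings.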