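I recently came across an implementation of the Chudnovsky algorithm for computing pi: Parallel GMP-Chudnovsky using OpenMP with factorization
I compiled it with the default options and ran it on 1 core for various digit counts from 10^3 to 10^8. However, I noticed that as the number of cores increases, the computation takes longer in terms of both CPU time and wall-clock time. Why do more cores increase the time needed for the calculation? Shouldn't they speed it up and give better performance?
Here is a sample output: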
~/Desktop$ ./pgmp-chudnovsky 7500000 0 1
#terms=528852, depth=21, cores=1
sieve cputime = 0.120
...................................................
bs cputime = 30.300 wallclock = 30.313
gcd cputime = 6.380
div cputime = 3.800
sqrt cputime = 2.140
mul cputime = 1.420
total cputime = 37.800 wallclock = 37.838
P size=10919784 digits (1.455971)
Q size=10919777 digits (1.455970)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 2
#terms=528852, depth=21, cores=2
sieve cputime = 0.120
...................................................
bs cputime = 30.890 wallclock = 17.661
gcd cputime = 12.930
div cputime = 3.790
sqrt cputime = 2.130
mul cputime = 1.420
total cputime = 38.380 wallclock = 25.153
P size=10919611 digits (1.455948)
Q size=10919605 digits (1.455947)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 3
#terms=528852, depth=21, cores=3
sieve cputime = 0.120
...................................................
bs cputime = 31.400 wallclock = 14.266
gcd cputime = 21.640
div cputime = 3.810
sqrt cputime = 2.130
mul cputime = 1.410
total cputime = 38.900 wallclock = 21.784
P size=10726889 digits (1.430252)
Q size=10726883 digits (1.430251)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 4
#terms=528852, depth=21, cores=4
sieve cputime = 0.130
...................................................
bs cputime = 32.480 wallclock = 11.771
gcd cputime = 27.770
div cputime = 3.800
sqrt cputime = 2.130
mul cputime = 1.410
total cputime = 39.980 wallclock = 19.284
P size=10920859 digits (1.456115)
Q size=10920852 digits (1.456114)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 5
#terms=528852, depth=21, cores=5
sieve cputime = 0.130
...................................................
bs cputime = 33.010 wallclock = 15.496
gcd cputime = 28.500
div cputime = 3.790
sqrt cputime = 2.130
mul cputime = 1.420
total cputime = 40.510 wallclock = 23.000
P size=10605102 digits (1.414014)
Q size=10605096 digits (1.414013)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 10
#terms=528852, depth=21, cores=10
sieve cputime = 0.130
...................................................
bs cputime = 33.210 wallclock = 14.311
gcd cputime = 29.640
div cputime = 3.780
sqrt cputime = 2.140
mul cputime = 1.420
total cputime = 40.720 wallclock = 21.822
P size=10607304 digits (1.414307)
Q size=10607297 digits (1.414306)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 100
#terms=528852, depth=21, cores=100
sieve cputime = 0.120
...................................................
bs cputime = 33.080 wallclock = 13.412
gcd cputime = 17.630
div cputime = 3.780
sqrt cputime = 2.130
mul cputime = 1.420
total cputime = 40.570 wallclock = 20.912
P size=12169347 digits (1.622580)
Q size=12169341 digits (1.622579)
~/Desktop$ ./pgmp-chudnovsky 7500000 0 200
#terms=528852, depth=21, cores=200
sieve cputime = 0.130
...................................................
bs cputime = 34.080 wallclock = 13.942
gcd cputime = 15.620
div cputime = 3.760
sqrt cputime = 2.110
mul cputime = 1.420
total cputime = 41.530 wallclock = 21.401
P size=12642316 digits (1.685642)
Q size=12642309 digits (1.685641)
Answer 0 (score: 2)
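Judging from the results, you have a 4-core system. Increasing the number of threads beyond that hurts performance, because you pay the overhead of thread context switching without getting any more work done simultaneously.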
Cores   Total wallclock (s)
    1   37.838
    2   25.153
    3   21.784
    4   19.284  *Best*
    5   23.000
   10   21.822
  100   20.912
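
To make the comparison concrete, here is a minimal Python sketch that derives the speedup relative to the 1-thread run from the wall-clock figures posted in the question. The `runs` dictionary and the `summarize` helper are names made up for this illustration; the numbers are copied from the `bs` and `total` wallclock lines above. It also prints the part of the wall-clock time that is not spent in `bs`, which stays at roughly 7.5 s in every run, so that portion does not appear to benefit from extra threads in these measurements.

```python
# Wall-clock times in seconds, copied from the runs in the question.
# Key: value of "cores=" in the output; value: (bs wallclock, total wallclock).
runs = {
    1: (30.313, 37.838),
    2: (17.661, 25.153),
    3: (14.266, 21.784),
    4: (11.771, 19.284),
    5: (15.496, 23.000),
    10: (14.311, 21.822),
    100: (13.412, 20.912),
    200: (13.942, 21.401),
}


def summarize(runs):
    """Return (cores, total, speedup vs. 1 thread, non-bs remainder) rows."""
    base_total = runs[1][1]
    rows = []
    for cores, (bs_wall, total_wall) in sorted(runs.items()):
        speedup = base_total / total_wall   # > 1 means faster than 1 thread
        remainder = total_wall - bs_wall    # wall-clock time outside "bs"
        rows.append((cores, total_wall, speedup, remainder))
    return rows


rows = summarize(runs)
best = min(rows, key=lambda r: r[1])        # row with the smallest total time

print("cores  total(s)  speedup  non-bs(s)")
for cores, total_wall, speedup, remainder in rows:
    print(f"{cores:>5}  {total_wall:>8.3f}  {speedup:>6.2f}x  {remainder:>9.3f}")
print(f"fastest run: {best[0]} threads")
```

Every multi-threaded run is faster than the single-threaded one, but the total time is lowest at 4 threads and does not improve beyond that.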