Hello, I am currently running over InfiniBand and measuring performance with the IMB benchmarks. I am running the parallel transfer tests and want to know whether the results really reflect the parallel performance of 8 processes.
The output is too ambiguous for me to interpret on my own: since the results mention "( 6 additional processes waiting in MPI_Barrier)", I suspect each test is only running 2 processes?
The t_avg[usec] and throughput (Mbytes/sec) columns look plausible, but I want to be sure I am reading them correctly.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
Does this section mean that I am running 8 processes in parallel?
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
And does this section mean that 4 processes are running in parallel? Thanks.
Any help from someone familiar with the IMB benchmarks would be much appreciated. The complete output follows (run with np = 8):
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date : Mon Oct 16 14:14:20 2017
# Machine : x86_64
# System : Linux
# Release : 4.4.0-96-generic
# Version : #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017
# MPI Version : 3.0
# MPI Thread Environment:
# Calling sequence was:
# ./IMB-MPI1 Sendrecv Exchange
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Sendrecv
# Exchange
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 13.85 13.85 13.85 0.00
1 1000 12.22 12.22 12.22 0.16
2 1000 10.08 10.08 10.08 0.40
4 1000 9.43 9.43 9.43 0.85
8 1000 8.89 8.91 8.90 1.80
16 1000 8.70 8.71 8.71 3.67
32 1000 9.00 9.00 9.00 7.11
64 1000 8.82 8.82 8.82 14.51
128 1000 8.90 8.90 8.90 28.77
256 1000 8.98 8.98 8.98 56.99
512 1000 9.78 9.78 9.78 104.75
1024 1000 12.65 12.65 12.65 161.91
2048 1000 18.31 18.32 18.31 223.63
4096 1000 20.05 20.05 20.05 408.52
8192 1000 21.15 21.16 21.16 774.11
16384 1000 27.46 27.47 27.46 1193.05
32768 1000 36.93 36.94 36.93 1774.31
65536 640 60.56 60.59 60.57 2163.39
131072 320 117.62 117.63 117.63 2228.57
262144 160 202.67 202.68 202.67 2586.78
524288 80 323.86 324.28 324.07 3233.56
1048576 40 615.05 615.47 615.26 3407.42
2097152 20 1214.74 1216.89 1215.82 3446.74
4194304 10 2471.83 2488.45 2480.14 3371.02
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 11.14 11.15 11.15 0.00
1 1000 11.16 11.16 11.16 0.18
2 1000 11.11 11.12 11.12 0.36
4 1000 11.10 11.11 11.10 0.72
8 1000 11.03 11.04 11.03 1.45
16 1000 11.21 11.22 11.22 2.85
32 1000 11.81 11.81 11.81 5.42
64 1000 11.58 11.58 11.58 11.05
128 1000 11.77 11.78 11.78 21.72
256 1000 11.88 11.89 11.89 43.05
512 1000 13.03 13.03 13.03 78.57
1024 1000 14.73 14.74 14.74 138.92
2048 1000 19.37 19.39 19.38 211.24
4096 1000 21.31 21.34 21.33 383.96
8192 1000 26.19 26.22 26.20 624.84
16384 1000 32.65 32.69 32.67 1002.26
32768 1000 48.71 48.78 48.75 1343.52
65536 640 75.14 75.22 75.18 1742.63
131072 320 174.66 175.15 174.94 1496.65
262144 160 301.22 302.02 301.44 1735.95
524288 80 539.40 542.68 540.78 1932.21
1048576 40 1015.45 1026.34 1020.59 2043.32
2097152 20 1959.53 1985.57 1971.34 2112.39
4194304 10 3549.00 3641.61 3590.76 2303.55
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.81 12.83 12.82 0.00
1 1000 12.82 12.84 12.83 0.16
2 1000 12.73 12.75 12.74 0.31
4 1000 12.82 12.85 12.84 0.62
8 1000 12.87 12.88 12.87 1.24
16 1000 12.83 12.86 12.84 2.49
32 1000 13.25 13.28 13.26 4.82
64 1000 13.44 13.46 13.45 9.51
128 1000 13.49 13.51 13.50 18.94
256 1000 13.72 13.74 13.73 37.27
512 1000 13.69 13.71 13.70 74.72
1024 1000 15.73 15.75 15.74 130.07
2048 1000 20.72 20.76 20.74 197.28
4096 1000 22.68 22.74 22.72 360.28
8192 1000 29.48 29.52 29.50 555.04
16384 1000 39.89 39.95 39.92 820.31
32768 1000 57.38 57.48 57.43 1140.24
65536 640 95.23 95.34 95.29 1374.78
131072 320 214.61 215.16 214.83 1218.38
262144 160 365.75 368.39 367.28 1423.18
524288 80 679.82 687.10 683.13 1526.08
1048576 40 1277.18 1309.22 1295.65 1601.83
2097152 20 2292.99 2377.56 2339.35 1764.12
4194304 10 4617.95 4919.67 4778.37 1705.12
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 12.41 12.42 12.42 0.00
1 1000 12.47 12.48 12.47 0.32
2 1000 11.93 11.94 11.94 0.67
4 1000 11.95 11.96 11.95 1.34
8 1000 11.91 11.92 11.92 2.69
16 1000 11.97 11.98 11.97 5.34
32 1000 12.80 12.81 12.80 10.00
64 1000 12.84 12.84 12.84 19.93
128 1000 12.90 12.91 12.91 39.67
256 1000 12.90 12.91 12.91 79.34
512 1000 14.04 14.04 14.04 145.82
1024 1000 17.13 17.14 17.13 239.02
2048 1000 21.06 21.06 21.06 389.05
4096 1000 23.32 23.33 23.32 702.41
8192 1000 28.07 28.07 28.07 1167.45
16384 1000 37.81 37.82 37.82 1732.64
32768 1000 55.23 55.24 55.24 2372.75
65536 640 101.04 101.06 101.05 2593.84
131072 320 212.88 212.88 212.88 2462.84
262144 160 362.37 362.38 362.37 2893.62
524288 80 668.88 668.89 668.88 3135.26
1048576 40 1286.48 1287.81 1287.15 3256.92
2097152 20 2463.56 2464.13 2463.84 3404.29
4194304 10 4845.24 4854.75 4849.99 3455.83
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 16.46 16.46 16.46 0.00
1 1000 16.42 16.43 16.42 0.24
2 1000 16.17 16.17 16.17 0.49
4 1000 16.17 16.17 16.17 0.99
8 1000 16.19 16.20 16.20 1.98
16 1000 16.21 16.22 16.22 3.94
32 1000 17.20 17.21 17.20 7.44
64 1000 17.09 17.10 17.10 14.97
128 1000 17.24 17.25 17.25 29.68
256 1000 17.40 17.41 17.40 58.83
512 1000 17.59 17.61 17.60 116.32
1024 1000 21.43 21.45 21.44 190.95
2048 1000 29.49 29.50 29.49 277.71
4096 1000 31.63 31.66 31.64 517.58
8192 1000 36.70 36.72 36.71 892.41
16384 1000 49.50 49.53 49.52 1323.07
32768 1000 68.35 68.36 68.36 1917.38
65536 640 108.80 108.85 108.82 2408.31
131072 320 314.38 314.72 314.56 1665.91
262144 160 521.71 522.24 521.94 2007.84
524288 80 930.03 933.47 931.82 2246.62
1048576 40 1729.81 1738.30 1734.66 2412.87
2097152 20 3384.33 3414.99 3403.61 2456.41
4194304 10 6972.50 7058.12 7028.16 2377.01
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
0 1000 18.91 18.93 18.92 0.00
1 1000 19.06 19.08 19.07 0.21
2 1000 18.91 18.92 18.92 0.42
4 1000 19.07 19.09 19.08 0.84
8 1000 18.81 18.83 18.82 1.70
16 1000 19.02 19.03 19.03 3.36
32 1000 19.85 19.85 19.85 6.45
64 1000 19.76 19.78 19.77 12.94
128 1000 19.94 19.96 19.95 25.65
256 1000 20.16 20.18 20.17 50.75
512 1000 20.50 20.51 20.50 99.86
1024 1000 24.52 24.55 24.54 166.83
2048 1000 36.35 36.39 36.37 225.14
4096 1000 38.77 38.81 38.79 422.20
8192 1000 44.79 44.82 44.81 731.12
16384 1000 59.28 59.33 59.31 1104.68
32768 1000 86.39 86.47 86.42 1515.87
65536 640 142.47 142.60 142.53 1838.29
131072 320 402.11 402.98 402.57 1301.04
262144 160 648.90 650.30 649.68 1612.44
524288 80 1209.17 1213.71 1211.74 1727.89
1048576 40 2332.69 2355.17 2344.35 1780.89
2097152 20 4686.88 4767.48 4733.77 1759.55
4194304 10 9457.18 9674.69 9567.31 1734.13
# All processes entering MPI_Finalize
Answer 0 (score: 1)
IMB benchmarks both MPI_Sendrecv and MPI_Exchange (message sizes from 0 to 4MB) on 2, 4 and 8 processes in a single run. Since mpirun was invoked once with -np 8, 8 MPI tasks are created. So when the size-2 communicator is being tested, an additional size-6 communicator is created under the hood, and its 6 MPI tasks simply hang in MPI_Barrier, hence the message:
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
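
To make this concrete, here is a minimal conceptual sketch of that mechanism, assuming a sub-communicator created with MPI_Comm_split; this is only an illustration of the idea, not IMB's actual source code:

/* Conceptual sketch only: the active ranks get their own sub-communicator
 * and run the timing loop, while the remaining ranks go straight to a
 * barrier on MPI_COMM_WORLD and wait there until the measurement is done. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    int active = 2;   /* corresponds to the "#processes = 2" step */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Ranks 0..active-1 form the benchmark communicator; the rest get
     * MPI_COMM_NULL and stay idle. */
    int color = (world_rank < active) ? 0 : MPI_UNDEFINED;
    MPI_Comm bench_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &bench_comm);

    if (bench_comm != MPI_COMM_NULL) {
        /* ... run the Sendrecv/Exchange timing loop on bench_comm ... */
        printf("rank %d of %d: measuring on %d processes\n",
               world_rank, world_size, active);
        MPI_Comm_free(&bench_comm);
    }

    /* The idle ranks reach this barrier immediately and wait here while the
     * active ranks are measuring -- these are the "additional processes
     * waiting in MPI_Barrier" from the report. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

So in your log, the sections that do not print the "additional processes" line (e.g. # #processes = 8) are the ones where all 8 MPI tasks take part in the measurement.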