嗨,我想通过使用线程加速来计算从1到N的整数之和。因此,我写道:
import threading
N = int(input())
sum = 0
i = 1
lock = threading.Lock()
def thread_worker():
global sum
global i
lock.acquire()
sum += i
i += 1
lock.release()
for j in range(N):
w = threading.Thread(target = thread_worker)
w.start()
因为我不想弄乱我的变量i,所以我使用了锁功能。但是,由于每次只有一个线程可以访问共享变量,因此无法提高代码速度。 所以我想知道有什么我可以改善线程运行时间的吗?添加更多的变量和锁会有所帮助吗?
谢谢!
答案 0 :(得分:0)
欢迎来到StackOverflow,并祝贺您第一个问题!
我觉得您的问题有两个答案:
线程总是会有一些开销的:创建线程,锁定等。即使开销很小,也将保留一些开销。这意味着只有在每个线程都有一些重要工作要做时,线程才值得。您的函数在lock.acquire()/release()
内部几乎不执行任何操作。如果您在那里进行一些更复杂的工作,则线程将起作用。但是您正确使用了锁定功能-这对线程处理来说不是一个好问题。
我正在运行Python 3.7,并安装了pytest-benchmark
3.2.2。 Pytest基准测试会为您运行适当次数的功能,并在运行时报告统计信息。
import threading
import pytest
def compute_variant1(N: int) -> int:
global sum_
global i
sum_ = 0
i = 1
lock = threading.Lock()
def thread_worker():
global sum_
global i
lock.acquire()
sum_ += i
i += 1
lock.release()
threads = []
for j in range(N):
threads.append(threading.Thread(target = thread_worker))
threads[-1].start()
for t in threads:
t.join()
return sum_
@pytest.mark.parametrize("func", [
compute_variant1])
@pytest.mark.parametrize("N,expected", [(10, 55), (30, 465), (100, 5050)])
def test_var1(benchmark, func, N, expected):
result = benchmark(func, N=N )
assert result == expected
使用以下命令运行此命令:py.test --benchmark-histogram=bench
并打开生成的bench.svg
,它还会打印此表:
----------------------------------------------------------------------------------------------------- benchmark: 3 tests -----------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_var1[10-55-compute_variant1] 570.0710 (1.0) 4,900.3160 (1.51) 658.8392 (1.0) 270.6056 (1.36) 602.5220 (1.0) 42.6373 (1.0) 19;82 1,517.8209 (1.0) 529 1
test_var1[30-465-compute_variant1] 1,701.2970 (2.98) 3,237.5830 (1.0) 1,879.8490 (2.85) 198.3905 (1.0) 1,802.5160 (2.99) 146.4465 (3.43) 59;43 531.9576 (0.35) 432 1
test_var1[100-5050-compute_variant1] 5,809.7500 (10.19) 13,354.3520 (4.12) 6,698.4778 (10.17) 1,413.1428 (7.12) 6,235.4355 (10.35) 766.0440 (17.97) 6;7 149.2876 (0.10) 74 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
将快速实现与最简单的实现进行比较通常真的很有帮助。
我能想到的最简单的实现-不使用Python技巧-是一个简单的循环:
def compute_variant2(N):
sum_ = 0
for i in range(1, N+1):
sum_ += i
return sum_
Python有一个名为sum()
的函数,该函数获取列表或任何{em> iterable ,例如range(N)
:
def compute_variant3(N: int) -> int:
return sum(range(1, N+1))
以下是结果(我已删除了一些列):
------------------------------------------------------- benchmark: 3 tests ------------------------------------------------
Name (time in us) Mean StdDev Median Rounds Iterations
---------------------------------------------------------------------------------------------------------------------------
test_var1[100-5050-compute_variant2] 4.2328 (1.0) 1.6411 (1.0) 4.1150 (1.0) 163106 1
test_var1[100-5050-compute_variant3] 4.7113 (1.11) 1.6773 (1.02) 4.5560 (1.11) 141744 1
test_var1[100-5050-compute_variant1] 6,404.1856 (>1000.0) 668.2502 (407.21) 6,257.6385 (>1000.0) 106 1
---------------------------------------------------------------------------------------------------------------------------
如您所见,基于线程的variant1
比顺序实现慢很多很多倍。
卡尔·弗里德里希·高斯(Carl Friedrich Gauss)还是个孩子1时,他重新发现了我们问题的近似形式的解决方案:N * (N + 1) / 2
。
这可以用Python表达:
def compute_variant4(N: int) -> int:
return N * (N + 1) // 2
让我们将其与其他快速实现进行比较(我将省略第一个实现):
并且您可以在表中看到:最后一个变体更快,并且重要的是与N无关。
---------------------------------------------------------------------------------------------------- benchmark: 12 tests -----------------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
var1[10-55-compute_variant4] 162.0100 (1.0) 1,053.6800 (1.0) 170.1799 (1.0) 32.5862 (1.0) 163.7200 (1.0) 8.1800 (1.59) 1135;1225 5,876.1353 (1.0) 58310 100
var1[13-91-compute_variant4] 162.1400 (1.00) 1,354.3200 (1.29) 181.5037 (1.07) 46.2132 (1.42) 176.1800 (1.08) 5.1500 (1.0) 2214;14374 5,509.5296 (0.94) 61452 100
var1[30-465-compute_variant4] 188.8900 (1.17) 1,874.2800 (1.78) 200.4983 (1.18) 40.8533 (1.25) 191.0000 (1.17) 9.8300 (1.91) 1245;1342 4,987.5732 (0.85) 51919 100
var1[100-5050-compute_variant4] 192.9091 (1.19) 5,938.7273 (5.64) 209.0628 (1.23) 75.8845 (2.33) 198.5455 (1.21) 10.9091 (2.12) 1879;4696 4,783.2508 (0.81) 194515 22
var1[10-55-compute_variant2] 676.1000 (4.17) 18,987.8000 (18.02) 719.4231 (4.23) 194.5556 (5.97) 689.0000 (4.21) 34.9000 (6.78) 1447;2199 1,390.0027 (0.24) 125898 10
var1[13-91-compute_variant2] 753.9000 (4.65) 12,103.8000 (11.49) 799.2837 (4.70) 201.3654 (6.18) 766.5000 (4.68) 38.1000 (7.40) 1554;3049 1,251.1203 (0.21) 124441 10
var1[10-55-compute_variant3] 1,021.0000 (6.30) 77,718.8000 (73.76) 1,157.6125 (6.80) 544.1982 (16.70) 1,098.2000 (6.71) 73.0000 (14.17) 3802;12244 863.8469 (0.15) 186672 5
var1[13-91-compute_variant3] 1,127.6000 (6.96) 44,606.4000 (42.33) 1,279.9332 (7.52) 476.7700 (14.63) 1,200.2000 (7.33) 90.0000 (17.48) 4022;21018 781.2908 (0.13) 172147 5
var1[30-465-compute_variant2] 1,304.7500 (8.05) 48,218.5000 (45.76) 1,457.3923 (8.56) 550.7975 (16.90) 1,385.7500 (8.46) 80.2500 (15.58) 3944;22221 686.1570 (0.12) 177086 4
var1[30-465-compute_variant3] 1,738.6667 (10.73) 86,587.3333 (82.18) 1,935.1659 (11.37) 762.7118 (23.41) 1,860.3333 (11.36) 128.6667 (24.98) 2474;6870 516.7516 (0.09) 176773 3
var1[100-5050-compute_variant2] 3,891.0000 (24.02) 181,009.0000 (171.79) 4,218.4608 (24.79) 1,721.8413 (52.84) 3,998.0000 (24.42) 210.0000 (40.78) 1788;2670 237.0533 (0.04) 171881 1
var1[100-5050-compute_variant3] 4,200.0000 (25.92) 76,885.0000 (72.97) 4,516.5853 (26.54) 1,452.5713 (44.58) 4,343.0000 (26.53) 210.0000 (40.78) 1204;2311 221.4062 (0.04) 153587 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------