Unusually long computation time with the uncertainties package

Date: 2018-11-02 20:16:49

Tags: python performance uncertainty

Consider the following code snippet:

import random
from uncertainties import unumpy, ufloat

# 8199 random values; x has small uncertainties, y has large ones
x = [random.uniform(0,1) for p in range(1,8200)]
y = [random.randrange(0,1000) for p in range(1,8200)]
xerr = [random.uniform(0,1)/1000 for p in range(1,8200)]
yerr = [random.uniform(0,1)*10 for p in range(1,8200)]

# Arrays of numbers with uncertainties (nominal value, standard deviation)
x = unumpy.uarray(x, xerr)
y = unumpy.uarray(y, yerr)
diff = sum(x*y)  # depends on every element of x and y
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff-x[k])**2 * y[k]

print(u)

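If I run this on my computer, it can take up to 10 minutes to produce a result. I'm not sure why this happens and would appreciate some kind of explanation. If I had to guess, I'd say that the uncertainty calculations are, for some reason, more complex than one would think, but as I said, that's just a guess. Interestingly, if I remove the print statement at the end, the code finishes almost instantly, which honestly confuses me more than it helps...

In case you aren't familiar with it, this is the repository of the uncertainties library.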

1 answer:

Answer 0 (score: 1)

I can reproduce this: the print is what takes forever. Or rather, it is the conversion to a string that print performs implicitly. I used line_profiler to time the __format__ method of AffineScalarFunc (which is called by __str__, which in turn is called by print). To make it run faster, I reduced the array size from 8200 to 1000. This is the result (trimmed for readability):

Timer unit: 1e-06 s

Total time: 29.1365 s
File: /home/veith/Projects/stackoverflow/test/lib/python3.6/site-packages/uncertainties/core.py
Function: __format__ at line 1813

Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
  1813                                             @profile
  1814                                             def __format__(self, format_spec):
  1960
  1961                                                 # Since the '%' (percentage) format specification can change
  1962                                                 # the value to be displayed, this value must first be
  1963                                                 # calculated. Calculating the standard deviation is also an
  1964                                                 # optimization: the standard deviation is generally
  1965                                                 # calculated: it is calculated only once, here:
  1966         1          2.0        2.0      0.0      nom_val = self.nominal_value
  1967         1   29133097.0 29133097.0    100.0      std_dev = self.std_dev
  1968

You can see that almost all of the time is spent on line 1967, where the standard deviation is computed. Digging deeper: the std_dev property spends its time in the error_components property, which spends it in the derivatives property, which in turn spends it in _linear_part.expand(). Profiling that method gets you to the root of the problem. The work there is spread fairly evenly across a few lines:

Function: expand at line 1481

Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
  1481                                             @profile
  1482                                             def expand(self):
  1483                                                 """
  1484                                                 Expand the linear combination.
  1485
  1486                                                 The expansion is a collections.defaultdict(float).
  1487
  1488                                                 This should only be called if the linear combination is not
  1489                                                 yet expanded.
  1490                                                 """
  1491
  1492                                                 # The derivatives are built progressively by expanding each
  1493                                                 # term of the linear combination until there is no linear
  1494                                                 # combination to be expanded.
  1495
  1496                                                 # Final derivatives, constructed progressively:
  1497         1          2.0        2.0      0.0      derivatives = collections.defaultdict(float)
  1498
  1499  15995999    4942237.0        0.3      9.7      while self.linear_combo: # The list of terms is emptied progressively
  1500
  1501                                                     # One of the terms is expanded or, if no expansion is
  1502                                                     # needed, simply added to the existing derivatives.
  1503                                                     #
  1504                                                     # Optimization note: since Python's operations are
  1505                                                     # left-associative, a long sum of Variables can be built
  1506                                                     # such that the last term is essentially a Variable (and
  1507                                                     # not a NestedLinearCombination): popping from the
  1508                                                     # remaining terms allows this term to be quickly put in
  1509                                                     # the final result, which limits the number of terms
  1510                                                     # remaining (and whose size can temporarily grow):
  1511  15995998    6235033.0        0.4     12.2          (main_factor, main_expr) = self.linear_combo.pop()
  1512
  1513                                                     # print "MAINS", main_factor, main_expr
  1514
  1515  15995998   10572206.0        0.7     20.8          if main_expr.expanded():
  1516  15992002    6822093.0        0.4     13.4              for (var, factor) in main_expr.linear_combo.items():
  1517   7996001    8070250.0        1.0     15.8                  derivatives[var] += main_factor*factor
  1518
  1519                                                     else: # Non-expanded form
  1520  23995993    8084949.0        0.3     15.9              for (factor, expr) in main_expr.linear_combo:
  1521                                                             # The main_factor is applied to expr:
  1522  15995996    6208091.0        0.4     12.2                  self.linear_combo.append((main_factor*factor, expr))
  1523
  1524                                                     # print "DERIV", derivatives
  1525
  1526         1          2.0        2.0      0.0      self.linear_combo = derivatives

You can see that there are a lot of calls to expanded (about 16 million hits on line 1515 alone), which calls isinstance, which is slow when executed that many times. Also note the comments, which hint that the library deliberately computes the derivatives only when they are actually needed (and knows that doing it otherwise would be really slow). That is why the conversion to a string takes so long, while the arithmetic before it took almost no time.
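To make the deferred cost concrete, here is a minimal, self-contained toy model of a lazy linear combination, loosely in the spirit of the expand() loop profiled above. It is not the library's code: the classes Var and LazyLinear, the build() helper and the steps counter are all hypothetical, and only derivatives (not nominal values) are tracked. Building the structure costs O(1) per +=, but because every term refers to the same diff node, the single expansion triggered by the first std_dev() call walks diff's n terms n times.

```python
import collections
import math


class Var:
    """Hypothetical independent variable: only its standard deviation matters here."""

    def __init__(self, std_dev):
        self.std_dev = std_dev


class LazyLinear:
    """Hypothetical lazy linear combination (a toy model, not the library's class).

    `terms` holds (factor, node) pairs, where node is a Var or another LazyLinear.
    Nothing is flattened until the derivatives are actually needed.
    """

    def __init__(self, terms):
        self.terms = list(terms)
        self.derivatives = None  # {Var: d(result)/d(var)}, filled in by expand()
        self.steps = 0           # loop iterations spent in expand()

    def expand(self):
        derivatives = collections.defaultdict(float)
        pending = list(self.terms)
        while pending:  # the list of terms is emptied progressively
            self.steps += 1
            factor, node = pending.pop()
            if isinstance(node, Var):
                derivatives[node] += factor
            elif node.derivatives is not None:
                # Already expanded: merge its flat derivatives directly.
                for var, d in node.derivatives.items():
                    derivatives[var] += factor * d
            else:
                # Not expanded: push its terms, scaled by the outer factor.
                for inner_factor, inner_node in node.terms:
                    pending.append((factor * inner_factor, inner_node))
        self.derivatives = derivatives

    def std_dev(self):
        if self.derivatives is None:  # the deferred cost is paid here, once
            self.expand()
        return math.sqrt(sum((d * var.std_dev) ** 2
                             for var, d in self.derivatives.items()))


def build(n):
    """Mimics the shape of the question's loop: every term refers to the same diff."""
    xs = [Var(0.1) for _ in range(n)]
    diff = LazyLinear((1.0, x) for x in xs)  # like diff = sum(...)
    u = LazyLinear([])
    for x in xs:
        # like u += f(diff, x): O(1) per step, it only wraps the previous u
        u = LazyLinear([(1.0, u), (2.0, diff), (-1.0, x)])
    return u


if __name__ == "__main__":
    for n in (100, 200, 400):
        u = build(n)  # fast
        u.std_dev()   # slow part: triggers the single expansion
        print(n, u.steps)
```

In this toy model expand() runs n**2 + 3*n loop iterations, so doubling n roughly quadruples the work done at print time while build() stays linear. Whether the real library scales exactly like this for the question's expression is an assumption; the profile only shows that its loop runs millions of times.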

In the __init__ method of AffineScalarFunc:

# In order to have a linear execution time for long sums, the
# _linear_part is generally left as is (otherwise, each
# successive term would expand to a linearly growing sum of
# terms: efficiently handling such terms [so, without copies]
# is not obvious, when the algorithm should work for all
# functions beyond sums).

A related comment appears in the std_dev property of AffineScalarFunc.
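For context, std_dev needs the fully expanded derivatives because uncertainties uses first-order (linear) error propagation: for independent variables x_i with standard deviations σ_i, σ_f = sqrt(Σ_i (∂f/∂x_i · σ_i)²). A minimal sketch of that final step, assuming the derivatives are already available as a plain dict; error_components and std_dev here are hypothetical stand-alone functions, not the library's methods:

```python
import math


def error_components(derivatives, std_devs):
    """Hypothetical helper: contribution |df/dx_i| * sigma_i of each
    independent variable, keyed by variable name."""
    return {name: abs(d * std_devs[name]) for name, d in derivatives.items()}


def std_dev(derivatives, std_devs):
    """First-order error propagation: combine the contributions in quadrature."""
    return math.sqrt(sum(c ** 2 for c in error_components(derivatives, std_devs).values()))


if __name__ == "__main__":
    # f = x * y at x = 2 +/- 0.1, y = 3 +/- 0.2, so df/dx = y = 3 and df/dy = x = 2
    derivs = {"x": 3.0, "y": 2.0}
    sigmas = {"x": 0.1, "y": 0.2}
    print(error_components(derivs, sigmas))  # approximately {'x': 0.3, 'y': 0.4}
    print(std_dev(derivs, sigmas))           # approximately 0.5
```

This sum itself is cheap. With thousands of independent variables, as in the question, the cost lies in producing the derivatives dict in the first place, which is the expand() call shown in the profile.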

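All in all, this is more or less to be expected: the library handles these non-native numbers, and (evidently) processing them requires a lot of operations.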