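I'm looking to speed up functions with NumPy vectorization. I've had considerable success with simple equations, but I quickly get stuck on more complex transformations.

Below is an example that calculates the wet-bulb temperature of air, given a known dry-bulb temperature and relative humidity. (The calculation is adapted from this repo.) I tried simply using np.vectorize, but that is only about 2x faster than a plain apply. My other NumPy optimizations have been more than 300x faster. I'm not sure whether this is possible without Cython, since I'm still learning the basics of NumPy and vectorization.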

    import math  # needed by sat_press_si (math.exp, math.log)

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Temp_C': [20,0,6,-22,13,37,20,0,-10,8,14,24,19,12,4],
                       'relativeHumidty': [0.6,0.2,0.55,0.25,0.1,0.9,1,.67,0.24,0.81,0.46,0.51,0.50,0.65,0.72]})

    def sat_press_si(tdb):
        # Saturation pressure for a scalar dry-bulb temperature in deg C.
        # C1..C7 are used at or below freezing, C8..C13 above freezing.
        C1 = -5674.5359
        C2 = 6.3925247
        C3 = -0.009677843
        C4 = 0.00000062215701
        C5 = 2.0747825E-09
        C6 = -9.484024E-13
        C7 = 4.1635019
        C8 = -5800.2206
        C9 = 1.3914993
        C10 = -0.048640239
        C11 = 0.000041764768
        C12 = -0.000000014452093
        C13 = 6.5459673
        TK = tdb + 273.15
        if TK <= 273.15:
            result = math.exp(C1/TK + C2 + C3*TK + C4*TK**2 + C5*TK**3 +
                              C6*TK**4 + C7*math.log(TK)) / 1000
        else:
            result = math.exp(C8/TK + C9 + C10*TK + C11*TK**2 + C12*TK**3 +
                              C13*math.log(TK)) / 1000
        return result

    def hum_rat_si(tdb, twb, P=14.257):
        # Humidity ratio from dry-bulb and wet-bulb temperatures.
        Pws = sat_press_si(twb)
        Ws = 0.62198 * Pws / (P - Pws)  # Equation 23, p6.8
        if tdb >= 0:
            result = (((2501 - 2.326 * twb) * Ws - 1.006 * (tdb - twb)) /
                      (2501 + 1.86 * tdb - 4.186 * twb))
        else:  # Equation 37, p6.9
            result = (((2830 - 0.24*twb)*Ws - 1.006*(tdb - twb)) /
                      (2830 + 1.86*tdb - 2.1*twb))
        return result

    def hum_rat2_si(tdb, rh, P=14.257):
        # Humidity ratio from dry-bulb temperature and relative humidity.
        Pws = sat_press_si(tdb)
        result = 0.62198*rh*Pws/(P - rh*Pws)  # Equation 22, 24, p6.8
        return result

    def wet_bulb_si(tdb, rh, P=14.257):
        # Newton iteration on the wet-bulb temperature, starting from tdb.
        W_normal = hum_rat2_si(tdb, rh, P)
        result = tdb
        W_new = hum_rat_si(tdb, result, P)
        x = 0
        while abs((W_new - W_normal) / W_normal) > 0.00001:
            W_new2 = hum_rat_si(tdb, result - 0.001, P)
            dw_dtwb = (W_new - W_new2) / 0.001  # finite-difference slope
            result = result - (W_new - W_normal) / dw_dtwb
            W_new = hum_rat_si(tdb, result, P)
            x += 1
            if x > 500:  # give up after 500 iterations
                break
        return result

    wet_bulb_vectorized = np.vectorize(wet_bulb_si)

    %timeit -n 300 wet_bulb_vectorized(df['Temp_C'].values, df['relativeHumidty'].values)
    %timeit -n 300 df.apply(lambda row: wet_bulb_si(row['Temp_C'], row['relativeHumidty']), axis=1)

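For the last two %timeit runs I get:

    2.7 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 300 loops each)
    4.17 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 300 loops each)

Any suggestions here would be greatly appreciated!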
Answer 0 (score: 0)
Let's first focus on the use of math.log versus np.log:

    In [132]: x = np.linspace(1,10,5)

A simple list iteration:

    In [133]: [math.log(i) for i in x]
    Out[133]:
    [0.0,
     1.1786549963416462,
     1.7047480922384253,
     2.0476928433652555,
     2.302585092994046]

Make it an array:

    In [134]: np.array([math.log(i) for i in x])
    Out[134]: array([0.        , 1.178655  , 1.70474809, 2.04769284, 2.30258509])

Now there's np.vectorize:

    In [135]: f = np.vectorize(np.log, otypes=[float])
    In [136]: f(x)
    Out[136]: array([0.        , 1.178655  , 1.70474809, 2.04769284, 2.30258509])

And there's np.log:

    In [137]: np.log(x)
    Out[137]: array([0.        , 1.178655  , 1.70474809, 2.04769284, 2.30258509])

The times for this small x are tiny, but for a bigger array np.log clearly wins:

    In [138]: xb = np.linspace(1,10,5000)
    In [139]: timeit np.array([math.log(i) for i in xb])
    1.28 ms ± 3.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [140]: timeit f(xb)
    6.84 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [141]: timeit np.log(xb)
    174 µs ± 674 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Now try a version that can handle negative and 0 values:

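    def foo(x):
        if x==0:
            return -np.inf
        elif x<0:
            return math.log(-x)
        else:
            return math.log(x)

This can be used in a loop, or with np.vectorize as above.

Instead of branching on each value, we can use simple masking and apply np.log to whole subsets of the array:

    def foo1(x):
        mask1 = x<0
        mask2 = x>0
        # dtype=float so that -np.inf is representable even for integer input
        res = np.full_like(x, -np.inf, dtype=float)
        res[mask1] = np.log(-x[mask1])
        res[mask2] = np.log(x[mask2])
        return res

A version that makes use of the extra parameters of np.log is a bit fancier. Don't worry if you don't understand it. It isn't that much faster, and may not be useful in your case.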
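
The code for a version built on the extra parameters of np.log is not shown here, so what follows is only a guess at what it might look like. It assumes the parameters meant are the standard ufunc keywords `where` and `out`; the name `foo2` is hypothetical.

```python
import numpy as np

def foo2(x):
    # Hypothetical sketch, not the answer's original code.
    x = np.asarray(x, dtype=float)
    # Start with -inf everywhere; entries equal to 0 keep this value.
    res = np.full_like(x, -np.inf)
    # Write log(x) into res only where x is positive.
    np.log(x, out=res, where=x > 0)
    # Write log(-x) into res only where x is negative.
    np.log(-x, out=res, where=x < 0)
    return res

vals = foo2(np.array([-np.e, 0.0, 1.0, np.e]))
```

Because `where` restricts the computation to the selected elements, np.log is never evaluated on zero or negative inputs, so no runtime warnings should be raised.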
(I've omitted equality tests and timings for now.)
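
To connect this back to the question, here is a rough sketch of how the same idea might apply to sat_press_si: replace math.exp and math.log with np.exp and np.log, and replace the if/else with np.where. The name `sat_press_vec` is hypothetical and the constants are copied from the question. The iterative wet_bulb_si loop would still need a separate array-aware convergence test, which this sketch does not attempt.

```python
import numpy as np

def sat_press_vec(tdb):
    # Hypothetical array version of the question's sat_press_si.
    TK = np.asarray(tdb, dtype=float) + 273.15
    # Branch used at or below freezing (constants C1..C7 in the question).
    ice = np.exp(-5674.5359/TK + 6.3925247 - 0.009677843*TK
                 + 0.00000062215701*TK**2 + 2.0747825E-09*TK**3
                 - 9.484024E-13*TK**4 + 4.1635019*np.log(TK)) / 1000
    # Branch used above freezing (constants C8..C13 in the question).
    water = np.exp(-5800.2206/TK + 1.3914993 - 0.048640239*TK
                   + 0.000041764768*TK**2 - 0.000000014452093*TK**3
                   + 6.5459673*np.log(TK)) / 1000
    # np.where evaluates both branches for every element, then selects.
    return np.where(TK <= 273.15, ice, water)

pws = sat_press_vec(np.array([-10.0, 20.0]))
```

Computing both branches for every element does some redundant work, but for arrays of realistic temperatures it is typically still far cheaper than a Python-level loop.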