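Is it possible to improve the performance of np.irr so that it can be applied to a two-dimensional array of cash flows without a for loop, either by vectorizing np.irr or with an alternative algorithm?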
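The irr function in NumPy computes the periodically compounded rate of return that gives a net present value of 0 for an array of cash flows. It can only be applied to one-dimensional arrays:
import numpy as np
x = np.array([-100,50,50,50])
r = np.irr(x)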
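For reference, here is a minimal sketch of the same one-row computation written with np.roots only. It assumes NumPy 1.20 or later, where np.irr no longer exists; it was deprecated in 1.18 and moved to the separate numpy-financial package. The helper name irr_1d is hypothetical. Unlike the np.irr source quoted below, it tests the mask rather than res.size. As a result, a series with no real positive root returns NaN instead of failing in argmin on an empty array.

```python
import numpy as np

def irr_1d(values):
    """Hypothetical helper: IRR of a single cash flow series.

    NPV(r) = sum(values[t] / (1 + r)**t). With x = 1 / (1 + r) this is a
    polynomial in x whose coefficients are the cash flows, so each real
    positive root x gives a candidate rate r = 1/x - 1.
    """
    values = np.asarray(values, dtype=float)
    # np.roots expects the highest-degree coefficient first
    roots = np.roots(values[::-1])
    # x = 1/(1+r) must be real and positive
    mask = (roots.imag == 0) & (roots.real > 0)
    if not mask.any():
        return np.nan
    rates = 1.0 / roots[mask].real - 1
    # Several rates can zero the NPV; like np.irr, return the one closest to zero
    return rates[np.argmin(np.abs(rates))]

x = np.array([-100, 50, 50, 50])
print(irr_1d(x))  # roughly 0.2338
```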
np.irr does not work on a two-dimensional array of cash flows, for example:
cfs = np.zeros((10000,4))
cfs[:,0] = -100
cfs[:,1:] = 50
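Here each row represents a series of cash flows and the columns represent time periods. The slow implementation is therefore to loop over the rows and apply np.irr to each one: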
out = []
for x in cfs:
    out.append(np.irr(x))
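Whichever way the IRR ends up being computed, the NPV itself is easy to vectorize across rows. That gives a cheap check that a candidate vector of rates really zeros each row's NPV. Below is a minimal sketch; the helper name npv_rows is hypothetical.

```python
import numpy as np

def npv_rows(cfs, rates):
    """Hypothetical helper: NPV of each row of `cfs` at that row's own rate.

    cfs   : (M, N) array, rows are cash flow series, columns are periods
    rates : (M,) array, one discount rate per row
    """
    cfs = np.asarray(cfs, dtype=float)
    rates = np.asarray(rates, dtype=float)
    t = np.arange(cfs.shape[1])
    # Discount factors (1 + r_i)**-t for every row and period, shape (M, N)
    discount = (1.0 + rates[:, np.newaxis]) ** -t
    # Row-wise dot product of cash flows and discount factors
    return np.einsum('ij,ij->i', cfs, discount)

cfs = np.zeros((10000, 4))
cfs[:, 0] = -100
cfs[:, 1:] = 50
print(npv_rows(cfs, np.zeros(len(cfs)))[:3])  # at r = 0 this is just the row sum
```

Passing the output of any vectorized IRR routine as `rates` should give values close to 0.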
For large arrays, the row-by-row loop is the bottleneck. Looking at the source of np.irr, I think the main obstacle is vectorizing np.roots:
def irr(values):
    res = np.roots(values[::-1])
    mask = (res.imag == 0) & (res.real > 0)
    if res.size == 0:
        return np.nan
    res = res[mask].real
    # NPV(rate) = 0 can have more than one solution so we return
    # only the solution closest to zero.
    rate = 1.0/res - 1
    rate = rate.item(np.argmin(np.abs(rate)))
    return rate
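I found a similar implementation in R (Fast loan rate calculation for a big number of loans), but I don't know how to port it to Python. I also don't think np.apply_along_axis or np.vectorize is the answer here. My main concern is performance, and as far as I know both are just wrappers around a for loop.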
Thanks!
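One alternative algorithm is Newton's method run on all rows at once: the only Python loop is over iterations, not over rows. This is a swapped-in technique, not a port of the linked R code. The function name irr_newton and its default guess, tolerance and iteration cap are assumptions. Unlike np.irr, it returns the root Newton converges to from the starting guess, not necessarily the one closest to zero. For non-conventional cash flows it may also fail to converge; those rows come back as NaN.

```python
import numpy as np

def irr_newton(cfs, guess=0.1, tol=1e-10, max_iter=50):
    """Hypothetical helper: vectorized Newton iteration for the IRR of every row."""
    cfs = np.asarray(cfs, dtype=float)
    t = np.arange(cfs.shape[1])
    r = np.full(cfs.shape[0], guess, dtype=float)
    converged = np.zeros(cfs.shape[0], dtype=bool)
    # Diverging rows can overflow or divide by zero; they end up as NaN below
    with np.errstate(all='ignore'):
        for _ in range(max_iter):
            disc = (1.0 + r[:, np.newaxis]) ** -t                  # (1 + r)**-t
            npv = (cfs * disc).sum(axis=1)                          # f(r)
            dnpv = (-t * cfs * disc / (1.0 + r[:, np.newaxis])).sum(axis=1)  # f'(r)
            step = npv / dnpv
            r = r - step
            converged = np.abs(step) < tol
            if converged.all():
                break
    return np.where(converged, r, np.nan)

cfs = np.zeros((10000, 4))
cfs[:, 0] = -100
cfs[:, 1:] = 50
print(irr_newton(cfs)[:3])
```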
Answer 0 (score: 3)
Looking at the source of np.roots,
import inspect
print(inspect.getsource(np.roots))
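we see that it works by finding the eigenvalues of the "companion matrix". It also does some special handling of zero coefficients. I don't really understand the mathematical background, but I do know that np.linalg.eigvals can compute the eigenvalues of many matrices in a vectorized way.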
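The claim is easy to check with a small sketch that builds the companion matrix by hand. The helper name companion is hypothetical; it mirrors the construction np.roots uses, with ones on the subdiagonal and the normalized coefficients in the first row. The sketch compares its eigenvalues with np.roots and shows that eigvals accepts a stack of matrices.

```python
import numpy as np

def companion(p):
    """Hypothetical helper: companion matrix of p[0]*x**n + ... + p[n]."""
    p = np.asarray(p, dtype=float)
    n = len(p) - 1
    A = np.diag(np.ones(n - 1), -1)   # ones on the subdiagonal
    A[0, :] = -p[1:] / p[0]           # first row: normalized coefficients
    return A

p = np.array([50.0, 50.0, 50.0, -100.0])  # cash flows [-100, 50, 50, 50] reversed
print(np.sort_complex(np.roots(p)))
print(np.sort_complex(np.linalg.eigvals(companion(p))))

# eigvals works on a stack of square matrices with shape (..., n, n)
stack = np.stack([companion(p), companion(2 * p)])
print(np.linalg.eigvals(stack).shape)  # (2, 3)
```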
Combining the eigvals approach with the source of np.irr produced the following "Frankencode":
import numpy as np

def irr_vec(cfs):
    # Create companion matrix for every row in `cfs`
    M, N = cfs.shape
    A = np.zeros((M, (N-1)**2))
    A[:, N-1::N] = 1  # ones on the subdiagonal of each (N-1)x(N-1) matrix
    A = A.reshape((M, N-1, N-1))
    A[:, 0, :] = cfs[:, -2::-1] / -cfs[:, -1:]  # slice [-1:] to keep dims
    # Calculate roots; `eigvals` is a gufunc
    res = np.linalg.eigvals(A)
    # Find the solution that makes the most sense...
    mask = (res.imag == 0) & (res.real > 0)
    res = np.ma.array(res.real, mask=~mask, fill_value=np.nan)
    rate = 1.0/res - 1
    idx = np.argmin(np.abs(rate), axis=1)
    # Rows with no real positive root come out as NaN
    irr = rate[np.arange(M), idx].filled(np.nan)
    return irr
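This does not handle zero coefficients and will certainly fail when any(cfs[:,-1] == 0). Some input argument checking wouldn't hurt either, and there may well be other problems. But for the example data provided, it does what we want (at the cost of higher memory usage):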
In [487]: cfs = np.zeros((10000,4))
...: cfs[:,0] = -100
...: cfs[:,1:] = 50
In [488]: %timeit [np.irr(x) for x in cfs]
1 loops, best of 3: 2.96 s per loop
In [489]: %timeit irr_vec(cfs)
10 loops, best of 3: 77.8 ms per loop
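The caveat about zero final cash flows and input checking could be handled with a small pre-check like the sketch below. The name check_cfs is hypothetical. A zero final cash flow adds nothing to the NPV, so one option is to drop trailing zeros for the flagged rows and solve them separately, for example with a per-row fallback. The rest can go straight to irr_vec.

```python
import numpy as np

def check_cfs(cfs):
    """Hypothetical pre-check for a companion-matrix IRR solver.

    Returns the input as a 2-D float array plus a boolean mask of rows whose
    last cash flow is zero (they would divide by zero when building A).
    """
    cfs = np.asarray(cfs, dtype=float)
    if cfs.ndim == 1:
        cfs = cfs[np.newaxis, :]   # treat a single series as one row
    if cfs.ndim != 2:
        raise ValueError("cfs must be a 1-D or 2-D array")
    if cfs.shape[1] < 2:
        raise ValueError("need at least two periods per cash flow series")
    if not np.isfinite(cfs).all():
        raise ValueError("cfs contains NaN or inf")
    zero_last = cfs[:, -1] == 0
    return cfs, zero_last

cfs, bad = check_cfs([[-100, 50, 50, 50], [-100, 60, 60, 0]])
print(bad)  # second row needs separate handling
```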
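If you have the special case of loans with a fixed payment amount (as in the question you linked), you can do this even faster with interpolation...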
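One way to do that interpolation is sketched below. It assumes every loan has cash flows [-principal, payment, ..., payment] with the same number of payments n. The IRR then solves principal / payment = (1 - (1 + r)**-n) / r, the annuity factor, which decreases in r. It can be tabulated once and inverted for all loans with np.interp. The function name, the rate range (0, 100%] and the grid size are assumptions.

```python
import numpy as np

def annuity_irr(principal, payment, n, r_grid=None):
    """Hypothetical helper: IRR of level-payment loans by table lookup."""
    if r_grid is None:
        r_grid = np.linspace(1e-6, 1.0, 20001)   # assumed search range and resolution
    # Annuity factor for each grid rate; strictly decreasing in r
    factor = (1.0 - (1.0 + r_grid) ** -n) / r_grid
    ratio = np.asarray(principal, dtype=float) / np.asarray(payment, dtype=float)
    # np.interp needs increasing x values, so flip the table;
    # ratios outside the table are clipped to the ends of r_grid
    return np.interp(ratio, factor[::-1], r_grid[::-1])

print(annuity_irr(100.0, 50.0, 3))                                  # roughly 0.2338
print(annuity_irr(np.array([100.0, 100.0]), np.array([50.0, 40.0]), 3))
```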
Answer 1 (score: 3)
After posting this question I worked on the problem and came up with a vectorized solution that uses a different algorithm:
import numpy as np

def virr(cfs, precision=0.005, rmin=0, rmax1=0.3, rmax2=0.5):
    '''
    Vectorized IRR calculator. First calculate a 3D array of the discounted
    cash flows along cash flow series, time period, and discount rate. Sum over
    time to collapse to a 2D array which gives the NPV along a range of discount
    rates for each cash flow series. Next, find the crossover where NPV is zero,
    which corresponds to the lowest real IRR value. For performance, negative
    IRRs are not calculated (-1 is returned instead), and values are only
    calculated to an acceptable precision.

    IN:
    cfs       - numpy 2d array - rows are cash flow series, cols are time periods
    precision - grid step of the inner IRR band, e.g. 0.005 (0.5%)
    rmin      - lower bound of the inner IRR band, e.g. 0 (0%)
    rmax1     - upper bound of the inner IRR band, e.g. 0.3 (30%)
    rmax2     - upper bound of the outer IRR band, e.g. 0.5 (50%). Values in the
                outer band are calculated to 1% precision; IRRs above the outer
                band return the rmax2 value

    OUT:
    r         - 1-D numpy array of IRRs, one per cash flow series
    '''
    if cfs.ndim == 1:
        cfs = cfs.reshape(1, len(cfs))
    # Range of time periods
    years = np.arange(0, cfs.shape[1])
    # Range of the discount rates
    rates_length1 = int((rmax1 - rmin)/precision) + 1
    rates_length2 = int((rmax2 - rmax1)/0.01)
    rates = np.zeros((rates_length1 + rates_length2,))
    rates[:rates_length1] = np.linspace(rmin, rmax1, rates_length1)
    rates[rates_length1:] = np.linspace(rmax1 + 0.01, rmax2, rates_length2)
    # Discount rate multiplier: rows are years, cols are rates
    drm = (1 + rates)**-years[:, np.newaxis]
    # Calculate discounted cfs, shape (series, years, rates)
    discounted_cfs = cfs[:, :, np.newaxis] * drm
    # Calculate NPV array by summing over the discounted cash flows
    npv = discounted_cfs.sum(axis=1)
    # Find where the NPV changes sign, which implies an IRR solution
    signs = npv < 0
    # The pairwise difference of the boolean signs is True where the sign crosses over
    crossovers = np.diff(signs, 1, 1)
    # Extract the IRR from the first crossover in each row
    irr = np.min(np.ma.masked_equal(rates[1:] * crossovers, 0), 1)
    # Error handling: negative IRRs are returned as -1, IRRs greater than rmax2
    # are returned as rmax2
    negative_irrs = cfs.sum(1) < 0
    r = np.where(negative_irrs, -1, irr)
    r = np.where(np.ma.getmaskarray(irr) & ~negative_irrs, rmax2, r)
    return r
Performance:
import numpy as np
cfs = np.zeros((10000,4))
cfs[:,0] = -100
cfs[:,1:] = 50
%timeit [np.irr(x) for x in cfs]
10 loops, best of 3: 1.06 s per loop
%timeit virr(cfs)
10 loops, best of 3: 29.5 ms per loop
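virr only brackets each IRR to the grid step; it returns the grid rate just after the sign change. If more precision is needed, the bracket can be narrowed for all rows at once with bisection. Below is a minimal sketch; it assumes each row has a valid bracket, for example the grid rate before and at the first crossover. The function name refine_irr_bisect and the iteration count are hypothetical; it is not part of the answer above.

```python
import numpy as np

def refine_irr_bisect(cfs, lo, hi, n_iter=40):
    """Hypothetical helper: refine bracketed IRRs for every row by bisection.

    Assumes row i's NPV changes sign between lo[i] and hi[i].
    """
    cfs = np.asarray(cfs, dtype=float)
    lo = np.array(lo, dtype=float)
    hi = np.array(hi, dtype=float)
    t = np.arange(cfs.shape[1])

    def npv(r):
        # NPV of every row at its own rate, shape (M,)
        return (cfs * (1.0 + r[:, np.newaxis]) ** -t).sum(axis=1)

    f_lo = npv(lo)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        f_mid = npv(mid)
        # Keep the half-interval that still contains the sign change
        same_sign = np.sign(f_mid) == np.sign(f_lo)
        lo = np.where(same_sign, mid, lo)
        f_lo = np.where(same_sign, f_mid, f_lo)
        hi = np.where(same_sign, hi, mid)
    return 0.5 * (lo + hi)

cfs = np.zeros((10000, 4))
cfs[:, 0] = -100
cfs[:, 1:] = 50
print(refine_irr_bisect(cfs, np.full(len(cfs), 0.23), np.full(len(cfs), 0.235))[:3])
```

Each iteration halves the bracket, so 40 iterations shrink a 0.005-wide bracket far below float64 rate precision.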