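I am trying to vectorize my code and, thanks in large part to some users (https://stackoverflow.com/users/3293881/divakar, https://stackoverflow.com/users/625914/behzad-nouri), I have been able to make huge progress. Essentially, I am trying to apply a generic function (in this case max_dd_array_ret) to each of the bins I found (see "vectorize complex slicing with pandas dataframe" for details on how the dates are vectorized, and "Start, End and Duration of Maximum Drawdown in Python" for the rationale behind max_dd_array_ret; the binning itself uses scipy.stats.binned_statistic, http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html).

The problem is the following: I should be able to obtain the result df_2, and to some degree ranged_DD(asd_1.values, starts, ends+1) is what I am looking for, except for the tragic effect that the first two bins appear to be merged and the last one is missing, as can be seen from the results.

Any explanation and fix is very welcome.

Code and results: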
import pandas as pd
import numpy as np
from time import time
from scipy.stats import binned_statistic

def max_dd_array_ret(xs):
    xs = (xs+1).cumprod()
    i = np.argmax(np.maximum.accumulate(xs) - xs) # end of the period
    j = np.argmax(xs[:i])
    max_dd = abs(xs[j]/xs[i] -1)
    return max_dd if max_dd is not None else 0

def get_ranges_arr(starts,ends):
    # Taken from https://stackoverflow.com/a/37626057/3293881
    counts = ends - starts
    counts_csum = counts.cumsum()
    id_arr = np.ones(counts_csum[-1],dtype=int)
    id_arr[0] = starts[0]
    id_arr[counts_csum[:-1]] = starts[1:] - ends[:-1] + 1
    return id_arr.cumsum()

def ranged_DD(arr,starts,ends):
    # Get all indices and the IDs corresponding to same groups
    idx = get_ranges_arr(starts,ends)
    id_arr = np.repeat(np.arange(starts.size),ends-starts)
    slice_arr = arr[idx]
    return binned_statistic(id_arr, slice_arr, statistic=max_dd_array_ret)[0]

asd_1 = pd.Series(0.01 * np.random.randn(500), index=pd.date_range('2011-1-1', periods=500)).pct_change()
index_1 = pd.to_datetime(['2011-2-2', '2011-4-3', '2011-5-1','2011-7-2', '2011-8-3', '2011-9-1',
                          '2011-10-2', '2011-11-3', '2011-12-1','2012-1-2', '2012-2-3', '2012-3-1',])
index_2 = pd.to_datetime(['2011-2-15', '2011-4-16', '2011-5-17','2011-7-17', '2011-8-17', '2011-9-17',
                          '2011-10-17', '2011-11-17', '2011-12-17','2012-1-17', '2012-2-17', '2012-3-17',])
starts = asd_1.index.searchsorted(index_1)
ends = asd_1.index.searchsorted(index_2)

df_2 = pd.DataFrame([max_dd_array_ret(asd_1.loc[i:j]) for i, j in zip(index_1, index_2)], index=index_1)
print(df_2[0].values)
print(ranged_DD(asd_1.values, starts, ends+1))
The two outputs are identical except for the first two entries:
df_2
[ 1.75893509 6.08002911 2.60131797 1.55631781 1.8770067 2.50709085
1.43863472 1.85322338 1.84767224 1.32605754 1.48688414 5.44786663]
ranged_DD(asd_1.values, starts, ends+1)
[ 6.08002911 2.60131797 1.55631781 1.8770067 2.50709085 1.43863472
1.85322338 1.84767224 1.32605754 1.48688414]
That is, the first two:
[ 1.75893509 6.08002911 (df_2) vs [ 6.08002911 (ranged_DD)
and the last two:
1.48688414 5.44786663] (df_2) vs 1.48688414] (ranged_DD)
On a closer look at the scipy.stats.binned_statistic documentation, I found that this might be the problem:
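"All but the last (righthand-most) bin is half-open. In other words, if bins is [1, 2, 3, 4], then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4. New in version 0.11.0."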
The problem is that I don't know how to reset it.
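
For reference, here is a minimal sketch of how the bin edges change the output. It uses toy data, not the series above. It assumes the group IDs are the integers 0..n-1, as id_arr is in ranged_DD, and relies on the `bins` parameter of binned_statistic, which defaults to 10 equal-width bins. Passing explicit edges np.arange(n + 1) is one way to get one bin per ID. The names n_groups, ids, default_counts and per_group_counts are illustrative only.

```python
import numpy as np
from scipy.stats import binned_statistic

# Toy data (an assumption for illustration): 12 groups with IDs 0..11,
# three observations each.
n_groups = 12
ids = np.repeat(np.arange(n_groups), 3)
values = np.ones(ids.size)

# Default bins=10: the ID range [0, 11] is cut into 10 equal-width bins,
# so IDs 0 and 1 share the first bin and IDs 10 and 11 share the last one.
default_counts = binned_statistic(ids, values, statistic='count')[0]

# Explicit edges 0, 1, ..., 12: one bin per ID. Each bin is half-open
# [k, k+1) except the last one, [11, 12], which is closed on the right.
edges = np.arange(n_groups + 1)
per_group_counts = binned_statistic(ids, values, statistic='count', bins=edges)[0]

print(len(default_counts), default_counts)
print(len(per_group_counts), per_group_counts)
```

With default binning the result has 10 entries rather than 12, which would be consistent with the ranged_DD output above having 10 values. The analogous change in ranged_DD would be to pass bins=np.arange(starts.size + 1) to binned_statistic. This has not been checked against the series above.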