I have a 5-column DataFrame indexed by YearMo:
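import numpy as np
import pandas as pd

# Build YYYYMM keys for 2000-01 through 2009-12 (120 months)
yearmo = np.repeat(np.arange(2000, 2010) * 100, 12) + [x for x in range(1, 13)] * 10
# np.random.random takes the shape as a single tuple
rates = pd.DataFrame(data=np.random.random((120, 5)),
                     index=pd.Series(data=yearmo, name='YearMo'),
                     columns=['A', 'B', 'C', 'D', 'E'])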
rates.head()
YearMo A B C D E
200411 0.237696 0.341937 0.258713 0.569689 0.470776
200412 0.601713 0.313006 0.221821 0.720162 0.889891
200501 0.024379 0.761315 0.225032 0.293682 0.302431
200502 0.996778 0.388783 0.026448 0.056188 0.744850
200503 0.942024 0.768416 0.484236 0.102904 0.287446
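What I want to do is apply a rolling window and pass all five columns to a function, something like:
rates.rolling(window=60, min_periods=60).apply(lambda x: my_func(data=x, param=5))
However, this approach applies the function to each column separately. Specifying axis=1 does not help either...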
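The per-column behavior described above can be seen in a minimal sketch. It assumes a pandas version where `Rolling.apply` accepts `raw=True`; the small frame `df`, the `seen_shapes` list, and the `record` helper are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the rates table above)
df = pd.DataFrame(np.arange(10, dtype=float).reshape(5, 2), columns=['A', 'B'])

seen_shapes = []  # hypothetical bookkeeping: shape of every window passed in

def record(window_values):
    # With raw=True each call receives a NumPy array holding one column's window
    seen_shapes.append(window_values.shape)
    return window_values.sum()

result = df.rolling(window=3, min_periods=3).apply(record, raw=True)
print(seen_shapes)  # every entry is one-dimensional
print(result)
```

Every recorded shape is one-dimensional, so the function never sees columns A and B together.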
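Answer 0 (score: 1)
Question: ... apply a rolling window and pass all five columns to a function
This does what you want: pass min_periods=5 and axis=1 to .rolling(...).
The window then spans columns 'A' through 'E' (or a multiple of 5).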
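def f1(data=None, param=None):
    # Show what each call receives, then reduce the window to a single value
    print('f1(%s, %s) data=%s' % (str(type(data)), param, data))
    return data.sum()

# axis=1 rolls across the columns of each row rather than down the rows
subRates = rates.rolling(window=60, min_periods=5, axis=1).apply(lambda x: f1(x))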
Input:
A B C D E
YearMo
200001 0.666744 0.569194 0.546873 0.018696 0.240783
200002 0.035888 0.853077 0.348200 0.921997 0.283177
200003 0.652761 0.076630 0.298076 0.800504 0.041231
200004 0.537397 0.968399 0.211072 0.328157 0.929783
200005 0.759506 0.702220 0.807477 0.886935 0.022587
Output:
f1(<class 'numpy.ndarray'>, None) data=[ 0.66674393 0.56919434 0.54687296 0.01869609 0.24078329]
f1(<class 'numpy.ndarray'>, None) data=[ 0.03588751 0.85307707 0.34819965 0.92199698 0.28317727]
f1(<class 'numpy.ndarray'>, None) data=[ 0.65276067 0.07663029 0.29807589 0.80050448 0.04123137]
f1(<class 'numpy.ndarray'>, None) data=[ 0.53739687 0.96839917 0.21107155 0.32815687 0.92978308]
f1(<class 'numpy.ndarray'>, None) data=[ 0.75950632 0.70222034 0.80747698 0.88693524 0.02258685]
A B C D E
YearMo
200001 NaN NaN NaN NaN 2.042291
200002 NaN NaN NaN NaN 2.442338
200003 NaN NaN NaN NaN 1.869203
200004 NaN NaN NaN NaN 2.974808
200005 NaN NaN NaN NaN 3.178726
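The NaN values in columns A to D follow from min_periods=5: moving left to right along a row, only the window ending at column E holds five observations. As far as I know, newer pandas releases deprecate the axis argument of rolling, so here is a minimal sketch that gives the same shape of result by transposing instead. The 5x5 `rates` frame is random stand-in data, and `row_sums` is a name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same layout as the first five rows above (values are random)
rng = np.random.default_rng(0)
rates = pd.DataFrame(rng.random((5, 5)),
                     index=pd.Index([200001, 200002, 200003, 200004, 200005], name='YearMo'),
                     columns=['A', 'B', 'C', 'D', 'E'])

# Transpose so the columns become rows, roll over them, then transpose back
row_sums = rates.T.rolling(window=5, min_periods=5).apply(np.sum, raw=True).T
print(row_sums)  # only column E is filled; it holds the sum of each row
```

Note that this yields one value per row of `rates`, so it is a row-wise reduction rather than a window over 60 consecutive months.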
Tested with Python 3.4.2 and pandas 0.19.2.
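If the goal is for the function to receive all five columns of a 60-row window at once, one option is to skip rolling.apply and slice the windows in an explicit Python loop. This is only a sketch under assumptions: `rolling_apply_frame` is a hypothetical helper, `my_func` is a placeholder body for the function named in the question, and the loop will be slower than a built-in rolling operation on large frames.

```python
import numpy as np
import pandas as pd

def rolling_apply_frame(df, window, func):
    """Hypothetical helper: call func on each full window of rows (all columns)."""
    out = pd.Series(np.nan, index=df.index, dtype=float)
    for end in range(window, len(df) + 1):
        block = df.iloc[end - window:end]  # DataFrame of shape (window, n_columns)
        out.iloc[end - 1] = func(block)    # store the result at the window's last row
    return out

def my_func(data, param):
    # Placeholder body (assumption): the real my_func is not shown in the question
    return data.to_numpy().sum() * param

rng = np.random.default_rng(0)
yearmo = np.repeat(np.arange(2000, 2010) * 100, 12) + np.tile(np.arange(1, 13), 10)
rates = pd.DataFrame(rng.random((120, 5)),
                     index=pd.Index(yearmo, name='YearMo'),
                     columns=['A', 'B', 'C', 'D', 'E'])

result = rolling_apply_frame(rates, window=60,
                             func=lambda block: my_func(data=block, param=5))
print(result.dropna().head())
```

The first 59 entries stay NaN, mirroring min_periods=60 in the question, and each later entry is computed from a 60x5 block.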