如果它是单行,我可以得到如下的迭代器
import pandas as pd
import numpy as np
a = np.zeros((100,40))
X = pd.DataFrame(a)
for index, row in X.iterrows():
print index
print row
现在我希望每个迭代器都返回一个子集X[0:9, :]
,X[5:14, :]
,X[10:19, :]
等。如何通过滚动(pandas.DataFrame.rolling
)来实现这一点?
答案 0 :(得分:5)
我将尝试使用以下数据框。
import pandas as pd
import numpy as np
from string import uppercase
def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
np.random.seed(seed)
portfolios = pd.Index(['Portfolio {}'.format(i) for i in uppercase[:num_port]],
name='Portfolio')
securities = ['s{:02d}'.format(i) for i in range(num_sec)]
dates = pd.date_range(start, end, freq=freq)
return pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
index=pd.MultiIndex.from_product([dates, securities],
names=['Date', 'Id']),
columns=portfolios
).groupby(level=0).apply(lambda x: x / x.sum())
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)
df.head(10)
我现在将介绍一个函数来滚动多个行并连接到一个数据框中,我会在列索引中添加一个顶级,指示滚动中的位置。
def rolled(df, n):
k = range(df.columns.nlevels)
_k = [i - len(k) for i in k]
myroll = pd.concat([df.shift(i).stack(level=k) for i in range(n)],
axis=1, keys=range(n)).unstack(level=_k)
return [(i, row.unstack(0)) for i, row in myroll.iterrows()]
虽然它隐藏在函数中,myroll
看起来像这样
现在我们可以像迭代器一样使用它。
for i, roll in rolled(df.head(5), 3):
print roll
print
0 1 2
Portfolio
Portfolio A 0.326164 NaN NaN
Portfolio B 0.201597 NaN NaN
Portfolio C 0.085340 NaN NaN
0 1 2
Portfolio
Portfolio A 0.278614 0.326164 NaN
Portfolio B 0.314448 0.201597 NaN
Portfolio C 0.266392 0.085340 NaN
0 1 2
Portfolio
Portfolio A 0.258958 0.278614 0.326164
Portfolio B 0.089224 0.314448 0.201597
Portfolio C 0.293570 0.266392 0.085340
0 1 2
Portfolio
Portfolio A 0.092760 0.258958 0.278614
Portfolio B 0.262511 0.089224 0.314448
Portfolio C 0.084208 0.293570 0.266392
0 1 2
Portfolio
Portfolio A 0.043503 0.092760 0.258958
Portfolio B 0.132221 0.262511 0.089224
Portfolio C 0.270490 0.084208 0.293570
答案 1 :(得分:1)
这不是滚动的方式。它“提供滚动转换”(来自the docs)。
您可以循环使用pandas indexing?
for i in range((X.shape[0] + 9) // 10):
X_subset = X.iloc[i * 10: (i + 1) * 10])