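I want to resample a DataFrame by day/week/month. I'm confused and not sure how it should be done. How can I resample with a condition and sum the values into newly created columns?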
import pandas as pd

df = pd.DataFrame({
'date': ['2014-08-4 19:00:00', '2014-08-5 10:09:00', '2014-08-4 21:04:00','2014-08-5 22:07:00', '2014-08-5 22:09:00', '2014-08-5 22:09:00', '2014-08-4 22:09:00', '2014-08-5 22:09:00', '2014-08-4 22:09:00', '2014-08-5 22:09:00', '2014-08-4 22:09:00', '2014-08-4 22:09:00', '2014-08-5 22:09:00', '2014-08-4 22:09:00', '2014-08-4 22:09:00', '2014-08-5 22:09:00', '2014-08-4 22:09:00', '2014-08-5 22:09:00', '2014-08-4 22:09:00', '2014-08-5 22:09:00'],
'id' :[4,5,7,8,2,3,5,2,1,1,4,4,2,4,5,1,3,9,7,9],
'qty' :[9,5,7,8,3,3,5,2,1,1,4,4,2,4,5,1,3,5,7,9],
'type' :[1,0,1,0,1,1,0,0,1,1,0,0,0,1,1,1,0,0,1,0]
})
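# Boolean masks: type 0 is inward stock, type 1 is outward stock
inward = df['type'] == 0
outward = df['type'] == 1
# Attach the per-id totals to every row (these sums ignore the date)
df1 = df.join(df[inward].groupby(['id'])['qty'].sum(), on='id', rsuffix='_inward')
df2 = df.join(df[outward].groupby(['id'])['qty'].sum(), on='id', rsuffix='_outward')
df1['qty_outward'] = df2['qty_outward']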
I am trying to get the data in the following format:
                   date  id  qty_inward  qty_outward
0   2014-08-04 19:00:00   4           8           13
1   2014-08-05 10:09:00   5           5            0
2   2014-08-04 21:04:00   7           0           14
3   2014-08-05 22:07:00   8           8            0
4   2014-08-05 22:09:00   2           4            3
5   2014-08-05 22:09:00   3           0            3
8   2014-08-04 22:09:00   1           0            1
9   2014-08-05 22:09:00   1           0            2
14  2014-08-04 22:09:00   5           5            5
16  2014-08-04 22:09:00   3           3            0
17  2014-08-05 22:09:00   9          14            0
From these I want to build the opening and closing stock per day/week/month. My approach may be wrong, so any suggestions are welcome.
Answer 0 (score: 0)
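I think you can use resample together with groupby, which is new functionality in pandas 0.18.1. Note that resample works on a DatetimeIndex, so the date column has to be parsed with pd.to_datetime and set as the index first. Finally, reshape the second index level (type) into columns with unstack: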
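# resample needs a DatetimeIndex: parse the date strings and index by them
df['date'] = pd.to_datetime(df['date'])
print (df.set_index('date')
         .groupby(['id', 'type'])
         .resample('D')['qty']
         .sum()
         .unstack(1, fill_value=0)
         # label the type columns as in the output below (0 = inward, 1 = outward)
         .rename(columns={0: 'qt_inward', 1: 'qt_outward'})
         .reset_index(level=0))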
type        id  qt_inward  qt_outward
date
2014-08-04   1          0           1
2014-08-05   1          0           2
2014-08-05   2          4           3
2014-08-04   3          3           0
2014-08-05   3          0           3
2014-08-04   4          8          13
2014-08-04   5          5           5
2014-08-05   5          5           0
2014-08-04   7          0          14
2014-08-05   8          8           0
2014-08-05   9         14           0
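
The question also asks about weekly and monthly totals, while the code above only uses 'D'. As a rough sketch, the same idea can be wrapped in a small helper that takes the frequency as an argument. This variant swaps in pd.Grouper in place of groupby + resample, so the date can stay an ordinary column. The helper name inward_outward and the four-row sample frame are made up for illustration, and 'MS' (month start) is used for the monthly case only because that alias behaves the same across pandas versions.

```python
import pandas as pd

# Small illustrative sample (not the data from the question)
df = pd.DataFrame({
    'date': pd.to_datetime(['2014-08-04 19:00', '2014-08-04 22:09',
                            '2014-08-05 10:09', '2014-08-12 09:00']),
    'id': [4, 4, 5, 4],
    'qty': [9, 4, 5, 2],
    'type': [1, 0, 0, 0],   # 0 = inward, 1 = outward
})

def inward_outward(frame, freq):
    """Hypothetical helper: sum qty per id and per period, split by type."""
    out = (frame.groupby(['id', pd.Grouper(key='date', freq=freq), 'type'])['qty']
                .sum()
                .unstack('type', fill_value=0)
                # make sure both columns exist even if one type never occurs
                .reindex(columns=[0, 1], fill_value=0)
                .rename(columns={0: 'qty_inward', 1: 'qty_outward'})
                .reset_index())
    out.columns.name = None
    return out

daily = inward_outward(df, 'D')     # one row per id and calendar day
weekly = inward_outward(df, 'W')    # weeks labelled by their closing Sunday
monthly = inward_outward(df, 'MS')  # months labelled by their first day
print(monthly)
```

Only (id, period) combinations that actually occur in the data show up in the result. If every period is needed for every id, the result would have to be reindexed against a full date range afterwards.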