我有一个熊猫数据框,其中有日期时间(不在索引中,因此更喜欢)。 我想将其上采样(重新采样)到指定的时间范围,例如“ 10S”。并将字符串数据(即Acitivty / Action / EPIC等列)也保留在dataFrame中。
Ind TIME_STAMP Activity Action Quantity EPIC Price Sub-activity Venue Position
0 2018-08-22 08:01:36 Allocation SELL 100.0 BB. 1.142200 CPTY 300AD -427.0
1 2018-08-22 08:02:17 Allocation BUY 15.0 BB. 1.152300 CPTY ZDDD02 -388.0
2 2018-08-22 08:24:51 Allocation SELL 60.0 BB. 1.165900 CPTY 666 -515.0
3 2018-08-22 09:07:59 NaN NaN NaN NaN 1.167921 NaN -515.0
4 2018-08-22 09:11:00 NaN NaN NaN NaN 1.174500 NaN
我尝试了几种不同的方法,即dataFrame.asfreq(freq ='10S');和dataFrame.resample('10S',on ='TIME_STAMP')
我真正想做的是 1)将数据升采样为10秒的块,并保留原始数据; 2)使用“ TIME_STAMP”列。和3)之后,可以使用.fillna(method ='pad')
之类的填充方法来填充数值数据答案 0 :(得分:2)
想法是由GroupBy.cumcount
创建帮助列,由unstack
创建唯一的Datetimeindex
,最后由stack
重新创建:
print (df)
TIME_STAMP Activity Action Quantity EPIC Price \
Ind
0 2018-08-22 08:01:36 Allocation SELL 100.0 BB. 1.142200
1 2018-08-22 08:01:36 Allocation BUY 15.0 BB. 1.152300
2 2018-08-22 08:01:51 Allocation SELL 60.0 BB. 1.165900
3 2018-08-22 08:02:59 NaN NaN NaN NaN 1.167921
4 2018-08-22 08:02:59 NaN NaN NaN NaN 1.174500
Sub-activity Venue Position
Ind
0 CPTY 300AD -427.0
1 CPTY ZDDD02 -388.0
2 CPTY 666 -515.0
3 NaN -515.0 NaN
4 NaN NaN NaN
df = (df.set_index(['TIME_STAMP', df.groupby('TIME_STAMP').cumcount()])
.unstack()
.asfreq('10S', method ='pad')
.stack()
.reset_index(level=1, drop=True)
.sort_index())
print (df)
Activity Action Quantity EPIC Price Sub-activity \
TIME_STAMP
2018-08-22 08:01:36 Allocation SELL 100.0 BB. 1.1422 CPTY
2018-08-22 08:01:36 Allocation BUY 15.0 BB. 1.1523 CPTY
2018-08-22 08:01:46 Allocation SELL 100.0 BB. 1.1422 CPTY
2018-08-22 08:01:46 Allocation BUY 15.0 BB. 1.1523 CPTY
2018-08-22 08:01:56 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:06 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:16 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:26 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:36 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:46 Allocation SELL 60.0 BB. 1.1659 CPTY
2018-08-22 08:02:56 Allocation SELL 60.0 BB. 1.1659 CPTY
Venue Position
TIME_STAMP
2018-08-22 08:01:36 300AD -427.0
2018-08-22 08:01:36 ZDDD02 -388.0
2018-08-22 08:01:46 300AD -427.0
2018-08-22 08:01:46 ZDDD02 -388.0
2018-08-22 08:01:56 666 -515.0
2018-08-22 08:02:06 666 -515.0
2018-08-22 08:02:16 666 -515.0
2018-08-22 08:02:26 666 -515.0
2018-08-22 08:02:36 666 -515.0
2018-08-22 08:02:46 666 -515.0
2018-08-22 08:02:56 666 -515.0
答案 1 :(得分:0)
这是强制执行计算的一种惊人的蛮力方式。我通过asfreq和resample生成1秒的时间序列;然后我做必要的计算;最后重新采样到实际的时间频率。
dataT = pd.merge_ordered(VWAP, dataT, on='TIME_STAMP') # this is to merge the market-data stream and order stream
dataT.Quantity = dataT.Quantity.astype('float')
dataT.Price = dataT.Price.astype('float')
dataT.TIME_STAMP = pd.to_datetime(dataT.TIME_STAMP)
dataT['Position'] = dataT.apply(lambda row : (row['Quantity'] if row['Action'] == 'BUY' else -row['Quantity']) if row['Activity'] == 'Allocation' else 0, axis =1).cumsum()
dataT['Price'] = dataT.apply(lambda row : (row['Price'] if pd.isnull(row['Price']) == False else row['VWAP']), axis =1)
dataY = dataT.set_index(['TIME_STAMP']).resample('1S').pad()
dataT = dataT.set_index(['TIME_STAMP']).asfreq(freq='1S', method=None)
dataT.Price = dataT.Price.fillna(method = 'pad')
dataT.Position = dataT.Position.fillna(method = 'ffill')
dataT.Price = dataY.Price
dataT.Quantity = dataT.Quantity.fillna(0)
dataT.Position = dataY.Position
dataT.EPIC = dataT.EPIC.fillna('BB.')
dataT['MtM_Trd'] = (dataT.Quantity * dataT.Price).fillna(0)
dataT['MtM_Pos'] = dataT.Position * dataT.Price
dataT['MtM_PnL'] = (dataT.Price.diff(periods = 1) * dataT.Position.shift(periods=1)).cumsum().fillna(0)
dataT = dataT.reset_index()
dataT['Ret_Cum'] = (dataT.MtM_PnL/dataT.MtM_Trd.abs().cumsum()) * 100
dataT['Ret_Ins'] = ((dataT.Price.diff(periods = 1) * dataT.Position.shift(periods=1))/dataT.MtM_Pos.abs()) * 100
dataT['Std_Cum'] = ((dataT.Ret_Cum**2).cumsum()/(dataT.index.values) - (dataT.Ret_Cum.cumsum()/dataT.index.values)**2).apply(np.sqrt)
dataT['Std_Ins'] = pd.rolling_std(dataT.Ret_Ins, window = 60).fillna(0)
dataT['Sharpe_Cum'] = (dataT.Ret_Cum/dataT.Std_Cum).fillna(0)
dataT['Sharpe_Ins'] = (pd.rolling_mean(dataT.Ret_Ins, window = 60)/pd.rolling_std(dataT.Ret_Ins, window = 60)).fillna(0)
dataT = dataT.resample('10S', on='TIME_STAMP').first()