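I am new to time series programming in Python. Consider a file containing orders to buy or sell stocks, along with their corresponding states. The order file contains multiple lines, and each line holds one state of an order.
The following is sample content of the order file: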
{"DATETIME":"20171116 03:46:16.142514", "DATA": {"MODE":"ORD","INSTR":"INSTR1","TYPE":"New","id":1}}
{"DATETIME":"20171116 03:46:16.243121", "DATA": {"MODE":"ORD","INSTR":"INSTR2","TYPE":"New","id":2}}
{"DATETIME":"20171116 03:46:16.758292", "DATA": {"MODE":"ORD","INSTR":"INSTR3","TYPE":"New","id":3}}
{"DATETIME":"20171116 03:46:17.212341", "DATA": {"MODE":"ORD","INSTR":"INSTR2","TYPE":"TRD","id":2}}
{"DATETIME":"20171116 03:46:17.467893", "DATA": {"MODE":"ORD","INSTR":"INSTR1","TYPE":"CXL","id":1}}
{"DATETIME":"20171116 03:46:18.924825", "DATA": {"MODE":"ORD","INSTR":"INSTR3","TYPE":"TRD","id":3}}
The details of each field in a line are as follows:
● DATETIME
○ Timestamp of the order
○ Format
■ YYYYMMDD hh:mm:ss.mi
● MODE
○ Type of the message
○ Always will be ORD
● INSTR
○ Name of the instrument
● TYPE
○ Type of the order
○ Following are the possible values
■ NEW
● Opens a new order
● Order will be active as long as it is in NEW state
■ CXL
● Order got cancelled. Order will be in a closed state after CXL
■ TRD
● Order got traded. Order will be in a closed state after TRD
● ID
○ Unique Id for identifying a particular order
○ Use ID to find state of the same order
We define holding time as the time, in microseconds, that an order is active. An order is active as long as it is in the NEW state.
Given an order file, calculate the following statistics of the holding time distribution per ticker (instrument):
● Mean
● Median
● Max
● 75th percentile
● 90th percentile
● 99th percentile
● Standard deviation
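Can someone help me with this? Thank you very much.
Answer 0 (score: 0)
Use the pandas groupby function to transform the data so that the datetime of the New state and the datetime of the current state end up on the same row.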
import pandas as pd
data = \
[{"DATETIME":"20171116 03:46:16.142514",
"MODE":"ORD","INSTR":"INSTR1","TYPE":"New","id":1},
{"DATETIME":"20171116 03:46:16.243121"
,"MODE":"ORD","INSTR":"INSTR2","TYPE":"New","id":2},
{"DATETIME":"20171116 03:46:16.758292"
,"MODE":"ORD","INSTR":"INSTR3","TYPE":"New","id":3},
{"DATETIME":"20171116 03:46:17.212341"
,"MODE":"ORD","INSTR":"INSTR2","TYPE":"TRD","id":2},
{"DATETIME":"20171116 03:46:17.467893"
,"MODE":"ORD","INSTR":"INSTR1","TYPE":"CXL","id":1},
{"DATETIME":"20171116 03:46:18.924825"
,"MODE":"ORD","INSTR":"INSTR3","TYPE":"TRD","id":3}]
df = pd.DataFrame(data)
# Parse the timestamps first so that sorting and subtraction work on real datetimes
df['DATETIME'] = pd.to_datetime(df['DATETIME'], format='%Y%m%d %H:%M:%S.%f')
df.sort_values(by=['id', 'DATETIME'], inplace=True)
# I am assuming that an id's next state cannot be New again
# Shift within each id so that a row never picks up another order's timestamp
df['DATETIME_shiftby_1'] = df.groupby('id')['DATETIME'].shift(1)
df['hold_out_time'] = df['DATETIME'] - df['DATETIME_shiftby_1']
def fun(x):
    if x.shape[0] > 1:
        # Return the second entry, because shift moves each value down by one row.
        # The second row holds the New datetime as DATETIME_shiftby_1 and the closing datetime as DATETIME.
        return x.iloc[1]
    else:
        return 'still active'
# This Series will contain the hold-out time for every id
df.groupby('id')['hold_out_time'].apply(fun)
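The code above stops at the hold-out time per id, while the question asks for statistics per instrument in microseconds. Below is a minimal sketch of one way to get there, not a definitive implementation. It assumes the file holds one JSON record per line with the order fields nested under "DATA", that TYPE can be compared case-insensitively (the sample uses "New" while the field description says NEW), and that every order has at most one New and one closing row. The helper names load_orders, holding_times and holding_stats are my own and purely illustrative.

```python
import io
import json

import pandas as pd

# Sample records copied from the question, one JSON object per line (assumed layout).
SAMPLE = """
{"DATETIME":"20171116 03:46:16.142514", "DATA": {"MODE":"ORD","INSTR":"INSTR1","TYPE":"New","id":1}}
{"DATETIME":"20171116 03:46:16.243121", "DATA": {"MODE":"ORD","INSTR":"INSTR2","TYPE":"New","id":2}}
{"DATETIME":"20171116 03:46:16.758292", "DATA": {"MODE":"ORD","INSTR":"INSTR3","TYPE":"New","id":3}}
{"DATETIME":"20171116 03:46:17.212341", "DATA": {"MODE":"ORD","INSTR":"INSTR2","TYPE":"TRD","id":2}}
{"DATETIME":"20171116 03:46:17.467893", "DATA": {"MODE":"ORD","INSTR":"INSTR1","TYPE":"CXL","id":1}}
{"DATETIME":"20171116 03:46:18.924825", "DATA": {"MODE":"ORD","INSTR":"INSTR3","TYPE":"TRD","id":3}}
"""


def load_orders(lines):
    """Parse one JSON record per line and flatten the nested DATA field."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        row = dict(rec["DATA"])
        row["DATETIME"] = rec["DATETIME"]
        records.append(row)
    df = pd.DataFrame(records)
    df["DATETIME"] = pd.to_datetime(df["DATETIME"], format="%Y%m%d %H:%M:%S.%f")
    df["TYPE"] = df["TYPE"].str.upper()  # assumption: "New" and "NEW" mean the same state
    return df


def holding_times(df):
    """Return one row per closed order with its holding time in microseconds."""
    opened = df[df["TYPE"] == "NEW"].groupby("id")["DATETIME"].min()
    closed = df[df["TYPE"].isin(["CXL", "TRD"])].groupby("id")["DATETIME"].min()
    instr = df.groupby("id")["INSTR"].first()
    out = pd.DataFrame({"INSTR": instr, "opened": opened, "closed": closed})
    # Orders that are still active (or lack a New row) have no holding time yet.
    out = out.dropna(subset=["opened", "closed"])
    out["holding_us"] = (out["closed"] - out["opened"]) // pd.Timedelta(microseconds=1)
    return out


def holding_stats(ht):
    """Mean, median, max, std and selected percentiles of holding time per instrument."""
    grouped = ht.groupby("INSTR")["holding_us"]
    stats = grouped.agg(["mean", "median", "max", "std"])
    for name, q in {"p75": 0.75, "p90": 0.90, "p99": 0.99}.items():
        stats[name] = grouped.quantile(q)
    return stats


orders = load_orders(io.StringIO(SAMPLE))
ht = holding_times(orders)
stats = holding_stats(ht)
print(stats)
```

To run this against a real file, pass an open file handle to load_orders instead of the io.StringIO wrapper. With the six sample rows every instrument has exactly one closed order, so the percentiles all equal that single holding time and the sample standard deviation comes out as NaN.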