I have bond market data like this:
Id row Date BuyPrice SellPrice Time
1 1 2017-10-30 94520 0 9:00:00
1 2 2017-10-30 94538 0 9:00:00
1 3 2017-10-30 94609 0 9:00:00
1 4 2017-10-30 94615 0 9:00:00
1 5 2017-10-30 94617 0 9:00:00
1 1 2017-09-20 99100 99159 9:00:10
1 2 2017-09-20 99102 99058 9:00:11
1 3 2017-09-20 99103 99057 9:00:12
1 4 2017-09-20 99104 99056 9:00:10
1 5 2017-09-20 99105 99055 9:00:10
1 1 2017-09-20 98100 99190 9:01:10
1 2 2017-09-20 98099 99091 9:01:10
1 3 2017-09-20 98098 99092 9:01:10
1 4 2017-09-20 98097 99093 9:01:10
1 5 2017-09-20 98096 99094 9:01:10
2 1 2010-11-01 99890 100000 10:00:02
2 2 2010-11-01 99899 100000 10:00:02
2 3 2010-11-01 99901 99899 9:00:02
2 4 2010-11-01 99920 99850 10:00:02
2 5 2010-11-01 99933 99848 10:00:23
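Step 1:
I want to compute the spread (= SellPrice - BuyPrice) for the first row of each Id on each day. If BuyPrice or SellPrice is zero, the zero should be excluded (such rows are reported as NaN). After this step the data should look like this:
Id row Date BuyPrice SellPrice Spread
1 1 2017-10-30 94520 0 NaN
1 1 2017-09-20 99100 99159 59
1 1 2017-09-20 98100 99190 1090
2 1 2010-11-01 99890 100000 110
Step 2:
Now I want to compute the average Spread for each Id on each day, and add an index value for the date.
In the end the data should look like this:
Id Date avg.spread(average of spread for each day) index
1 2017-10-30 NaN 1
1 2017-09-20 574.5(=(59+1090)/2) 2
2 2010-11-01 110 1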
Answer 0 (score: 2)
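I did my best to understand what you are after. You did not say so explicitly, but I think you want to groupby Id, row, and Date:
import numpy as np

# Treat 0 as a missing quote, then compute the spread (SellPrice - BuyPrice)
p = df[['BuyPrice', 'SellPrice']].replace(0, np.nan)
g = (df.assign(diff=p.SellPrice.sub(p.BuyPrice))
       .groupby(['Id', 'row', 'Date'])['diff'].mean())
# Number the dates within each (Id, row) group, starting at 1
v = g.groupby(level=[0, 1]).cumcount().add(1).values
df = g.reset_index().assign(index=v)
df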
Id row Date diff index
0 1 1 2017-09-20 574.5 1
1 1 1 2017-10-30 NaN 2
2 1 2 2017-09-20 474.0 1
3 1 2 2017-10-30 NaN 2
4 1 3 2017-09-20 474.0 1
5 1 3 2017-10-30 NaN 2
6 1 4 2017-09-20 474.0 1
7 1 4 2017-10-30 NaN 2
8 1 5 2017-09-20 474.0 1
9 1 5 2017-10-30 NaN 2
10 2 1 2010-11-01 110.0 1
11 2 2 2010-11-01 101.0 1
12 2 3 2010-11-01 -2.0 1
13 2 4 2010-11-01 -70.0 1
14 2 5 2010-11-01 -85.0 1
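
If you only want the first row of each snapshot, as Step 1 and Step 2 of the question describe, a variant along these lines may be closer to the expected output. This is only a sketch under a few assumptions of mine: the data is loaded from the text shown in the question, "first row" means row == 1, the name avg_spread is made up for illustration, and the index should follow the order in which dates first appear for each Id (hence sort=False).

```python
import io

import numpy as np
import pandas as pd

# The sample data from the question, pasted as whitespace-separated text.
raw = """Id row Date BuyPrice SellPrice Time
1 1 2017-10-30 94520 0 9:00:00
1 2 2017-10-30 94538 0 9:00:00
1 3 2017-10-30 94609 0 9:00:00
1 4 2017-10-30 94615 0 9:00:00
1 5 2017-10-30 94617 0 9:00:00
1 1 2017-09-20 99100 99159 9:00:10
1 2 2017-09-20 99102 99058 9:00:11
1 3 2017-09-20 99103 99057 9:00:12
1 4 2017-09-20 99104 99056 9:00:10
1 5 2017-09-20 99105 99055 9:00:10
1 1 2017-09-20 98100 99190 9:01:10
1 2 2017-09-20 98099 99091 9:01:10
1 3 2017-09-20 98098 99092 9:01:10
1 4 2017-09-20 98097 99093 9:01:10
1 5 2017-09-20 98096 99094 9:01:10
2 1 2010-11-01 99890 100000 10:00:02
2 2 2010-11-01 99899 100000 10:00:02
2 3 2010-11-01 99901 99899 9:00:02
2 4 2010-11-01 99920 99850 10:00:02
2 5 2010-11-01 99933 99848 10:00:23
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Step 1: keep only row == 1 and treat a price of 0 as a missing quote,
# so the spread for those rows comes out as NaN.
first = df[df["row"] == 1].copy()
prices = first[["BuyPrice", "SellPrice"]].replace(0, np.nan)
first["Spread"] = prices["SellPrice"] - prices["BuyPrice"]

# Step 2: average the spread per (Id, Date). sort=False keeps the dates in
# the order they first appear, which is an assumption about the wanted index.
daily = (
    first.groupby(["Id", "Date"], sort=False)["Spread"]
    .mean()
    .reset_index(name="avg_spread")  # avg_spread is a hypothetical column name
)

# Number the dates within each Id, starting at 1.
daily["index"] = daily.groupby("Id").cumcount() + 1
print(daily)
```

On the sample data this gives three rows: NaN for Id 1 on 2017-10-30, 574.5 for Id 1 on 2017-09-20, and 110 for Id 2 on 2010-11-01, with index values 1, 2, 1. Drop sort=False if you would rather number the dates chronologically.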