I have bond market data like this:
Id row Date BuyPrice SellPrice
1 1 2017-10-30 94520 0
1 2 2017-10-30 94538 0
1 3 2017-10-30 94609 0
1 4 2017-10-30 94615 0
1 5 2017-10-30 94617 0
1 1 2017-09-20 99100 99059
1 1 2017-09-20 98100 99090
2 1 2010-11-01 99890 100000
2 2 2010-11-01 99899 100000
2 3 2010-11-01 99901 99899
2 4 2010-11-01 99920 99850
2 5 2010-11-01 99933 99848
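For each Id and Date I want to select the lowest sell price and the highest buy price and compute their difference. If the minimum sell price or the maximum buy price is zero, I want to make an exception and leave the difference empty (NaN) for that date.
I also want to assign an index to the dates within each Id: the first day listed for an Id gets 1, the second day gets 2, and so on.
The final data should look like this: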
Id Date highest buy price lowest sell price NBBO (highest buy price - lowest sell price) Index
1 2017-10-30 94520 0 NaN 1
1 2017-09-20 99100 99059 41 2
2 2017-11-01 99890 99848 42 1
Answer 0 (score: 0)
You can use groupby and aggregate with max and min, then use numpy.where to set NaN where the condition holds. Finally, use cumcount:
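import numpy as np

# highest buy price and lowest sell price per Id and Date, keeping the input order
df = df.groupby(['Id','Date'], sort=False).agg({'BuyPrice':'max','SellPrice':'min'})
# NaN where either aggregated price is 0, otherwise the difference
df['NBBO'] = np.where(df[['BuyPrice', 'SellPrice']].eq(0).any(axis=1),
                      np.nan,
                      df['BuyPrice'] - df['SellPrice'])
# counter of dates within each Id, starting at 1
df['index'] = df.groupby(level=0).cumcount() + 1
d = {'BuyPrice':'highest buy price','SellPrice':'lowest sell price'}
df = df.reset_index().rename(columns=d)
print (df)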
Id Date highest buy price lowest sell price NBBO index
0 1 2017-10-30 94617 0 NaN 1
1 1 2017-09-20 99100 99059 41.0 2
2 2 2010-11-01 99933 99848 85.0 1
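To try this end to end, here is a minimal self-contained sketch. It assumes the sample rows from the question are typed into a DataFrame by hand, and it uses named aggregation instead of the dict form; the variable names agg, has_zero and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Id":   [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "row":  [1, 2, 3, 4, 5, 1, 1, 1, 2, 3, 4, 5],
    "Date": ["2017-10-30"] * 5 + ["2017-09-20"] * 2 + ["2010-11-01"] * 5,
    "BuyPrice":  [94520, 94538, 94609, 94615, 94617, 99100, 98100,
                  99890, 99899, 99901, 99920, 99933],
    "SellPrice": [0, 0, 0, 0, 0, 99059, 99090,
                  100000, 100000, 99899, 99850, 99848],
})

# Highest buy and lowest sell per (Id, Date), keeping the input order.
agg = df.groupby(["Id", "Date"], sort=False).agg(
    BuyPrice=("BuyPrice", "max"),
    SellPrice=("SellPrice", "min"),
)

# The difference is left as NaN when either aggregated price is zero.
has_zero = agg[["BuyPrice", "SellPrice"]].eq(0).any(axis=1)
agg["NBBO"] = np.where(has_zero, np.nan, agg["BuyPrice"] - agg["SellPrice"])

# Number the dates within each Id in order of appearance, starting at 1.
agg["index"] = agg.groupby(level="Id").cumcount() + 1

result = agg.reset_index()
print(result)
```

Because of sort=False, the groups keep the order in which they first appear in the data, so 2017-10-30 gets index 1 for Id 1 even though it is the later date.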
Details (run on the aggregated df, before reset_index and rename):
# compare with 0; eq is the same as ==
print (df[['BuyPrice', 'SellPrice']].eq(0))
BuyPrice SellPrice
Id Date
1 2017-10-30 False True
2017-09-20 False False
2 2010-11-01 False False
# True for each row that has at least one True, via any(axis=1)
print (df[['BuyPrice', 'SellPrice']].eq(0).any(axis=1))
Id Date
1 2017-10-30 True
2017-09-20 False
2 2010-11-01 False
dtype: bool
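The details above cover eq and any; the remaining step, cumcount, can be looked at in the same way. The sketch below rebuilds the aggregated frame by hand (the names agg, by_appearance and by_date are illustrative). The chronological variant is an assumption about what might be wanted and is not part of the answer above; it relies on the dates being ISO-formatted strings, so that sorting them as text is also chronological.

```python
import pandas as pd

# Aggregated frame with the same (Id, Date) index as in the answer.
idx = pd.MultiIndex.from_tuples(
    [(1, "2017-10-30"), (1, "2017-09-20"), (2, "2010-11-01")],
    names=["Id", "Date"],
)
agg = pd.DataFrame({"BuyPrice": [94617, 99100, 99933],
                    "SellPrice": [0, 99059, 99848]}, index=idx)

# cumcount numbers the rows of each Id from 0 in their current order,
# so adding 1 gives a counter that follows the order of appearance.
by_appearance = agg.groupby(level="Id").cumcount() + 1

# Hypothetical variant: sort by (Id, Date) first so the counter follows
# the calendar, then align the result back to the original row order.
chronological = agg.sort_index().groupby(level="Id").cumcount() + 1
by_date = chronological.reindex(agg.index)

print(by_appearance.tolist())
print(by_date.tolist())
```

With the sample data, the first counter numbers 2017-10-30 before 2017-09-20 for Id 1, which matches the expected output in the question; the second counter reverses those two.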