I have a dataframe like this:
Id row Date BuyPrice SellPrice Time
1 1 2017-10-30 94520 0 9:00:00
1 2 2017-10-30 94538 0 9:00:00
1 3 2017-10-30 94609 0 9:00:00
1 4 2017-10-30 94615 0 9:00:00
1 5 2017-10-30 94617 0 9:00:00
1 1 2017-09-20 99100 99159 9:00:10
1 2 2017-09-20 99102 99058 9:00:11
1 3 2017-09-20 99103 99057 9:00:12
1 4 2017-09-20 99104 99056 9:00:10
1 5 2017-09-20 99105 99055 9:00:10
1 1 2017-09-20 98100 99190 9:01:10
1 2 2017-09-20 98099 99091 9:01:10
1 3 2017-09-20 98098 99092 9:01:10
1 4 2017-09-20 98097 99093 9:01:10
1 5 2017-09-20 98096 99094 9:01:10
2 1 2010-11-01 99890 100000 10:00:02
2 2 2010-11-01 99899 100000 10:00:02
2 3 2010-11-01 99901 99899 9:00:02
2 4 2010-11-01 99920 99850 10:00:02
2 5 2010-11-01 99933 99848 10:00:23
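I want to calculate SellPrice - BuyPrice for the rows where row equals 1 (separately for each day and each Id). If BuyPrice or SellPrice is 0, the result should be NaN.
The target output should look like this: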
Id row Date BuyPrice SellPrice Spread
1 1 2017-10-30 94520 0 NaN
1 1 2017-09-20 99100 99159 59
1 1 2017-09-20 98100 99190 1090
2 1 2010-11-01 99890 100000 110
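Note that the target output has two rows for Id 1 on 2017-09-20 (the 9:00:10 and 9:01:10 snapshots), so grouping by Id and Date alone cannot produce it: both snapshots fall into the same group. A minimal check on the sample data, assuming only the Id, row and Date columns are needed:

```python
import pandas as pd

# Sample rows from the question (only the columns needed for this check)
df = pd.DataFrame({
    "Id": [1] * 15 + [2] * 5,
    "row": [1, 2, 3, 4, 5] * 4,
    "Date": ["2017-10-30"] * 5 + ["2017-09-20"] * 10 + ["2010-11-01"] * 5,
})

# Number of (Id, Date) groups vs. number of rows flagged row == 1
n_groups = df.groupby(["Id", "Date"]).ngroups
n_first_rows = int((df["row"] == 1).sum())
print(n_groups, n_first_rows)  # 3 4
```

Three groups but four row == 1 rows: the row counter, not the date, is what separates the snapshots.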
Here is the code I have tried so far:
df1 = df.groupby(['SID','Date'], sort=False)
df1['Spread'] =np.where((df['row']==1).eq(0).any(1),np.nan,df['SellPrice']-df['BuyPrice'])
But I get this error:
ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>
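The error comes from .any(1): (df['row']==1).eq(0) is a one-dimensional Series, which only has axis 0, so asking for axis 1 fails. The "is either price 0" test has to be done row-wise on a DataFrame holding both price columns. A minimal sketch of the np.where idea with that fix, assuming the four row == 1 rows are typed in by hand to keep it short:

```python
import numpy as np
import pandas as pd

# The question's row == 1 rows only (Time omitted; it is not used)
df = pd.DataFrame({
    "Id": [1, 1, 1, 2],
    "row": [1, 1, 1, 1],
    "Date": ["2017-10-30", "2017-09-20", "2017-09-20", "2010-11-01"],
    "BuyPrice": [94520, 99100, 98100, 99890],
    "SellPrice": [0, 99159, 99190, 100000],
})

# Two-column DataFrame, so axis=1 is valid: True where either price is 0
has_zero = df[["BuyPrice", "SellPrice"]].eq(0).any(axis=1)

# np.where picks NaN for those rows and the price difference elsewhere
df["Spread"] = np.where(has_zero, np.nan, df["SellPrice"] - df["BuyPrice"])
print(df["Spread"].tolist())  # [nan, 59.0, 1090.0, 110.0]
```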
Answer:
Judging by the row column you show here, you don't need groupby at all; just filter the rows with query, as @cmaher suggested.
# mask() turns 0 into NaN, so Spread is NaN when either price is 0
df.query('row == 1').assign(Spread=
    df['SellPrice'].mask(df['SellPrice'].eq(0)) -
    df['BuyPrice'].mask(df['BuyPrice'].eq(0)))
Output:
Id row Date BuyPrice SellPrice Time Spread
0 1 1 2017-10-30 94520 0 9:00:00 NaN
5 1 1 2017-09-20 99100 99159 9:00:10 59.0
10 1 1 2017-09-20 98100 99190 9:01:10 1090.0
15 2 1 2010-11-01 99890 100000 10:00:02 110.0
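For reference, a self-contained sketch of the query/assign approach on the question's full sample data. It masks both BuyPrice and SellPrice, since the question asks for NaN when either price is 0; the Time column is left out because it is not used.

```python
import pandas as pd

# The question's sample data (Time column omitted)
df = pd.DataFrame({
    "Id": [1] * 15 + [2] * 5,
    "row": [1, 2, 3, 4, 5] * 4,
    "Date": ["2017-10-30"] * 5 + ["2017-09-20"] * 10 + ["2010-11-01"] * 5,
    "BuyPrice": [94520, 94538, 94609, 94615, 94617,
                 99100, 99102, 99103, 99104, 99105,
                 98100, 98099, 98098, 98097, 98096,
                 99890, 99899, 99901, 99920, 99933],
    "SellPrice": [0, 0, 0, 0, 0,
                  99159, 99058, 99057, 99056, 99055,
                  99190, 99091, 99092, 99093, 99094,
                  100000, 100000, 99899, 99850, 99848],
})

# mask() replaces zeros with NaN, so Spread is NaN if either price is 0;
# assign() aligns the full-length Series to the filtered frame's index.
out = df.query("row == 1").assign(
    Spread=df["SellPrice"].mask(df["SellPrice"].eq(0))
    - df["BuyPrice"].mask(df["BuyPrice"].eq(0))
)
print(out)
```

Because assign aligns on the index, the Spread values computed on the full frame land on the filtered rows 0, 5, 10 and 15, giving NaN, 59.0, 1090.0 (99190 - 98100) and 110.0.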