How do I select rows by a condition and assign values?

Time: 2017-10-27 22:29:00

Tags: python pandas data-analysis

I have a DataFrame like this:

 Id   row      Date       BuyPrice    SellPrice    Time
 1    1      2017-10-30    94520       0          9:00:00
 1    2      2017-10-30    94538       0          9:00:00
 1    3      2017-10-30    94609       0          9:00:00
 1    4      2017-10-30    94615       0          9:00:00
 1    5      2017-10-30    94617       0          9:00:00
 1    1      2017-09-20    99100       99159      9:00:10
 1    2      2017-09-20    99102       99058      9:00:11
 1    3      2017-09-20    99103       99057      9:00:12
 1    4      2017-09-20    99104       99056      9:00:10
 1    5      2017-09-20    99105       99055      9:00:10
 1    1      2017-09-20    98100       99190      9:01:10
 1    2      2017-09-20    98099       99091      9:01:10
 1    3      2017-09-20    98098       99092      9:01:10
 1    4      2017-09-20    98097       99093      9:01:10
 1    5      2017-09-20    98096       99094      9:01:10
 2    1      2010-11-01    99890       100000     10:00:02
 2    2      2010-11-01    99899       100000     10:00:02
 2    3      2010-11-01    99901       99899      9:00:02
 2    4      2010-11-01    99920       99850      10:00:02
 2    5      2010-11-01    99933       99848      10:00:23

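I want to compute SellPrice - BuyPrice for the rows where "row" equals 1 (separately for each day and each Id). If BuyPrice or SellPrice is 0, NaN should be assigned.

The target output should look like this: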

Id  row      Date         BuyPrice   SellPrice   Spread
1   1        2017-10-30    94520        0         NaN
1   1        2017-09-20    99100      99159       59
1   1        2017-09-20    98100      99190       1090
2   1        2010-11-01    99890      100000      110

Here is the code I have tried so far:

 df1 = df.groupby(['SID','Date'], sort=False) 
 df1['Spread'] =np.where((df['row']==1).eq(0).any(1),np.nan,df['SellPrice']-df['BuyPrice'])

But I get this error:

 ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>

1 Answer:

Answer 0: (score: 1)

If your data has the row column as shown here, you don't need groupby. Just filter the rows with query, as @cmaher suggested.

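# NaN propagates through the subtraction, so a 0 in either price gives NaN
df.query('row == 1').assign(Spread = 
                             df['SellPrice'].mask(df['SellPrice'].eq(0)) - 
                             df['BuyPrice'].mask(df['BuyPrice'].eq(0)))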

Output:

    Id  row        Date  BuyPrice  SellPrice      Time  Spread
0    1    1  2017-10-30     94520          0   9:00:00     NaN
5    1    1  2017-09-20     99100      99159   9:00:10    59.0
10   1    1  2017-09-20     98100      99190   9:01:10  1090.0
15   2    1  2010-11-01     99890     100000  10:00:02   110.0
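
For reference, here is a self-contained sketch of the same idea that treats a zero in either BuyPrice or SellPrice as missing, which is what the question asks for. The DataFrame is a hypothetical subset of the question's data, and the names first, has_zero, result and spread_np are illustrative only. It also includes an np.where variant: the original attempt most likely failed because (df['row']==1).eq(0) is a Series, which has only axis 0, so .any(1) raises the ValueError shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data (assumed for illustration)
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 2],
    "row": [1, 2, 1, 1, 1],
    "Date": ["2017-10-30", "2017-10-30", "2017-09-20",
             "2017-09-20", "2010-11-01"],
    "BuyPrice": [94520, 94538, 99100, 98100, 99890],
    "SellPrice": [0, 0, 99159, 99190, 100000],
})

# Keep only the rows where the "row" column equals 1
first = df.query("row == 1")

# True wherever either price is 0, i.e. the spread is undefined
has_zero = first["BuyPrice"].eq(0) | first["SellPrice"].eq(0)

# mask() replaces the values where the condition is True with NaN
result = first.assign(
    Spread=(first["SellPrice"] - first["BuyPrice"]).mask(has_zero)
)

# Equivalent np.where form, applied to whole columns (no .any(1) needed)
spread_np = np.where(has_zero, np.nan,
                     first["SellPrice"] - first["BuyPrice"])

print(result)
```

Filtering first and then building the condition on the filtered frame keeps every Series the same length, so no index alignment is needed when the Spread column is assigned.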