How do I get the rows where a specific value occurs?

Asked: 2017-10-20 11:47:41

Tags: pandas dataframe group-by

I have orders_df:

       Symbol Order  Shares
Date                           
2011-01-10   AAPL   BUY    1500
2011-01-13   AAPL  SELL    1500
2011-01-13    IBM   BUY    4000
2011-01-26   GOOG   BUY    1000
2011-02-02    XOM  SELL    4000
2011-02-10    XOM   BUY    4000
2011-03-03   GOOG  SELL    1000
2011-03-03    IBM  SELL    2200
2011-05-03    IBM   BUY    1500
2011-06-03    IBM  SELL    3300
2011-08-01   GOOG   BUY      55
2011-08-01   GOOG  SELL      55

I want a variable that maps each Date to the number of SELL orders on that date. I also want the symmetric variable for BUY orders.

I tried doing this for all orders with:

num_orders_per_day = orders_df.groupby(['Date']).size()

and got:

Date
2011-01-10    1
2011-01-13    2
2011-01-26    1
2011-02-02    1
2011-02-10    1
2011-03-03    2
2011-05-03    1
2011-06-03    1
2011-08-01    2

But this is not the desired output, because it counts BUY and SELL orders together.

What I want is sells_on_a_day:

2011-01-13    1
2011-02-02    1
2011-03-03    2
2011-06-03    1
2011-08-01    1

and then a similar buys_on_a_day variable.

1 answer:

Answer 0 (score: 3)

First filter the rows with boolean indexing, then count the rows per date:

# Parentheses let the method chain continue onto the next line
num_sells_per_day = (orders_df[orders_df['Order'] == 'SELL']
                       .groupby(level=0).size().reset_index(name='count'))
print(num_sells_per_day)
        Date  count
0 2011-01-13      1
1 2011-02-02      1
2 2011-03-03      2
3 2011-06-03      1
4 2011-08-01      1
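
Since the question asks for a variable that maps each date to a count, it may be more convenient to skip reset_index and keep the result as a Series indexed by Date. The following is a minimal sketch under that assumption: it rebuilds orders_df from the data in the question, and the helper count_orders_per_day is a hypothetical name introduced here for illustration, not part of the original answer.

```python
import pandas as pd

# Rebuild the question's orders_df (values copied from the question).
dates = pd.DatetimeIndex(
    ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26",
     "2011-02-02", "2011-02-10", "2011-03-03", "2011-03-03",
     "2011-05-03", "2011-06-03", "2011-08-01", "2011-08-01"],
    name="Date",
)
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM", "GOOG", "XOM", "XOM",
                   "GOOG", "IBM", "IBM", "IBM", "GOOG", "GOOG"],
        "Order": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY",
                  "SELL", "SELL", "BUY", "SELL", "BUY", "SELL"],
        "Shares": [1500, 1500, 4000, 1000, 4000, 4000,
                   1000, 2200, 1500, 3300, 55, 55],
    },
    index=dates,
)


def count_orders_per_day(df, order_type):
    """Hypothetical helper: count rows of one order type per index date."""
    # Keep only the matching rows, then count rows per Date (index level 0).
    return df[df["Order"] == order_type].groupby(level=0).size()


sells_on_a_day = count_orders_per_day(orders_df, "SELL")
buys_on_a_day = count_orders_per_day(orders_df, "BUY")

print(sells_on_a_day)
print(buys_on_a_day)
```

Each result can then be looked up by date, for example sells_on_a_day[pd.Timestamp("2011-03-03")]. Dates with no matching orders are simply absent from the Series.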

Alternative, using query:

# Parentheses let the method chain continue across lines
num_sells_per_day = (orders_df.query("Order == 'SELL'")
                              .groupby(level=0)
                              .size()
                              .reset_index(name='count'))
print(num_sells_per_day)
        Date  count
0 2011-01-13      1
1 2011-02-02      1
2 2011-03-03      2
3 2011-06-03      1
4 2011-08-01      1

It is also possible to create both columns at once. NaN appears only where a date has no orders of that type:

df1 = orders_df.groupby(['Date','Order']).size().unstack()
print(df1)
Order       BUY  SELL
Date                 
2011-01-10  1.0   NaN
2011-01-13  1.0   1.0
2011-01-26  1.0   NaN
2011-02-02  NaN   1.0
2011-02-10  1.0   NaN
2011-03-03  NaN   2.0
2011-05-03  1.0   NaN
2011-06-03  NaN   1.0
2011-08-01  1.0   1.0
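
If the NaN entries (and the float counts they cause) are unwanted, one option is to pass fill_value=0 to unstack so that missing combinations become zero. This is a sketch of that variation rather than part of the original answer. It rebuilds only the Order column from the question's data, and the name counts is introduced here for illustration.

```python
import pandas as pd

# Only the Date index and the Order column are needed for counting
# (values copied from the question).
dates = pd.DatetimeIndex(
    ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26",
     "2011-02-02", "2011-02-10", "2011-03-03", "2011-03-03",
     "2011-05-03", "2011-06-03", "2011-08-01", "2011-08-01"],
    name="Date",
)
orders_df = pd.DataFrame(
    {"Order": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY",
               "SELL", "SELL", "BUY", "SELL", "BUY", "SELL"]},
    index=dates,
)

# Count rows per (Date, Order) pair, then pivot Order into columns.
# fill_value=0 replaces missing (Date, Order) combinations with 0.
counts = orders_df.groupby(["Date", "Order"]).size().unstack(fill_value=0)
print(counts)

# The two per-day variables are then just the two columns.
buys_on_a_day = counts["BUY"]
sells_on_a_day = counts["SELL"]
```

Unlike the filtered approach, every date with at least one order appears in both columns here, so days without a SELL show 0 instead of being left out.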