I have a DataFrame, orders_df:
           Symbol Order  Shares
Date
2011-01-10   AAPL   BUY    1500
2011-01-13   AAPL  SELL    1500
2011-01-13    IBM   BUY    4000
2011-01-26   GOOG   BUY    1000
2011-02-02    XOM  SELL    4000
2011-02-10    XOM   BUY    4000
2011-03-03   GOOG  SELL    1000
2011-03-03    IBM  SELL    2200
2011-05-03    IBM   BUY    1500
2011-06-03    IBM  SELL    3300
2011-08-01   GOOG   BUY      55
2011-08-01   GOOG  SELL      55
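I want a variable that maps each Date to the number of SELL orders on that date, and a symmetric variable for BUY orders.
I tried doing this for all orders, regardless of type, with: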
num_orders_per_day = orders_df.groupby(['Date']).size()
which gave:
Date
2011-01-10 1
2011-01-13 2
2011-01-26 1
2011-02-02 1
2011-02-10 1
2011-03-03 2
2011-05-03 1
2011-06-03 1
2011-08-01 2
But this is not the desired output, because it counts BUY and SELL orders together.
What I want is sells_on_a_day:
2011-01-13 1
2011-02-02 1
2011-03-03 2
2011-06-03 1
2011-08-01 1
and then a similar buys_on_a_day variable.
Answer 0 (score: 3)
First filter the rows with boolean indexing, then count the rows per date with groupby and size:
# Parentheses let the method chain span several lines
num_sells_per_day = (orders_df[orders_df['Order'] == 'SELL']
                     .groupby(level=0).size().reset_index(name='count'))
print(num_sells_per_day)
Date count
0 2011-01-13 1
1 2011-02-02 1
2 2011-03-03 2
3 2011-06-03 1
4 2011-08-01 1
Alternatively, filter with query:
# Same result, using query() for the filter
num_sells_per_day = (orders_df.query("Order == 'SELL'")
                     .groupby(level=0)
                     .size()
                     .reset_index(name='count'))
print(num_sells_per_day)
Date count
0 2011-01-13 1
1 2011-02-02 1
2 2011-03-03 2
3 2011-06-03 1
4 2011-08-01 1
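The question asks for two variables, sells_on_a_day and buys_on_a_day, that map each date to a count. A minimal sketch of that, applying the same filter-then-count idea to both order types, is below. It rebuilds orders_df from the question's data and assumes Date is the index; the helper name orders_on_a_day is hypothetical and used only for illustration.

```python
import pandas as pd

# Rebuild the question's data; Date is assumed to be the index.
orders_df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26",
        "2011-02-02", "2011-02-10", "2011-03-03", "2011-03-03",
        "2011-05-03", "2011-06-03", "2011-08-01", "2011-08-01",
    ]),
    "Symbol": ["AAPL", "AAPL", "IBM", "GOOG", "XOM", "XOM",
               "GOOG", "IBM", "IBM", "IBM", "GOOG", "GOOG"],
    "Order": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY",
              "SELL", "SELL", "BUY", "SELL", "BUY", "SELL"],
    "Shares": [1500, 1500, 4000, 1000, 4000, 4000,
               1000, 2200, 1500, 3300, 55, 55],
}).set_index("Date")


def orders_on_a_day(df, side):
    """Hypothetical helper: Series mapping each Date to the number of `side` orders."""
    return df[df["Order"] == side].groupby(level="Date").size()


sells_on_a_day = orders_on_a_day(orders_df, "SELL")
buys_on_a_day = orders_on_a_day(orders_df, "BUY")

print(sells_on_a_day)
print(buys_on_a_day)
```

Each result is a Series indexed by Date, so a lookup such as sells_on_a_day[pd.Timestamp("2011-03-03")] returns that day's count. Dates with no matching orders are simply absent from the Series.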
You can also build both columns at once. A NaN appears only where a date has no orders of that type:
df1 = orders_df.groupby(['Date','Order']).size().unstack()
print(df1)
Order       BUY  SELL
Date
2011-01-10  1.0   NaN
2011-01-13  1.0   1.0
2011-01-26  1.0   NaN
2011-02-02  NaN   1.0
2011-02-10  1.0   NaN
2011-03-03  NaN   2.0
2011-05-03  1.0   NaN
2011-06-03  NaN   1.0
2011-08-01  1.0   1.0
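If zeros are preferable to NaN for dates with no orders of a given type, unstack accepts a fill_value argument. The sketch below is one possible variation rather than part of the original answer. It rebuilds only the Date and Order columns from the question, again assuming Date is the index.

```python
import pandas as pd

# Only the Date index and the Order column are needed for counting.
orders_df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26",
        "2011-02-02", "2011-02-10", "2011-03-03", "2011-03-03",
        "2011-05-03", "2011-06-03", "2011-08-01", "2011-08-01",
    ]),
    "Order": ["BUY", "SELL", "BUY", "BUY", "SELL", "BUY",
              "SELL", "SELL", "BUY", "SELL", "BUY", "SELL"],
}).set_index("Date")

# Count rows per (Date, Order) pair, then pivot Order into columns.
# fill_value=0 writes 0 instead of NaN where a date has no orders of that type.
df1 = orders_df.groupby(["Date", "Order"]).size().unstack(fill_value=0)

# Each column is then a Date -> count mapping covering every date.
sells_on_a_day = df1["SELL"]
buys_on_a_day = df1["BUY"]

print(df1)
```

Unlike the filtered version, these Series contain every date in orders_df, with 0 for days that had no orders of that type.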