GroupBy statement not grouping like strings together

Time: 2018-04-29 10:31:43

Tags: python pandas group-by mask

Evening,

My data:

display(dfRFQ_Breakdown_By_Done_Traded_Away_Grp.sort_values('security_type1',ascending=True))

    state     security_type1    count
0   Done             CORP           239
4   Tied Done        CORP            9
6   Tied Traded Away CORP            7
9   Traded Away      CORP          1075
1   Done             GOVT           40
5   Tied Done        GOVT           2
7   Tied Traded Away GOVT           16
10  Traded Away      GOVT          150
2   Done             MTGE           4
8   Tied Traded Away MTGE           3
11  Traded Away      MTGE           7
3   Done            SUPRA           31
12  Traded Away     SUPRA           88

I want to group all rows into either 'Done' or 'Traded Away' for each security_type1:

state     security_type1    count
Done        CORP             248
Traded Away CORP             1082
Done        GOVT             42
Traded Away GOVT             166
Done        MTGE             4
Traded Away MTGE             10
Done        SUPRA            31
Traded Away SUPRA            88

My code:

# Updating any Tied Done to Done and Tied Traded Away to Traded Away  
mask = (dfRFQ_Breakdown_By_Done_Traded_Away_Grp['state'].str.contains('Tied Done'))       
dfRFQ_Breakdown_By_Done_Traded_Away_Grp.loc[mask, 'state'] = 'Done'

mask = (dfRFQ_Breakdown_By_Done_Traded_Away_Grp['state'].str.contains('Tied Traded Away'))       
dfRFQ_Breakdown_By_Done_Traded_Away_Grp.loc[mask, 'state'] = 'Traded Away'
display(dfRFQ_Breakdown_By_Done_Traded_Away_Grp.sort_values('security_type1',ascending=True))

It looks like pandas still treats the updated strings as separate groups:

state   security_type1  count
Done         CORP        239
Done         CORP        9
Traded Away  CORP        7
Traded Away  CORP        1075
Done         GOVT        40
Done         GOVT        2
Traded Away  GOVT        16
Traded Away  GOVT        150
Done         MTGE        4
Traded Away  MTGE        3
Traded Away  MTGE        7
Done         SUPRA       31
Traded Away  SUPRA       88

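Why is pandas not combining the instances of Done and Traded Away? Do I need to create another copy of the DataFrame? It is almost as if pandas keeps a link to the old values from before the update.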

1 answer:

Answer 0 (score: 1)

This seems achievable with query, groupby, and sort_values:

res = df.query('(state == "Done") | (state == "TradedAway")')\
        .groupby(['state', 'security_type1'], as_index=False)['count'].sum()\
        .sort_values(['security_type1', 'state'])

print(res)

        state security_type1  count
0        Done           CORP    239
4  TradedAway           CORP   1075
1        Done           GOVT     40
5  TradedAway           GOVT    150
2        Done           MTGE      4
6  TradedAway           MTGE      7
3        Done          SUPRA     31
7  TradedAway          SUPRA     88
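
Note that the output above keeps only the rows that were already 'Done' or 'TradedAway', so the 'Tied ...' counts are not folded in. A likely explanation for the behaviour in the question is that the displayed frame had already been aggregated: overwriting the labels afterwards only renames rows, and the counts are not summed until groupby is run again. Below is a minimal sketch of that idea, not the answerer's code. It assumes the data is exactly the table from the question, uses the question's spelling 'Traded Away' (with a space), and the names df and res are chosen only for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question (assumed to be the
# already-aggregated frame that was being displayed).
df = pd.DataFrame({
    "state": ["Done", "Tied Done", "Tied Traded Away", "Traded Away",
              "Done", "Tied Done", "Tied Traded Away", "Traded Away",
              "Done", "Tied Traded Away", "Traded Away",
              "Done", "Traded Away"],
    "security_type1": ["CORP"] * 4 + ["GOVT"] * 4 + ["MTGE"] * 3 + ["SUPRA"] * 2,
    "count": [239, 9, 7, 1075, 40, 2, 16, 150, 4, 3, 7, 31, 88],
})

# Fold the "Tied ..." labels into their base states (exact-value replacement).
df["state"] = df["state"].replace(
    {"Tied Done": "Done", "Tied Traded Away": "Traded Away"}
)

# Relabelling alone does not merge rows; aggregate again to sum the counts.
res = (
    df.groupby(["security_type1", "state"], as_index=False)["count"]
      .sum()
      .sort_values(["security_type1", "state"])
      .reset_index(drop=True)
)

print(res)
```

With the question's data this should give eight rows whose totals match the desired table, for example 248 for CORP / Done and 1082 for CORP / Traded Away. No extra copy of the DataFrame is needed; the second groupby is what combines the relabelled rows.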