pandas: complex filtering via groupby

Time: 2016-10-21 18:40:36

Tags: python pandas


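I have the DataFrame below and want to summarize it per injury type (the output column names are changed to distinguish them from the DataFrame's columns):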

import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

   crash_drinking  crash_drugs  driver_drinking  driver_drugged injury
0               1            0                1               0      A
1               1            0                1               0      B
2               1            0                0               0      B
3               0            1                0               0      A
4               0            1                0               1      A
5               0            0                0               0      C
6               1            0                0               0      A
7               0            1                1               1      B
8               1            1                0               0      A

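The desired output is:

   drinking crash  drinking driver in crash  drugged crash  drugged driver in crash
A               2                         1              2                        1
B               2                         1              1                        0

For the first row (A), the filters are:

"drinking crash" is a count of the rows where injury = 'A', crash_drinking = 1, and crash_drugs = 0;

"drinking driver in crash" is where crash_drinking = 1, crash_drugs = 0, driver_drinking = 1, and driver_drugged = 0;

"drugged crash" is where crash_drinking = 0 and crash_drugs = 1;

"drugged driver in crash" is where crash_drinking = 0, crash_drugs = 1, driver_drinking = 0, and driver_drugged = 1.

The B row is the same, except that injury = 'B'.

Right now I'm just setting up a bunch of .loc filters.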

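I'd rather do this with groupby or .apply(), since I assume that is faster than looping through all these queries, but I'm not sure of the right syntax. Maybe I should do a .groupby() on the "injury" column and go from there...?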
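As a rough sketch of the groupby-plus-apply idea floated above, one could group on the injury column and count the matching rows inside each group. The helper name `count_categories` and the `flag_cols` list are made up for illustration, and the filters follow the definitions given in the question. Note that `.apply()` calls a Python function once per group, so it is not necessarily faster than plain vectorized boolean masks.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Hypothetical list of the flag columns the filters need.
flag_cols = ['crash_drinking', 'crash_drugs', 'driver_drinking', 'driver_drugged']


def count_categories(group):
    """Count the rows of one injury group that match each filter (hypothetical helper)."""
    drinking_crash = (group['crash_drinking'] == 1) & (group['crash_drugs'] == 0)
    drugged_crash = (group['crash_drinking'] == 0) & (group['crash_drugs'] == 1)
    drinking_driver = (group['driver_drinking'] == 1) & (group['driver_drugged'] == 0)
    drugged_driver = (group['driver_drinking'] == 0) & (group['driver_drugged'] == 1)
    return pd.Series({
        'drinking crash': int(drinking_crash.sum()),
        'drinking driver in crash': int((drinking_crash & drinking_driver).sum()),
        'drugged crash': int(drugged_crash.sum()),
        'drugged driver in crash': int((drugged_crash & drugged_driver).sum()),
    })


# Selecting the flag columns keeps the grouping column out of each group frame.
summary = test.groupby('injury')[flag_cols].apply(count_categories)
print(summary)
```

The result has one row per injury value, including a row for C, which matches none of the filters.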

1 Answer:

Answer 0: (score: 2)

# Build one boolean column per category. The data has a `driver_drugged`
# column (there is no `driver_drugs` column), so that name is used here.
result = pd.DataFrame()
result['drinking crash'] = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
result['drinking driver in crash'] = ((test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
                                      & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0))
result['drugged crash'] = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
result['drugged driver in crash'] = ((test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
                                     & (test['driver_drinking'] == 0) & (test['driver_drugged'] == 1))
# Convert True/False to 1/0 so that summing within each group counts the matching rows
result = result.astype(int)
result['injury'] = test['injury']
result.groupby('injury').sum()

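Resulting DataFrame:

        drinking crash  drinking driver in crash  drugged crash  drugged driver in crash
injury
A                    2                         1              2                        1
B                    2                         1              1                        0
C                    0                         0              0                        0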
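The grouped sum keeps a row for every injury value, including C, which matches none of the filters, while the desired output in the question lists only A and B. A minimal sketch of one way to get exactly that shape, assuming all-zero groups should simply be dropped, is to group the flag frame by the injury Series directly and then filter the rows. The names `flags` and `counts` are illustrative only.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Crash-level masks, reused by the driver-level masks below.
drinking_crash = test['crash_drinking'].eq(1) & test['crash_drugs'].eq(0)
drugged_crash = test['crash_drinking'].eq(0) & test['crash_drugs'].eq(1)

flags = pd.DataFrame({
    'drinking crash': drinking_crash,
    'drinking driver in crash': (drinking_crash
                                 & test['driver_drinking'].eq(1)
                                 & test['driver_drugged'].eq(0)),
    'drugged crash': drugged_crash,
    'drugged driver in crash': (drugged_crash
                                & test['driver_drinking'].eq(0)
                                & test['driver_drugged'].eq(1)),
})

# Group by the injury Series (aligned on the shared index) and count matches.
counts = flags.astype(int).groupby(test['injury']).sum()

# Assumption: groups with no matches at all (here, C) are not wanted.
counts = counts[counts.any(axis=1)]
print(counts)
```

Passing `test['injury']` to `groupby` avoids copying the injury column into the flag frame first.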