I have a DataFrame like this:
import pandas as pd
test = pd.DataFrame({'injury':['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'], 'crash_drinking':[1, 1, 1, 0, 0, 0, 1, 0, 1], 'crash_drugs':[0,0,0,1,1,0,0,1,1], 'driver_drinking':[1,1,0,0,0,0,0,1,0], 'driver_drugged':[0,0,0,0,1,0,0,1,0]})
   crash_drinking  crash_drugs  driver_drinking  driver_drugged injury
0               1            0                1               0      A
1               1            0                1               0      B
2               1            0                0               0      B
3               0            1                0               0      A
4               0            1                0               1      A
5               0            0                0               0      C
6               1            0                0               0      A
7               0            1                1               1      B
8               1            1                0               0      A
I would like my output to look like this (column names changed to distinguish them from the DataFrame above):
   drinking crash  drinking driver in crash  drugged crash  drugged driver in crash
A               2                         1              2                        1
B               2                         1              1                        0
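where the filters are as follows:
"drinking crash" is the count of rows where injury = 'A', crash_drinking = 1 and crash_drugs = 0;
"drinking driver in crash" is the count of rows where injury = 'A', crash_drinking = 1, crash_drugs = 0, driver_drinking = 1 and driver_drugged = 0;
"drugged crash" is the count of rows where injury = 'A', crash_drinking = 0 and crash_drugs = 1;
"drugged driver in crash" is the count of rows where injury = 'A', crash_drinking = 0, crash_drugs = 1, driver_drinking = 0 and driver_drugged = 1.
Row B is the same, but with injury = 'B'.
Right now I'm just setting up a bunch of .loc filters, one for each cell of the output, and so on.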
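For illustration only, one such per-cell filter might look like the sketch below. This is not the asker's actual code, which is not shown; the names mask and a_drinking_crash are hypothetical, and the driver_drugged column name is taken from the DataFrame definition above.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# One hand-written filter per output cell; this one is row 'A', column "drinking crash".
mask = (
    (test['injury'] == 'A')
    & (test['crash_drinking'] == 1)
    & (test['crash_drugs'] == 0)
)
a_drinking_crash = len(test.loc[mask])
print(a_drinking_crash)
```

Every other cell would need its own copy of this filter with different conditions, which is the repetition a groupby is meant to avoid.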
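I would rather do this with a groupby or .apply(), since I assume that would be faster than looping through all of these queries, but I'm not sure of the right syntax. Maybe I should do a .groupby() on the "injury" column and go from there...?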
Answer 0 (score: 2)
# Build one boolean flag column per category, using the `test` DataFrame from the question.
result = pd.DataFrame()
result['drinking crash'] = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
result['drinking driver in crash'] = ((test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
                                      & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0))
result['drugged crash'] = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
result['drugged driver in crash'] = ((test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
                                     & (test['driver_drinking'] == 0) & (test['driver_drugged'] == 1))
# Convert the True/False flags to 1/0 so they can be summed.
result = result.astype(int)
result['injury'] = test['injury']
# Summing the flags within each injury level gives the counts.
result.groupby('injury').sum()
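The same idea can be written as a self-contained script. The following is a minimal sketch rather than the answerer's code: it assumes the column is named driver_drugged, as in the question's DataFrame, and the names drinking_crash, drugged_crash, flags and counts are introduced here for illustration. It reuses the two crash-level masks for the driver-level columns and groups by the injury Series directly instead of copying it into the flag frame.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Crash-level masks, reused by the driver-level masks below.
drinking_crash = test['crash_drinking'].eq(1) & test['crash_drugs'].eq(0)
drugged_crash = test['crash_drinking'].eq(0) & test['crash_drugs'].eq(1)

# One 0/1 flag column per output category.
flags = pd.DataFrame({
    'drinking crash': drinking_crash,
    'drinking driver in crash': (drinking_crash
                                 & test['driver_drinking'].eq(1)
                                 & test['driver_drugged'].eq(0)),
    'drugged crash': drugged_crash,
    'drugged driver in crash': (drugged_crash
                                & test['driver_drinking'].eq(0)
                                & test['driver_drugged'].eq(1)),
}).astype(int)

# Group the flags by injury level (aligned on the row index) and sum them.
counts = flags.groupby(test['injury']).sum()
print(counts)
```

On the sample data this yields 2, 1, 2, 1 for A and 2, 1, 1, 0 for B, matching the desired output, plus a row of zeros for C. If only A and B are wanted, counts.loc[['A', 'B']] selects them.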