Pandas: grouping by multiple conditions

Time: 2020-02-14 22:51:03

Tags: python pandas

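How can I group by different columns depending on a condition? For example:

For rows where CL is a, b, or c, group by columns A and C and take the minimum of TOTAL; for rows where CL is d, e, or f, group by column B and take the minimum of TOTAL.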

CL  | A | B | C | TOTAL
a   | 1 | 6 | 5 | 125,000
b   | 2 | 5 | 5 | 140,000
c   | 3 | 4 | 5 | 148,000
d   | 4 | 3 | 6 | 125,000
e   | 5 | 2 | 6 | 136,000
f   | 6 | 1 | 6 | 156,000

original table

2 answers:

Answer 0 (score: 0)

OK, I see two options here.

(1) Group and aggregate each subset separately, then concatenate the results:

pd.concat([df.loc[df["CL"].isin(["a", "b", "c"])].groupby(["A", "C"])["TOTAL"].min(),
df.loc[df["CL"].isin(["d", "e", "f"])].groupby("B")["TOTAL"].min()])

Output:

(1, 5)    125000
(2, 5)    140000
(3, 5)    148000
1         156000
2         136000
3         125000
Name: TOTAL, dtype: int64
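
For anyone who wants to run option (1) end to end, here is a minimal sketch with the sample frame rebuilt from the question's table (TOTAL entered as plain integers). Flattening each result with reset_index() before concatenating is an assumption of this sketch rather than part of the answer above; it gives one table with ordinary columns instead of an index that mixes tuples and scalars.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "CL": list("abcdef"),
    "A": [1, 2, 3, 4, 5, 6],
    "B": [6, 5, 4, 3, 2, 1],
    "C": [5, 5, 5, 6, 6, 6],
    "TOTAL": [125000, 140000, 148000, 125000, 136000, 156000],
})

first = df["CL"].isin(["a", "b", "c"])

# Rows a, b, c: minimum TOTAL per (A, C) pair.
min_by_ac = df.loc[first].groupby(["A", "C"])["TOTAL"].min()
# Rows d, e, f: minimum TOTAL per B value.
min_by_b = df.loc[~first].groupby("B")["TOTAL"].min()

# Turn the group keys back into columns, then stack the two tables.
# Key columns that do not apply to a row are filled with NaN.
result = pd.concat(
    [min_by_ac.reset_index(), min_by_b.reset_index()],
    ignore_index=True,
)
print(result)
```

The key columns become floats because of the NaN fill; if that matters, the two Series can simply be kept separate.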

(2) Alternatively, build a dummy grouping key. For example, you can mask the grouping columns that do not apply to a row with -1, like this:

import numpy as np

# Work on a copy so the original df is left unchanged.
df2=df.copy()

# Mask the grouping keys that do not apply to a row with a sentinel other than None/NaN
# (groupby drops rows whose key is None/NaN by default).
# Pick a sentinel that cannot collide with a real key value.

df2["B"]=np.where(df["CL"].isin(["a", "b", "c"]), -1, df["B"])
df2["A"]=np.where(df["CL"].isin(["a", "b", "c"]), df["A"], -1)
df2["C"]=np.where(df["CL"].isin(["a", "b", "c"]), df["C"], -1)

df2.groupby(["A", "B", "C"])["TOTAL"].min()

Output:

A   B   C
-1   1  -1    156000
     2  -1    136000
     3  -1    125000
 1  -1   5    125000
 2  -1   5    140000
 3  -1   5    148000
Name: TOTAL, dtype: int64
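
The same masking idea can also be sketched without modifying a copy of the frame, by building the masked keys as separate Series and passing them to groupby. This is only a variant under assumptions: the SENTINEL name and the MIN_TOTAL column are made up for illustration, and the sample frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "CL": list("abcdef"),
    "A": [1, 2, 3, 4, 5, 6],
    "B": [6, 5, 4, 3, 2, 1],
    "C": [5, 5, 5, 6, 6, 6],
    "TOTAL": [125000, 140000, 148000, 125000, 136000, 156000],
})

SENTINEL = -1  # hypothetical name; must not collide with a real key value
first = df["CL"].isin(["a", "b", "c"])

# where() keeps values where the condition holds; mask() replaces them there.
key_a = df["A"].where(first, SENTINEL)
key_c = df["C"].where(first, SENTINEL)
key_b = df["B"].mask(first, SENTINEL)

keys = [key_a, key_b, key_c]

# One row per group, indexed by the masked (A, B, C) keys.
result = df.groupby(keys)["TOTAL"].min()

# Same minimum broadcast back onto every original row.
df["MIN_TOTAL"] = df.groupby(keys)["TOTAL"].transform("min")
print(result)
```

Using transform("min") instead of min() is handy when the group minimum is needed as a column next to the original rows.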

Answer 1 (score: 0)

I ended up solving this by adding an extra column, "test", with the following code:

z['test'] = np.where(z['ACTIVITY_PHASE'].isin(['FRAC','COIL']), z['TOTAL'], 
                (np.where(z['ACTIVITY_PHASE']=='PREW', z.groupby(z['ACTIVITY_PHASE']=='PREW')['TOTAL'].transform('min'), 
                (np.where(z['ACTIVITY_PHASE']=='WINF', z.groupby(z['ACTIVITY_PHASE']=='WINF')['TOTAL'].transform('min'), 
                (np.where(z['ACTIVITY_PHASE']=='WOR', z.groupby(z['ACTIVITY_PHASE']=='WOR')['TOTAL'].transform('min'), 0)))))))
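
The nested np.where calls above can probably be flattened. The sketch below is one way to do it with np.select and a single groupby on ACTIVITY_PHASE; it is not the code the answer's author used. The sample values and the "OTHER" phase are hypothetical; only the column names and phase labels come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
z = pd.DataFrame({
    "ACTIVITY_PHASE": ["FRAC", "COIL", "PREW", "PREW", "WINF", "WINF", "WOR", "OTHER"],
    "TOTAL": [10, 20, 30, 25, 40, 45, 50, 60],
})

keep_as_is = z["ACTIVITY_PHASE"].isin(["FRAC", "COIL"])
use_phase_min = z["ACTIVITY_PHASE"].isin(["PREW", "WINF", "WOR"])

# Minimum TOTAL within each phase, broadcast back to every row.
phase_min = z.groupby("ACTIVITY_PHASE")["TOTAL"].transform("min")

# np.select uses the first condition that matches; unmatched rows get 0.
z["test"] = np.select(
    [keep_as_is, use_phase_min],
    [z["TOTAL"], phase_min],
    default=0,
)
print(z)
```

On this sample the result matches what the nested version computes: FRAC and COIL rows keep their own TOTAL, PREW, WINF and WOR rows get their phase minimum, and anything else gets 0.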