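I have a dataframe like the one below, and I am trying to generate the `RESULT` column using a groupby on the `Set`, `Subset` and `Subsubset` columns. Within each group, I want the `Class` of the row returned by `idxmax` on `perc`.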
| Set | Subset | Subsubset | Class | perc | RESULT |
|-----|--------|-----------|-------|------|--------|
| 1 | A | 1 | good | 100 | good |
| 1 | A | | ok | 0 | good |
| 1 | A | | poor | 0 | good |
| 1 | A | | bad | 0 | good |
| 1 | A | 2 | good | 20 | bad |
| 1 | A | | ok | 10 | bad |
| 1 | A | | poor | 20 | bad |
| 1 | A | | bad | 50 | bad |
| 1 | A | 3 | good | 0 | poor |
| 1 | A | | ok | 10 | poor |
| 1 | A | | poor | 80 | poor |
| 1 | A | | bad | 10 | poor |
| 1 | B | 1 | good | 50 | good |
| 1 | B | | ok | 0 | good |
| 1 | B | | poor | 1 | good |
| 1 | B | | bad | 49 | good |
| 1 | B | 2 | good | 60 | good |
| 1 | B | | ok | 10 | good |
| 1 | B | | poor | 20 | good |
| 1 | B | | bad | 10 | good |
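To clarify, the result will always be a single value (for example, a 50/50 split will never occur).
`Set` values number in the hundreds, and `Subset` runs up to ZZ (it is a very long table).
This differs from the similar question "Python : Getting the Row which has the max value in groups using groupby" because here I am interested in grouping on MULTIPLE columns.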
Answer 0 (score: 2)
Since you mentioned `idxmax`, let's use `idxmax`:
idx = df.groupby(['Set', 'Subset', 'Subsubset'])['perc'].transform('idxmax')
df['RESULT'] = df.loc[idx, 'Class'].values  # or: df.Class.reindex(idx).values
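
As a minimal, self-contained sketch of the approach above, the following rebuilds the sample table and applies the same `groupby` / `transform('idxmax')` pattern. It rests on two assumptions that the question does not confirm: the blank `Subsubset` cells are treated as missing values that inherit the value above them (hence the forward fill), and the dataframe has a default unique integer index, so the labels returned by `idxmax` can be passed straight to `.loc`. The `blocks` and `classes` helpers are made up here purely to construct the data.

```python
import pandas as pd

# Hypothetical helpers used only to rebuild the sample table from the question.
classes = ["good", "ok", "poor", "bad"]
blocks = [
    (1, "A", 1, [100, 0, 0, 0]),
    (1, "A", 2, [20, 10, 20, 50]),
    (1, "A", 3, [0, 10, 80, 10]),
    (1, "B", 1, [50, 0, 1, 49]),
    (1, "B", 2, [60, 10, 20, 10]),
]

rows = []
for set_id, subset, subsubset, percs in blocks:
    for i, (cls, perc) in enumerate(zip(classes, percs)):
        rows.append({
            "Set": set_id,
            "Subset": subset,
            # Only the first row of each block carries the Subsubset value,
            # mirroring the blank cells in the table (assumed to be missing).
            "Subsubset": subsubset if i == 0 else None,
            "Class": cls,
            "perc": perc,
        })
df = pd.DataFrame(rows)

# Assumption: blank Subsubset cells belong to the block started above them.
df["Subsubset"] = df["Subsubset"].ffill()

# For every row, the index label of the max-perc row within its group.
idx = df.groupby(["Set", "Subset", "Subsubset"])["perc"].transform("idxmax")

# Look up the Class at those labels and assign by position.
df["RESULT"] = df.loc[idx, "Class"].to_numpy()

print(df)
```

If the real dataframe already has `Subsubset` filled in on every row, the forward-fill step can be dropped. Note also that `idxmax` returns the first occurrence when values tie, which should not matter here given that ties are said never to occur.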