Question

I have the following DataFrame in pandas:

employee_name   age location    salary
Harish          31  Mumbai      450000
Marina          30  Mumbai      600000
Meena           31  Pune        750000
Sachin          32  Mumbai      1200000
Tarun           27  Mumbai      1400000
Mahesh          41  Pune        1500000
Satish          42  Delhi       650000
Heena           34  Delhi       800000

From this DataFrame, I want the highest-paid employee at each location, considering only employees whose age is above 30 and below 35.

My desired DataFrame is:

employee_name       age     location     salary
Sachin              32      Mumbai       1200000
Meena               31      Pune         750000
Heena               34      Delhi        800000

My attempt in pandas gave an error.

How can I do this in pandas?

Answer 1

You can filter first, then keep the row with the maximum salary for each location:

(df.loc[df['age'].between(31,34)]
   .sort_values('salary')
   .drop_duplicates('location', keep='last')
)

Output:

  employee_name  age location   salary
2         Meena   31     Pune   750000
7         Heena   34    Delhi   800000
3        Sachin   32   Mumbai  1200000
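
As a runnable check of the approach above, here is a minimal sketch that rebuilds the sample data from the question and confirms that between(31, 34), which is inclusive at both ends by default, selects the same rows as age > 30 and age < 35 for integer ages. The DataFrame construction and the names mask, strict, and result are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "employee_name": ["Harish", "Marina", "Meena", "Sachin",
                      "Tarun", "Mahesh", "Satish", "Heena"],
    "age": [31, 30, 31, 32, 27, 41, 42, 34],
    "location": ["Mumbai", "Mumbai", "Pune", "Mumbai",
                 "Mumbai", "Pune", "Delhi", "Delhi"],
    "salary": [450000, 600000, 750000, 1200000,
               1400000, 1500000, 650000, 800000],
})

# between() includes both endpoints by default
mask = df["age"].between(31, 34)
strict = (df["age"] > 30) & (df["age"] < 35)
assert (mask == strict).all()

# Sort ascending by salary, then keep the last (highest-paid) row per location
result = (df.loc[mask]
            .sort_values("salary")
            .drop_duplicates("location", keep="last"))
print(result)
```

If ages could be fractional, the explicit comparison (strict above) would be the safer filter, since between(31, 34) would then exclude values such as 30.5.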

Answer 2

Try idxmax. Note that filter will not work here.

df.loc[df[df['age'].between(31,34)].groupby('location')['salary'].idxmax()]
Out[110]: 
  employee_name  age location   salary
7         Heena   34    Delhi   800000
3        Sachin   32   Mumbai  1200000
2         Meena   31     Pune   750000
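
To illustrate why filter does not help here, the sketch below assumes the remark refers to GroupBy.filter, which keeps or drops whole groups rather than individual rows. The sample data is rebuilt from the question; the names whole_groups, in_range, and best are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "employee_name": ["Harish", "Marina", "Meena", "Sachin",
                      "Tarun", "Mahesh", "Satish", "Heena"],
    "age": [31, 30, 31, 32, 27, 41, 42, 34],
    "location": ["Mumbai", "Mumbai", "Pune", "Mumbai",
                 "Mumbai", "Pune", "Delhi", "Delhi"],
    "salary": [450000, 600000, 750000, 1200000,
               1400000, 1500000, 650000, 800000],
})

# GroupBy.filter expects one boolean per group, so it cannot drop single rows.
# Every location has at least one employee aged 31 to 34, so all rows survive.
whole_groups = df.groupby("location").filter(
    lambda g: g["age"].between(31, 34).any()
)
assert len(whole_groups) == len(df)

# Row-level filtering first, then idxmax returns the index label of the
# highest salary within each location
in_range = df[df["age"].between(31, 34)]
best = df.loc[in_range.groupby("location")["salary"].idxmax()]
print(best)
```

Because idxmax returns index labels from the original frame, looking them up with df.loc works even though the groupby ran on the filtered subset.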

Answer 3

You can try the following option:

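df = df.query('age > 30 & age < 35')
# Sort by salary so that keep="last" retains the highest-paid row per location
df = df.sort_values("salary").drop_duplicates(subset="location", keep="last")
print(df)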

Pandas groupby, filter, and aggregate

3 answers: