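I want the rows of a DataFrame that contain the top three scorers for each team.
In my head this is a combination of DataFrame.nlargest() and DataFrame.groupby(), but I don't think that is supported.
Ideally the solution would run directly on df, without creating any additional DataFrames.
import pandas as pd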
df = pd.read_json('{"team":{"0":"A","1":"A","2":"A","3":"A","4":"A","5":"B","6":"B","7":"B","8":"B","9":"B","10":"C","11":"C","12":"C","13":"C","14":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Donna","4":"Elizabeth","5":"Fran","6":"Greta","7":"Heather","8":"Iris","9":"Jackie","10":"Kelly","11":"Lucy","12":"Molly","13":"Nina","14":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":8,"4":10,"5":28,"6":29,"7":18,"8":25,"9":9,"10":12,"11":23,"12":18,"13":10,"14":15}}')
| team | player | points |
|------|-----------|--------|
| A | Alice | 15 |
| A | Becky | 11 |
| A | Carmen | 13 |
| A | Donna | 8 |
| A | Elizabeth | 10 |
| B | Fran | 28 |
| B | Greta | 29 |
| B | Heather | 18 |
| B | Iris | 25 |
| B | Jackie | 9 |
| C | Kelly | 12 |
| C | Lucy | 23 |
| C | Molly | 18 |
| C | Nina | 10 |
| C | Ophelia | 15 |
df_output = pd.read_json('{"team":{"0":"A","1":"A","2":"A","3":"B","4":"B","5":"B","6":"C","7":"C","8":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Fran","4":"Greta","5":"Iris","6":"Lucy","7":"Molly","8":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":28,"4":29,"5":25,"6":23,"7":18,"8":15}}')
df_output
| team | player | points |
|------|---------|--------|
| A | Alice | 15 |
| A | Becky | 11 |
| A | Carmen | 13 |
| B | Fran | 28 |
| B | Greta | 29 |
| B | Iris | 25 |
| C | Lucy | 23 |
| C | Molly | 18 |
| C | Ophelia | 15 |
Answer 0 (score: 2)
You can use the df.groupby(...).rank method:
In [1401]: df[df.groupby('team')['points'].rank(ascending=False) <= 3]
Out[1401]:
team player points
0 A Alice 15
1 A Becky 11
2 A Carmen 13
5 B Fran 28
6 B Greta 29
8 B Iris 25
11 C Lucy 23
12 C Molly 18
14 C Ophelia 15
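One thing to watch with the rank approach is ties. The sample data has none inside the top three, so the default settings work, but a tie at the cut-off changes how many rows pass the `<= 3` filter. The sketch below uses a small made-up frame (`tied` is hypothetical, not the question's data) to compare three values of the `method` parameter of `rank`.

```python
import pandas as pd

# Hypothetical data with a tie for third place on team A (not from the question).
tied = pd.DataFrame({
    "team": ["A", "A", "A", "A"],
    "player": ["Alice", "Becky", "Carmen", "Donna"],
    "points": [15, 13, 11, 11],
})

grouped = tied.groupby("team")["points"]

# Default method="average": both tied rows get rank 3.5, so neither passes <= 3.
top_average = tied[grouped.rank(ascending=False) <= 3]

# method="first": every row gets a distinct rank, so exactly three rows remain.
top_first = tied[grouped.rank(method="first", ascending=False) <= 3]

# method="min": tied rows share the lowest rank (3), so both are kept.
top_min = tied[grouped.rank(method="min", ascending=False) <= 3]

print(len(top_average), len(top_first), len(top_min))
```

Which variant is right depends on whether you want exactly three rows per team or every player tied for third.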
Answer 1 (score: 2)
You can use df.groupby together with df.nlargest:
df.groupby('team').apply(lambda x:x.nlargest(3,'points')).reset_index(drop=True)
team player points
0 A Alice 15
1 A Carmen 13
2 A Becky 11
3 B Greta 29
4 B Fran 28
5 B Iris 25
6 C Lucy 23
7 C Molly 18
8 C Ophelia 15
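Recent pandas releases emit a warning when `groupby(...).apply` operates on the grouping column, so here is a sketch of the same idea without `apply`: iterate over the groups, take `nlargest` on each, and concatenate. The frame is rebuilt with the `pd.DataFrame` constructor only to keep the snippet self-contained, and `top3` is just an illustrative name.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_json.
df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})

# Each group is a full sub-frame (including the team column), so nlargest
# can be applied per group and the pieces stitched back together.
top3 = pd.concat(
    [group.nlargest(3, "points") for _, group in df.groupby("team")]
).reset_index(drop=True)

print(top3)
```

This does build intermediate frames, so it trades the "no extra DataFrames" wish for not depending on how `apply` treats the grouping column.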
Answer 2 (score: 2)
Something like this might work:
df.loc[df.groupby(['team'])['points'].nlargest(3).reset_index().drop(['team','points'], axis=1)['level_1'].values]
team player points
0 A Alice 15
2 A Carmen 13
1 A Becky 11
6 B Greta 29
5 B Fran 28
8 B Iris 25
11 C Lucy 23
12 C Molly 18
14 C Ophelia 15
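The `reset_index().drop(...)['level_1']` chain is only there to recover the original row labels. A possibly shorter way to get the same labels, sketched here under the assumption that `groupby(...)['points'].nlargest(3)` returns a Series indexed by (team, original label), is to read the last index level directly. The names `top_idx` and `top3` are illustrative.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_json.
df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})

# The last level of the MultiIndex holds the original row labels of df.
top_idx = df.groupby("team")["points"].nlargest(3).index.get_level_values(-1)

# Select those rows; they come back ordered by team, then by points descending.
top3 = df.loc[top_idx]

print(top3)
```

Appending `.sort_index()` to the last step should restore the original row order if that matters.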
Answer 3 (score: 2)
Another approach is sort_values combined with groupby().tail/head:
df.sort_values('points').groupby('team').tail(3)
Output:
team player points
1 A Becky 11
2 A Carmen 13
0 A Alice 15
14 C Ophelia 15
12 C Molly 18
11 C Lucy 23
8 B Iris 25
5 B Fran 28
6 B Greta 29
Or
df.sort_values('points', ascending=False).groupby('team').head(3)
Output:
team player points
6 B Greta 29
5 B Fran 28
8 B Iris 25
11 C Lucy 23
12 C Molly 18
0 A Alice 15
14 C Ophelia 15
2 A Carmen 13
1 A Becky 11
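Both outputs above interleave the teams because the sort is on points alone. If the rows should come back grouped by team, or in the original row order as in the expected output, two small variations seem to do it. This is only a sketch, and the names `top3_by_team` and `top3_original_order` are made up for illustration.

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_json.
df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})

# Variation 1: sort by team first, then by points descending within each team.
top3_by_team = (
    df.sort_values(["team", "points"], ascending=[True, False])
      .groupby("team")
      .head(3)
)

# Variation 2: pick the top three per team, then restore the original row order.
top3_original_order = (
    df.sort_values("points", ascending=False)
      .groupby("team")
      .head(3)
      .sort_index()
)

print(top3_by_team)
print(top3_original_order)
```

The second variation keeps the original index labels, so it lines up with the expected output from the question apart from the index not being renumbered.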