I have the following pandas DataFrame:
+---------+-------+
| Country | value |
+---------+-------+
| UK | 42 |
| US | 9 |
| US | 10 |
| France | 15 |
| France | 16 |
| Germany | 17 |
| Germany | 18 |
| Germany | 20 |
+---------+-------+
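For anyone who wants to reproduce this, here is a minimal sketch that builds the same frame. The variable name `df` is just a convention, and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'Country': ['UK', 'US', 'US', 'France', 'France',
                'Germany', 'Germany', 'Germany'],
    'value': [42, 9, 10, 15, 16, 17, 18, 20],
})
print(df)
```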
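I want to create a new column that ranks each country by the average of its values, from largest to smallest.
The output should look like this: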
+---------+-------+---------+------+
| Country | value | Average | Rank |
+---------+-------+---------+------+
| UK | 42 | 42 | 1 |
| US | 9 | 9.5 | 4 |
| US | 10 | 9.5 | 4 |
| France | 15 | 15.5 | 3 |
| France | 16 | 15.5 | 3 |
| Germany | 17 | 18 | 2 |
| Germany | 18 | 18 | 2 |
| Germany | 20 | 18 | 2 |
+---------+-------+---------+------+
Note that I don't need the Average column; it is only there to help explain.
Many thanks.
Answer 0 (score: 6)
Use groupby + transform with mean, then rank:
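# Broadcast each country's mean back onto its own rows
df['Average'] = df.groupby('Country')['value'].transform('mean')
# Dense ranking, largest mean first, so rows of one country share a rank
df['Rank'] = df['Average'].rank(method='dense', ascending=False)
print(df)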
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
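The choice of method='dense' matters here because the means are repeated on every row of a country, so each country is a group of ties. The sketch below compares three tie-breaking methods of Series.rank on the question's data; the names `avg` and `comparison` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['UK', 'US', 'US', 'France', 'France',
                'Germany', 'Germany', 'Germany'],
    'value': [42, 9, 10, 15, 16, 17, 18, 20],
})

# Per-row copy of each country's mean
avg = df.groupby('Country')['value'].transform('mean')

# Same data, three different ways of resolving ties
comparison = pd.DataFrame({
    'Country': df['Country'],
    'average': avg.rank(method='average', ascending=False),
    'min': avg.rank(method='min', ascending=False),
    'dense': avg.rank(method='dense', ascending=False),
})
print(comparison)
```

Only 'dense' yields the consecutive ranks 1 to 4 asked for; 'average' (the default) and 'min' leave gaps because every tied row consumes a rank position.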
A similar solution:
a = df.groupby('Country')['value'].transform('mean')
b = a.rank(method='dense', ascending=False)
df = df.assign(Average=a, Rank=b)
print(df)
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
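Since the question says the Average column is not needed, the two steps can be chained so that only Rank is stored. This is a sketch under the assumption that value contains no missing data, because the astype(int) cast (added here only to get integer ranks) would fail on NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['UK', 'US', 'US', 'France', 'France',
                'Germany', 'Germany', 'Germany'],
    'value': [42, 9, 10, 15, 16, 17, 18, 20],
})

# Mean per country -> dense rank (largest first) -> integer ranks
df['Rank'] = (
    df.groupby('Country')['value']
      .transform('mean')
      .rank(method='dense', ascending=False)
      .astype(int)   # assumption: no NaN in 'value'
)
print(df)
```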
Answer 1 (score: 1)
Solution
A combination of pd.DataFrame.join, pd.concat, groupby and mean:
m = df.groupby('Country').value.mean()
df.join(
    pd.concat([m, m.rank(ascending=False)], axis=1, keys=['Average', 'Rank']),
    on='Country')
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
Similarly, with a double join:
m = df.groupby('Country').value.mean()
df.join(m.rename('Average'), on='Country') \
  .join(m.rank(ascending=False).rename('Rank'), on='Country')
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
Or map and assign:
m = df.groupby('Country').value.mean()
df.assign(
    Average=df.Country.map(m),
    Rank=df.Country.map(m.rank(ascending=False))
)
Country value Average Rank
0 UK 42 42.000000 1.0
1 US 9 9.500000 4.0
2 US 10 9.500000 4.0
3 France 15 15.500000 3.0
4 France 16 15.500000 3.0
5 Germany 17 18.333333 2.0
6 Germany 18 18.333333 2.0
7 Germany 20 18.333333 2.0
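The variants in this answer rank the per-country means (one value per country), so the default tie method gives whole numbers as long as no two countries share a mean. The sketch below uses made-up data, with hypothetical countries A, B and C, to show what happens when two means coincide and how method='dense' changes it.

```python
import pandas as pd

# Hypothetical data: countries A and B share the same mean (10.0)
tied = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'C'],
    'value': [8, 12, 10, 5],
})
m = tied.groupby('Country')['value'].mean()

default_rank = m.rank(ascending=False)                 # A and B each get 1.5
dense_rank = m.rank(method='dense', ascending=False)   # A and B both get 1.0

print(pd.concat([m, default_rank, dense_rank], axis=1,
                keys=['Average', 'default', 'dense']))
```

If tied countries are possible in the real data, passing method='dense' to m.rank keeps the ranks consecutive.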
Answer 2 (score: 1)
I use modern method chaining to avoid mutating state and creating new variables:
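import pandas as pd

df = pd.DataFrame(
    {'Country': ['Russia', 'Russia', 'USA'], 'Value': [12, 15, 16]})
# Rank the per-country means, largest first, then join back on Country
df.join(df.groupby('Country')
          .mean()
          .rank(ascending=False)
          .rename(columns={'Value': 'Rank'}),
        on='Country')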
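The direction of the ranking depends on the ascending argument of rank, which defaults to True (smallest value gets rank 1). A small sketch on the same Russia/USA frame compares both directions; the intermediate names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Country': ['Russia', 'Russia', 'USA'], 'Value': [12, 15, 16]})

means = df.groupby('Country')['Value'].mean()    # Russia 13.5, USA 16.0
ascending_rank = means.rank()                    # smallest mean gets rank 1
descending_rank = means.rank(ascending=False)    # largest mean gets rank 1

# Attach the largest-first ranking, as the question asks
result = df.join(descending_rank.rename('Rank'), on='Country')
print(result)
```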