Is it possible to do something like this?
import pandas as pd

df = pd.DataFrame({
    "sort_by": ["a", "a", "a", "a", "b", "b", "b", "a"],
    "x": [100.5, 200, 200, 500, 1, 2, 3, 200],
    "y": [4000, 2000, 2000, 1000, 500.5, 600.5, 600.5, 100.5]
})
df = df.sort_values(by=["x", "y"], ascending=False)
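Here I would like to sort by the sort_by column and use x and y to find the rank within each group (using y to break ties).
So the ideal output would be: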
sort_by      x       y  rank
a          500    1000     1
a          200    2000     2
a          200    2000     2
a          200   100.5     3
a        100.5    4000     4
b            3   600.5     1
b            2   600.5     2
b            1   500.5     3
Answer 0 (score: 1)
Use factorize after sort_values:
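df = df.sort_values(by=["x","y"], ascending=False)
# Pair x and y so that each (x, y) combination becomes a single hashable key
df['rank'] = tuple(zip(df.x, df.y))
# factorize returns 0-based codes in order of first appearance, so add 1 to get 1-based ranks
df['rank'] = df.groupby('sort_by', sort=False)['rank'].apply(lambda x: pd.Series(pd.factorize(x)[0] + 1)).values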
df
Out[615]:
  sort_by      x       y  rank
3       a  500.0  1000.0     1
1       a  200.0  2000.0     2
2       a  200.0  2000.0     2
7       a  200.0   100.5     3
0       a  100.5  4000.0     4
6       b    3.0   600.5     1
5       b    2.0   600.5     2
4       b    1.0   500.5     3
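
Note that assigning the result back through .values is positional, so it only lines up when the rows of each sort_by group sit next to each other after sorting, as they happen to in this sample data. As a hedged alternative (a different technique from the factorize approach above, not the answerer's method), here is a minimal sketch that sorts by sort_by first and then counts distinct (x, y) pairs within each group. The name is_new is just an illustrative helper, and the result stays aligned on the index instead of relying on row position:

```python
import pandas as pd

df = pd.DataFrame({
    "sort_by": ["a", "a", "a", "a", "b", "b", "b", "a"],
    "x": [100.5, 200, 200, 500, 1, 2, 3, 200],
    "y": [4000, 2000, 2000, 1000, 500.5, 600.5, 600.5, 100.5],
})

# Sort groups ascending, then x and y descending, so identical
# (x, y) pairs end up adjacent inside each sort_by group.
df = df.sort_values(by=["sort_by", "x", "y"], ascending=[True, False, False])

# is_new (illustrative name) is True on the first occurrence of each
# (sort_by, x, y) combination and False on its repeats.
is_new = ~df.duplicated(subset=["sort_by", "x", "y"])

# A running count of first occurrences within each group yields a dense,
# 1-based rank in which tied rows share the same value.
df["rank"] = is_new.astype(int).groupby(df["sort_by"]).cumsum()

print(df)
```

Because the cumulative sum is computed per group and returned with the original index, this should still work when the groups would interleave under a plain sort on x and y.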