I have the following data frame:
Signature Genes Labels Scores Annotation
CELF1 AARS 0 -5.439356884 EMPTY
CELF1 AATF 0 -5.882719549 EMPTY
CELF1 ABCF1 0 -6.011462342 EMPTY
HNRNPC AARS 0 -6.166240409 EMPTY
HNRNPC AATF 0 -6.432658981 EMPTY
HNRNPC ABCF1 0 -6.476526092 EMPTY
FUS AARS 0 -5.646015964 EMPTY
FUS AATF 0 -6.224914841 EMPTY
FUS ABCF1 0 -6.395334389 EMPTY
I want to rank the Scores column within each Signature, that is, rank the Genes by their Scores, so that I get:
Signature Genes Labels Scores Annotation Rank
CELF1 AARS 0 -5.439356884 EMPTY 1
CELF1 AATF 0 -5.882719549 EMPTY 2
CELF1 ABCF1 0 -6.011462342 EMPTY 3
HNRNPC AARS 0 -6.166240409 EMPTY 1
HNRNPC AATF 0 -6.432658981 EMPTY 2
HNRNPC ABCF1 0 -6.476526092 EMPTY 3
FUS AARS 0 -5.646015964 EMPTY 1
FUS AATF 0 -6.224914841 EMPTY 2
FUS ABCF1 0 -6.395334389 EMPTY 3
I followed this post. My code looks like this:
import pandas as pd

data = pd.read_csv("trial1.csv", sep='\t')
data['max_score'] = data.groupby(['Signature', 'Genes'])['Scores'].transform('max').astype(float)
data['rank'] = data.groupby('Signature')['max_score'].rank()
But my scores seem to be ranked by absolute value, as shown below:
Signature Genes Labels Scores Annotation Rank
CELF1 ABCF1 0 -6.011462342 EMPTY 1
CELF1 AATF 0 -5.882719549 EMPTY 2
CELF1 AARS 0 -5.439356884 EMPTY 3
HNRNPC ABCF1 0 -6.476526092 EMPTY 1
HNRNPC AATF 0 -6.432658981 EMPTY 2
HNRNPC AARS 0 -6.166240409 EMPTY 3
FUS ABCF1 0 -6.395334389 EMPTY 1
FUS AATF 0 -6.224914841 EMPTY 2
FUS AARS 0 -5.646015964 EMPTY 3
Answer 0 (score: 2)
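The ranking is not based on absolute value. rank() ranks in ascending order by default, so the smallest (most negative) score in each group gets rank 1. You just need to change the call from rank() to rank(ascending=False). See the documentation.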
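For illustration, here is a minimal sketch of that change applied to the sample rows from the question. It builds the frame inline instead of reading trial1.csv (an assumption, since the file itself is not shown) and leaves out the Labels and Annotation columns, which do not affect the ranking. Note that rank() returns floats.

```python
import pandas as pd

# Sample rows copied from the question; built inline because trial1.csv is not available here.
data = pd.DataFrame({
    "Signature": ["CELF1"] * 3 + ["HNRNPC"] * 3 + ["FUS"] * 3,
    "Genes": ["AARS", "AATF", "ABCF1"] * 3,
    "Scores": [
        -5.439356884, -5.882719549, -6.011462342,
        -6.166240409, -6.432658981, -6.476526092,
        -5.646015964, -6.224914841, -6.395334389,
    ],
})

# Same intermediate step as in the question: best score per (Signature, Genes) pair.
data["max_score"] = (
    data.groupby(["Signature", "Genes"])["Scores"].transform("max").astype(float)
)

# Rank within each Signature in descending order, so the highest score gets rank 1.
data["Rank"] = data.groupby("Signature")["max_score"].rank(ascending=False)

print(data[["Signature", "Genes", "Scores", "Rank"]])
```

If integer ranks are preferred and ties are possible, passing method="min" or method="dense" to rank() before casting with astype(int) avoids fractional ranks; which tie rule fits is a choice the question does not specify.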