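I'm trying to rank the values in each column and assign the ranks to the values in the first column, ['Ticker']. For some columns I want smaller values to rank higher, while ['Dividend'] would be ranked normally (larger values ranking higher).
Preferably, the ranks would be stored in a new dataframe.
So let's say I have this dataframe: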
Ticker P/E P/S P/B P/FCF Dividend
No.
1 NTCT 457.32 3.03 1.44 26.04 -
2 GWRE 416.06 9.80 5.33 45.62 -
3 PEGA 129.02 4.41 9.85 285.10 0.0128
4 BLKB 87.68 4.96 14.36 41.81 0.0062
First I replace the missing values with 0:
df = df.replace('-', 0)
Then I would rank them and create a new dataframe:
Ticker P/E Dividend
No.
1 NTCT 4 3
2 GWRE 3 3
3 PEGA 2 1
4 BLKB 1 2
I was thinking of using scipy.stats rankdata on the columns (i.e. rankdata(df['P/E'], method='ordinal')), but it returns an error:
TypeError: '>' not supported between instances of 'int' and 'NavigableString'
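A note on the error message: `NavigableString` is BeautifulSoup's string type, which suggests the cells were scraped from HTML and are still string-like objects rather than floats. After `replace('-', 0)` the column would then mix integers with strings, and ranking has to compare them. A minimal sketch of the same kind of failure and one way around it, using plain `str` values as a stand-in (an assumption; the real column presumably holds `NavigableString` objects):

```python
import pandas as pd

# Stand-in for a scraped column: strings plus the integer 0 inserted by replace('-', 0).
mixed = pd.Series(["457.32", "416.06", 0], dtype=object)

try:
    sorted(mixed)  # ordering requires comparing str with int
except TypeError as exc:
    print("TypeError:", exc)

# Converting to a numeric dtype first makes the values comparable.
numeric = pd.to_numeric(mixed, errors="coerce")
print(numeric.rank(method="dense"))
```

With a numeric dtype in place, either `Series.rank` or `rankdata` should be able to order the values.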
Answer 0 (score: 1)
As @Craig said in the comment, you can use the DataFrame.rank(method='dense') method:
df.Dividend = pd.to_numeric(df.Dividend, errors='coerce').fillna(1)
df[['Ticker']].join(df[['P/E','Dividend']].rank(method='dense'))
Explanation (step by step):
In [35]: df
Out[35]:
Ticker P/E P/S P/B P/FCF Dividend
No.
1 NTCT 457.32 3.03 1.44 26.04 -
2 GWRE 416.06 9.80 5.33 45.62 -
3 PEGA 129.02 4.41 9.85 285.10 0.0128
4 BLKB 87.68 4.96 14.36 41.81 0.0062
In [36]: df.Dividend = pd.to_numeric(df.Dividend, errors='coerce').fillna(1)
In [37]: df
Out[37]:
Ticker P/E P/S P/B P/FCF Dividend
No.
1 NTCT 457.32 3.03 1.44 26.04 1.0000
2 GWRE 416.06 9.80 5.33 45.62 1.0000
3 PEGA 129.02 4.41 9.85 285.10 0.0128
4 BLKB 87.68 4.96 14.36 41.81 0.0062
In [38]: df[['Ticker']].join(df[['P/E','Dividend']].rank(method='dense'))
Out[38]:
Ticker P/E Dividend
No.
1 NTCT 4.0 3.0
2 GWRE 3.0 3.0
3 PEGA 2.0 2.0
4 BLKB 1.0 1.0
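The output above ranks both columns in ascending order, so `Dividend` comes out as 3, 3, 2, 1 rather than the 3, 3, 1, 2 in the question's expected table. A possible variation, sketched under the assumption that missing dividends are filled with 0 (as in the question) and that only `Dividend` should be ranked with `ascending=False`:

```python
import pandas as pd

# Sample data from the question; '-' marks a missing dividend.
df = pd.DataFrame(
    {
        "Ticker": ["NTCT", "GWRE", "PEGA", "BLKB"],
        "P/E": [457.32, 416.06, 129.02, 87.68],
        "Dividend": ["-", "-", 0.0128, 0.0062],
    },
    index=pd.Index([1, 2, 3, 4], name="No."),
)

# Coerce non-numeric entries to NaN, then treat a missing dividend as 0.
df["Dividend"] = pd.to_numeric(df["Dividend"], errors="coerce").fillna(0)

ranks = df[["Ticker"]].copy()
# Smaller P/E gets the smaller rank number.
ranks["P/E"] = df["P/E"].rank(method="dense").astype(int)
# Larger dividend gets the smaller rank number.
ranks["Dividend"] = df["Dividend"].rank(method="dense", ascending=False).astype(int)

print(ranks)
```

Ranking each column separately keeps the direction explicit per column; the `astype(int)` calls only drop the `.0` that `rank` returns by default.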
Answer 1 (score: -1)