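Starting from the pandas DataFrame df, I computed a row-wise ranking, which is stored in rank_df.
Now I want to create a new DataFrame, results, with the three columns ["first", "second", "third"]. This DataFrame should be filled with the corresponding column names of rank_df. For example, the first row of results might contain ['ticker_3', 'ticker_1', 'ticker_4']. In other words, the column first of results should always contain the name of the rank_df column that is ranked highest (rank 1.0) in that row, the column second the name of the column ranked next, and so on.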
import numpy as np
import pandas as pd
np.random.seed(123)
cols = ["ticker_" + str(i + 1) for i in range(5)]
df = pd.DataFrame(np.random.rand(3, 5), columns=cols)
df
Output:
ticker_1 ticker_2 ticker_3 ticker_4 ticker_5
0 0.696469 0.286139 0.226851 0.551315 0.719469
1 0.423106 0.980764 0.684830 0.480932 0.392118
2 0.343178 0.729050 0.438572 0.059678 0.398044
Generating rank_df:
rank_df = df.rank(axis=1, method="first", ascending=False)
rank_df
Output:
ticker_1 ticker_2 ticker_3 ticker_4 ticker_5
0 2.0 4.0 5.0 3.0 1.0
1 4.0 1.0 2.0 3.0 5.0
2 4.0 1.0 2.0 5.0 3.0
The results DataFrame still needs to be filled:
# NaNs in this final DataFrame need to be filled with the respective column names
results = pd.DataFrame(None, index=rank_df.index, columns=["first", "second", "third"])
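Answer 0 (score: 3)
IIUC, you can try argsort (note that the df printed here is a different random sample from the one in the question):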
print(df)
ticker_1 ticker_2 ticker_3 ticker_4 ticker_5
0 0.548814 0.715189 0.602763 0.544883 0.423655
1 0.645894 0.437587 0.891773 0.963663 0.383442
2 0.791725 0.528895 0.568045 0.925597 0.071036
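# sort each row in descending order on the raw array, keep the first 3 column positions
results[:] = df.columns.to_numpy()[np.argsort(-df.to_numpy(), axis=1)[:, :3]]  # change 3 to n as required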
print(results)
first second third
0 ticker_2 ticker_3 ticker_1
1 ticker_4 ticker_3 ticker_1
2 ticker_4 ticker_1 ticker_3
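The answer above sorts its own sample df. As a minimal sketch, the same idea can be applied to rank_df from the question: since rank 1.0 is the best rank, an ascending argsort of each row of ranks gives the column positions in the desired order. The variable name order is introduced here for illustration only, and the sketch assumes the ranks in each row are unique (which method="first" guarantees).

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (seed 123) and its ranking
np.random.seed(123)
cols = ["ticker_" + str(i + 1) for i in range(5)]
df = pd.DataFrame(np.random.rand(3, 5), columns=cols)
rank_df = df.rank(axis=1, method="first", ascending=False)

# Per row: column positions ordered from rank 1.0 upward, truncated to the top 3
order = np.argsort(rank_df.to_numpy(), axis=1)[:, :3]

# Map the positions back to column names and build the result frame
results = pd.DataFrame(
    rank_df.columns.to_numpy()[order],
    index=rank_df.index,
    columns=["first", "second", "third"],
)
print(results)
```

For the question's rank_df, this should give ticker_5, ticker_1, ticker_4 in the first row, matching the ranks 1.0, 2.0, 3.0 shown there.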
Answer 1 (score: 2)
Another approach is to reshape with pandas:
rank_df.reset_index().melt('index').pivot(index='index', columns='value', values='variable')\
.rename(columns={1.0:'first', 2.0:'second', 3.0:'third'}).iloc[:, :3]
Output:
value first second third
index
0 ticker_5 ticker_1 ticker_4
1 ticker_2 ticker_3 ticker_4
2 ticker_2 ticker_3 ticker_5
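The reshaped output above still carries the axis labels value and index left over from melt and pivot. A possible cleanup, sketched here under the assumption that only the top three ranks are wanted, is to select the rank columns explicitly and reset the axis names. The intermediate name wide is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data (seed 123) and its ranking
np.random.seed(123)
cols = ["ticker_" + str(i + 1) for i in range(5)]
df = pd.DataFrame(np.random.rand(3, 5), columns=cols)
rank_df = df.rank(axis=1, method="first", ascending=False)

# Long format (index, variable, value), then one column per rank value
wide = (
    rank_df.reset_index()
    .melt(id_vars="index")
    .pivot(index="index", columns="value", values="variable")
)

# Keep ranks 1.0 to 3.0, relabel the columns, and drop the leftover axis names
results = (
    wide[[1.0, 2.0, 3.0]]
    .set_axis(["first", "second", "third"], axis=1)
    .rename_axis(index=None)
)
print(results)
```

Passing index, columns and values by keyword to pivot keeps the call working on recent pandas versions, where positional arguments to pivot are no longer accepted.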