pandas: sorting by row

Time: 2018-06-19 19:55:10

Tags: python pandas

Date        Count_Doc   Sum_Words   S&P 500     Russel 2000  Nasdaq     
2017-02-16  0.069946    3.839240    -0.568454   -0.514334   -0.592410
2017-04-12  1.655428    3.667811    -0.891697   -1.450381   -1.047976
2017-04-19  2.371889    2.110689    -0.284174   0.401092    0.427705
2017-04-20  3.261538    2.995514    1.846039    1.360092    1.660339
2017-05-02  0.738549    2.197852    0.081593    -0.849580   -0.231491

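I want to leave the "Count_Doc" and "Sum_Words" columns unchanged and sort the remaining columns by their values within each row, returning the column names, as shown below. (The order shown here is not actually sorted; I just shuffled the names arbitrarily to illustrate the layout.)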

Date        Count_Doc   Sum_Words   1st         2nd         3rd
2017-02-16  0.069946    3.839240    S&P 500     Nasdaq      Russel 2000
2017-04-12  1.655428    3.667811    Nasdaq      S&P 500     Russel 2000
2017-04-19  2.371889    2.110689    Nasdaq      S&P 500     Russel 2000
2017-04-20  3.261538    2.995514    Russel 2000 Nasdaq      S&P 500 
2017-05-02  0.738549    2.197852    Russel 2000 S&P 500     Nasdaq  

Is there a way to return the column names as DataFrame values like this?

Thanks!

2 answers:

Answer 0: (score: 1)

Use this:

import pandas as pd

df = df.set_index(['Date', 'Count_Doc', 'Sum_Words'])
# argsort(axis=1) orders each row ascending; [:, ::-1] flips every row to descending
order = df.to_numpy().argsort(axis=1)[:, ::-1]
df_out = pd.DataFrame(df.columns.to_numpy()[order],
                      index=df.index,
                      columns=['1st', '2nd', '3rd']).reset_index()
df_out

Output (each row ranked from largest to smallest value):

         Date  Count_Doc  Sum_Words          1st          2nd          3rd
0  2017-02-16   0.069946   3.839240  Russel 2000      S&P 500       Nasdaq
1  2017-04-12   1.655428   3.667811      S&P 500       Nasdaq  Russel 2000
2  2017-04-19   2.371889   2.110689       Nasdaq  Russel 2000      S&P 500
3  2017-04-20   3.261538   2.995514      S&P 500       Nasdaq  Russel 2000
4  2017-05-02   0.738549   2.197852      S&P 500       Nasdaq  Russel 2000
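
For anyone who wants something that runs as-is, here is a minimal sketch of the argsort idea. It assumes the sample data from the question and an explicit list of the columns to rank (the names `value_cols`, `order`, and `ranked` are just illustrative), so treat it as one possible way to write it rather than the definitive one.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2017-02-16", "2017-04-12", "2017-04-19", "2017-04-20", "2017-05-02"],
    "Count_Doc": [0.069946, 1.655428, 2.371889, 3.261538, 0.738549],
    "Sum_Words": [3.839240, 3.667811, 2.110689, 2.995514, 2.197852],
    "S&P 500": [-0.568454, -0.891697, -0.284174, 1.846039, 0.081593],
    "Russel 2000": [-0.514334, -1.450381, 0.401092, 1.360092, -0.849580],
    "Nasdaq": [-0.592410, -1.047976, 0.427705, 1.660339, -0.231491],
})

# Columns to rank (an assumption: listed by name); the others stay untouched.
value_cols = ["S&P 500", "Russel 2000", "Nasdaq"]
values = df[value_cols].to_numpy()

# argsort(axis=1) gives ascending positions per row; [:, ::-1] flips each row
# to descending. A bare [::-1] would reverse the order of the rows instead.
order = np.argsort(values, axis=1)[:, ::-1]

# Map the positions back to column names with NumPy integer-array indexing.
names = np.array(value_cols)[order]

ranked = pd.DataFrame(names, index=df.index, columns=["1st", "2nd", "3rd"])
df_out = pd.concat([df[["Date", "Count_Doc", "Sum_Words"]], ranked], axis=1)
print(df_out)
```

Doing the work in NumPy avoids a Python-level function call per row, which tends to matter once the frame has many rows.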

Answer 1: (score: 1)

You can add 3 columns to the DataFrame by ranking the 3 index columns within each row.

Here I select only the 3 columns whose names we want to order, then apply a function row by row. The function takes the row as a Series, sorts it, takes its index (that is, the column names), and returns that index as a new Series. The result is then assigned to the new columns:

df[['1st', '2nd', '3rd']] = df.iloc[:, [3,4,5]].apply(lambda x: pd.Series(x.sort_values(ascending=False).index), axis=1)

Output:

         Date  Count_Doc  Sum_Words  ...          1st          2nd          3rd
0  2017-02-16   0.069946   3.839240  ...  Russel 2000      S&P 500       Nasdaq
1  2017-04-12   1.655428   3.667811  ...      S&P 500       Nasdaq  Russel 2000
2  2017-04-19   2.371889   2.110689  ...       Nasdaq  Russel 2000      S&P 500
3  2017-04-20   3.261538   2.995514  ...      S&P 500       Nasdaq  Russel 2000
4  2017-05-02   0.738549   2.197852  ...      S&P 500       Nasdaq  Russel 2000

Note that I sorted in descending order, whereas your example output only showed an arbitrary order.
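
In case it helps, here is a self-contained sketch of the same row-wise `apply` idea. It assumes the sample data from the question and selects the columns by name instead of by position; `names_by_value`, `value_cols`, and `rank_cols` are hypothetical names picked for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2017-02-16", "2017-04-12", "2017-04-19", "2017-04-20", "2017-05-02"],
    "Count_Doc": [0.069946, 1.655428, 2.371889, 3.261538, 0.738549],
    "Sum_Words": [3.839240, 3.667811, 2.110689, 2.995514, 2.197852],
    "S&P 500": [-0.568454, -0.891697, -0.284174, 1.846039, 0.081593],
    "Russel 2000": [-0.514334, -1.450381, 0.401092, 1.360092, -0.849580],
    "Nasdaq": [-0.592410, -1.047976, 0.427705, 1.660339, -0.231491],
})

value_cols = ["S&P 500", "Russel 2000", "Nasdaq"]  # columns to rank
rank_cols = ["1st", "2nd", "3rd"]                  # names of the new columns


def names_by_value(row: pd.Series) -> pd.Series:
    """Return the row's labels ordered from largest to smallest value."""
    ordered = row.sort_values(ascending=False).index
    return pd.Series(list(ordered), index=rank_cols)


# apply(axis=1) calls the function once per row; because each call returns a
# Series indexed by rank_cols, the result is a DataFrame with those columns.
ranked = df[value_cols].apply(names_by_value, axis=1)

# join keeps the original columns and appends the ranking columns.
df_out = df.join(ranked)
print(df_out)
```

Selecting by name keeps the snippet working if the column positions change, and `join` leaves the original `df` unmodified. Row-wise `apply` is usually slower than a vectorized NumPy approach on large frames, so this is mainly a readability trade-off.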