How can I return the N largest values for each row of a pd.DataFrame without evaluating the diagonal values?

Time: 2015-11-27 13:34:22

Tags: python correlation

Suppose I have a df:

     c1   c2    c3   c4    c5
c1   1    10    16   0.5   7
c2   11   1     1.3  8     6
c3   12   12    1    4     2
c4   3    0.4   2    1     9    
c5   4    7     2    0.9   1

How can I return the 3 highest neighbors for each row without evaluating the diagonal entries, i.e. [c1][c1], [c2][c2], etc.?

The result I want:

For c1, the 3 best are c1c2, c1c3 and c1c5

For c2, the 3 best are c2c1, c2c4, and c2c5

For c3, the 3 best are c3c1, c3c2, and c3c4

.
.
.

1 Answer:

Answer 0: (score: 0)

In [18]: import pandas as pd
    ...: r = [[1, 10, 16, 0.5, 7], [11, 1, 1.3, 8, 6], [12, 12, 1, 4, 2], [3, 0.4, 2, 1, 9], [4, 7, 2, 0.9, 1]]
    ...: df = pd.DataFrame(r)
    ...: 

In [19]: a = df.values.copy()  # copy so the in-place sort leaves df untouched
    ...: a.sort(axis=1)        # sort each row in ascending order
    ...: 

In [20]: sorted_values = a[:, -3:]  # last three columns hold the three largest values per row

In [21]: sorted_values
Out[21]: 
array([[  7.,  10.,  16.],
       [  6.,   8.,  11.],
       [  4.,  12.,  12.],
       [  2.,   3.,   9.],
       [  2.,   4.,   7.]])

In [22]: # or in descending order
    ...: sorted_values[:, ::-1]
Out[22]: 
array([[ 16.,  10.,   7.],
       [ 11.,   8.,   6.],
       [ 12.,  12.,   4.],
       [  9.,   3.,   2.],
       [  7.,   4.,   2.]])
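
Note that the sort above returns the values only, and it never masks the diagonal; it works on this data because each diagonal entry (1) happens to fall outside its row's top three. The question asks for column labels with the diagonal excluded, so here is a minimal sketch of one way to do that. It assumes the frame is square and labelled c1..c5 on both axes as in the question; the helper name `top_n_off_diagonal` is hypothetical, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

labels = ["c1", "c2", "c3", "c4", "c5"]
r = [[1, 10, 16, 0.5, 7],
     [11, 1, 1.3, 8, 6],
     [12, 12, 1, 4, 2],
     [3, 0.4, 2, 1, 9],
     [4, 7, 2, 0.9, 1]]
df = pd.DataFrame(r, index=labels, columns=labels)


def top_n_off_diagonal(frame, n=3):
    """Hypothetical helper: per row, the labels of the n largest off-diagonal columns."""
    # Work on a float copy so the original frame is not modified.
    a = frame.to_numpy(dtype=float, copy=True)
    # Push the diagonal to -inf so it can never rank among the largest values.
    np.fill_diagonal(a, -np.inf)
    # Sort the negated values ascending, i.e. the values descending.
    # A stable sort keeps tied values in their original column order.
    order = np.argsort(-a, axis=1, kind="stable")[:, :n]
    return {row: list(frame.columns[idx]) for row, idx in zip(frame.index, order)}


top3 = top_n_off_diagonal(df)
for row, cols in top3.items():
    print(row, cols)
```

For c1 this gives c3, c2, c5 (ordered by value, largest first), which is the same set as the c1c2, c1c3, c1c5 requested in the question. Filling the diagonal with `-np.inf` rather than dropping it keeps the array rectangular, so a single `argsort` call handles every row.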