Question

I need to find the fastest way to sort each row of a dataframe that has millions of rows and around a hundred columns.

Something like this:

A   B   C   D
3   4   8   1
9   2   7   2

needs to become:

A   B   C   D
8   4   3   1
9   7   2   2

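Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (which is why I'm using pandas rather than numpy). Would it be faster to create a list of lists and then build the new dataframe all at once? Or do I need to go to Cython?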
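A minimal sketch of the list-of-lists idea, assuming the per-row "extra work" can happen inside the loop; `sort_rows_loop` is a hypothetical helper name. Collecting plain lists and calling the DataFrame constructor once avoids repeatedly growing a DataFrame, but the Python-level loop over millions of rows is still slow, which is why the answers below vectorize with NumPy.

```python
import pandas as pd


def sort_rows_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Sort each row in descending order with a Python loop (hypothetical helper)."""
    rows = []
    for values in df.itertuples(index=False, name=None):
        # Any extra per-row work would go here; this sketch only sorts.
        rows.append(sorted(values, reverse=True))
    # Build the result once instead of appending to a DataFrame inside the loop.
    return pd.DataFrame(rows, index=df.index, columns=df.columns)


df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})
print(sort_rows_loop(df))
```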

Answer 1

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # no ascending argument

In [13]: a = a[:, ::-1]  # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2
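Note that `a = df.values` followed by `a.sort(axis=1)` sorts in place, and when `.values` happens to be a view of the frame's data, `df` itself can change too (the effect Answer 2 runs into). A minimal non-mutating sketch of the same approach, using `np.sort` (which returns a sorted copy) and `to_numpy()`; `sort_rows_desc` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def sort_rows_desc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame whose rows are sorted in descending order."""
    # np.sort returns a sorted copy, so df is left untouched.
    ascending = np.sort(df.to_numpy(), axis=1)
    # Reverse the column order of the array to turn ascending into descending.
    return pd.DataFrame(ascending[:, ::-1], index=df.index, columns=df.columns)


df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})
out = sort_rows_desc(df)
print(out)
```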

I had originally thought this might work, but it sorts the columns (by their labels) instead:

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

Ah, and pandas raises:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)
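`DataFrame.sort` has since been removed from pandas, so the calls above fail on current versions. A short sketch of the modern equivalents on the question's frame, showing that neither sorts each row independently: `sort_index(axis=1)` reorders the column labels, and `sort_values(by=..., axis=1)` reorders whole columns by the values of a single row.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Modern counterpart of df.sort(axis=1, ascending=False): sorts column *labels*.
by_label = df.sort_index(axis=1, ascending=False)
print(by_label)

# Reorders whole columns by the values in row 0; row 1 simply follows along.
by_row0 = df.sort_values(by=0, axis=1, ascending=False)
print(by_row0)
```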

Answer 2

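To add to the answer given by @Andy-Hayden, here is how to do this in place on the whole frame... I'm not really sure why this works, but it does. There seems to be no control over the sort order.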

    In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

    In [98]: A
    Out[98]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [99]: A.values.sort
    Out[99]: <function ndarray.sort>

    In [100]: A
    Out[100]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [101]: A.values.sort()

    In [102]: A
    Out[102]: 
       one  two  three  four  five
    0   22   46     49    63    72
    1   25   30     33    43    69
    2   21   24     39    56    93
    3    3   11     52    57    74

    In [103]: A = A.iloc[:,::-1]

    In [104]: A
    Out[104]: 
       five  four  three  two  one
    0    72    63     49   46   22
    1    69    43     33   30   25
    2    93    56     39   24   21
    3    74    57     52   11    3

I hope someone can explain why this works; I'm just happy that it does 8)
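A likely explanation, hedged because it depends on pandas internals and version: when every column shares one dtype, `A.values` can return a view of the frame's underlying data, so `ndarray.sort()` rewrites the frame in place. With mixed dtypes `.values` builds a copy, so the sort silently has no effect, and with pandas copy-on-write enabled the returned array may be read-only, so the in-place sort raises instead. Note also that `A.iloc[:, ::-1]` reverses the column labels (`five ... one`). A sketch that avoids both issues by writing a sorted copy back into the existing columns:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(
    [[22, 63, 72, 46, 49], [43, 30, 69, 33, 25]],
    columns=["one", "two", "three", "four", "five"],
)

# Sort a copy row-wise (ascending), reverse it for descending order,
# then assign the values back so the labels stay one..five.
A[:] = np.sort(A.to_numpy(), axis=1)[:, ::-1]
print(A)
```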

Answer 3

You can use `DataFrame.apply`.


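Since you want descending order, you can multiply the dataframe by -1, sort it, and multiply by -1 again. The example below shows the plain ascending sort.

E.g.: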

import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
print (A)

   one  two  three  four  five
0    2   75     44    53    46
1   18   51     73    80    66
2   35   91     86    44    25
3   60   97     57    33    79

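# result_type="broadcast" keeps the original index and column labels;
# without it, recent pandas returns a Series of arrays instead of a DataFrame.
A = A.apply(np.sort, axis=1, result_type="broadcast")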
print(A)

   one  two  three  four  five
0    2   44     46    53    75
1   18   51     66    73    80
2   25   35     44    86    91
3   33   57     60    79    97
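The multiply-by-minus-one idea can be sketched directly in NumPy, assuming a signed integer or float dtype (negating unsigned integers wraps around): sort the negated values ascending, then negate the result back. Unlike reversing with `[:, ::-1]`, this keeps any NaNs at the end of each row, because `np.sort` always places NaN last.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(
    [[2, 75, 44, 53, 46], [18, 51, 73, 80, 66]],
    columns=["one", "two", "three", "four", "five"],
)

# An ascending sort of -x is a descending sort of x; negate again to restore signs.
desc = -np.sort(-A.to_numpy(), axis=1)
A_desc = pd.DataFrame(desc, index=A.index, columns=A.columns)
print(A_desc)

# With floats, NaN stays last with the negation trick but moves first when reversing.
row = np.array([[1.0, np.nan, 3.0]])
print(-np.sort(-row, axis=1))
print(np.sort(row, axis=1)[:, ::-1])
```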

Answer 4

Instead of using the `pd.DataFrame` constructor, a simpler way to assign the sorted values back is to use double brackets:

Original dataframe:

A   B   C   D
3   4   8   1
9   2   7   2

df[['A', 'B', 'C', 'D']] = np.sort(df)[:, ::-1]

   A  B  C  D
0  8  4  3  1
1  9  7  2  2

This way you can also sort only a subset of the columns (shown here starting again from the original frame):

df[['B', 'C']] = np.sort(df[['B', 'C']])[:, ::-1]

   A  B  C  D
0  3  8  4  1
1  9  7  2  2

Answer 5

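You can try this approach, which keeps the DataFrame intact (the result is still a DataFrame with the original index and column labels):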

import pandas as pd 
import numpy as np

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five']) 
print (A) 
print(type(A))

   one  two  three  four  five
0   85   27     64    50    55
1    3   90     65    22     8
2    0    7     64    66    82
3   58   21     42    27    30
<class 'pandas.core.frame.DataFrame'>

B = A.apply(lambda x: np.sort(x), axis=1, raw=True) 
print(B) 
print(type(B))

   one  two  three  four  five
0   27   50     55    64    85
1    3    8     22    65    90
2    0    7     64    66    82
3   21   27     30    42    58
<class 'pandas.core.frame.DataFrame'>
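Since the question asks for the fastest approach, here is a rough way to compare the per-row `apply(..., raw=True)` above with a single vectorized `np.sort` call on a frame of your own size. The shape used here is an assumption, and no timings are claimed, since they depend on the machine and library versions. Both functions produce the same ascending result:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Assumed test size; scale the rows up towards the real data when comparing.
A = pd.DataFrame(rng.integers(0, 100, size=(10_000, 100)))


def with_apply():
    # One Python-level function call per row.
    return A.apply(np.sort, axis=1, raw=True)


def with_np_sort():
    # One vectorized call over the whole 2-D array.
    return pd.DataFrame(np.sort(A.to_numpy(), axis=1), index=A.index, columns=A.columns)


assert with_apply().equals(with_np_sort())
for fn in (with_apply, with_np_sort):
    print(fn.__name__, timeit.timeit(fn, number=3))
```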

Fastest way to sort each row in a pandas dataframe
