Sorting algorithm used by pandas' sort_values when the kind parameter is not applied

Time: 2017-05-26 15:54:04

Tags: python sorting pandas dataframe

In pandas' sort_values method, the kind parameter is only applied when sorting on a single column or label. Why is that, and which sorting algorithm is used when the kind parameter is not applied? Is it stable?

(For the documentation, see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html.)

1 answer:

Answer 0 (score: 4)

Here is the docstring from the source file for get_group_index_sorter(group_index, ngroups):

algos.groupsort_indexer implements `counting sort` and it is at least
O(ngroups), where
    ngroups = prod(shape)
    shape = map(len, keys)
that is, linear in the number of combinations (cartesian product) of unique
values of groupby keys. This can be huge when doing multi-key groupby.
np.argsort(kind='mergesort') is O(count x log(count)) where count is the
length of the data-frame;
Both algorithms are `stable` sort and that is necessary for correctness of
groupby operations. e.g. consider:
    df.groupby(key)[col].transform('first')
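
To make the stability claim in the docstring concrete, here is a minimal sketch comparing a toy counting sort with np.argsort(kind='mergesort'). The function counting_sort_indexer is a hypothetical pure-Python stand-in written for illustration; it is not pandas' actual algos.groupsort_indexer, which is implemented in Cython.

```python
import numpy as np

def counting_sort_indexer(group_index, ngroups):
    """Toy stable counting sort (illustrative stand-in, not pandas' code)."""
    group_index = np.asarray(group_index)
    # How many rows fall into each group: O(count + ngroups) work overall.
    counts = np.bincount(group_index, minlength=ngroups)
    # starts[g] is the first output slot reserved for group g.
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    indexer = np.empty(len(group_index), dtype=np.intp)
    # Walking the input left to right keeps ties in their original order.
    for i, g in enumerate(group_index):
        indexer[starts[g]] = i
        starts[g] += 1
    return indexer

group_index = np.array([2, 0, 1, 0, 2, 1, 0])
by_counting = counting_sort_indexer(group_index, ngroups=3)
by_mergesort = np.argsort(group_index, kind="mergesort")

print(by_counting.tolist())   # [1, 3, 6, 2, 5, 0, 4]
print(by_mergesort.tolist())  # [1, 3, 6, 2, 5, 0, 4]
```

Both indexers list the positions of group 0 first, then group 1, then group 2, and within each group the original row order is preserved. That is what "stable" means here.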

PS: here is the "call chain":

pandas.core.frame.DataFrame.sort_values() -> \
  pandas.core.sorting.lexsort_indexer() ->  \
    pandas.core.sorting.indexer_from_factorized() -> \
      pandas.core.sorting.get_group_index_sorter()
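
A quick way to check the stability of a multi-column sort empirically is sketched below. The column names and data are made up for illustration, and the result was reasoned through rather than taken from the original answer; internals may differ between pandas versions.

```python
import pandas as pd

# Rows with equal (a, b) keys are distinguished only by "tag".
df = pd.DataFrame({
    "a": [1, 0, 1, 0, 1],
    "b": [0, 0, 0, 0, 0],
    "tag": ["p", "q", "r", "s", "t"],
})

# Multi-column sort: the kind argument is not used on this path.
multi = df.sort_values(["a", "b"])
multi_tags = multi["tag"].tolist()

# Single-column sort: kind is honoured, so request a stable algorithm.
single = df.sort_values("a", kind="mergesort")
single_tags = single["tag"].tolist()

print(multi_tags)   # ['q', 's', 'p', 'r', 't']
print(single_tags)  # ['q', 's', 'p', 'r', 't']
```

In both cases tied rows keep their original relative order, which is consistent with the docstring's statement that both algorithms are stable.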