Question

I start with a DataFrame like this:

print(df)
                   int          float  _i
1                    2   2.000000e+00   1
3                    3   3.000000e+00   3
2                    3   4.000000e+00   2
4 -9223372036854775808 -1.797693e+308   4
0 -9223372036854775808   1.000000e+00   0

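If I sort by two columns with sort_values, I get the output below, so sort_values seems to do nothing. With a single column name it works, and the way I call it worked in earlier pandas versions. Has something changed in pandas that I'm not aware of?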

print(df.sort_values(["int", "float"]))
                   int          float  _i
1                    2   2.000000e+00   1
3                    3   3.000000e+00   3
2                    3   4.000000e+00   2
4 -9223372036854775808 -1.797693e+308   4
0 -9223372036854775808   1.000000e+00   0

In pandas 0.17.0 I got:

print(df.sort_values(["int", "float"]))
                   int          float  _i
4 -9223372036854775808 -1.797693e+308   4
0 -9223372036854775808   1.000000e+00   0
1                    2   2.000000e+00   1
3                    3   3.000000e+00   3
2                    3   4.000000e+00   2
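For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the printed output and sorts it. The values are reconstructed from the printout, so it assumes that -1.797693e+308 is the most negative finite float64 (`-np.finfo(np.float64).max`) and that -9223372036854775808 is `np.iinfo(np.int64).min`.

```python
import numpy as np
import pandas as pd

# Values reconstructed from the printed frame in the question.
# Assumption: the printed -1.797693e+308 is the most negative finite float64.
INT64_MIN = np.iinfo(np.int64).min            # -9223372036854775808
FLOAT64_LOWEST = -np.finfo(np.float64).max    # about -1.797693e+308

df = pd.DataFrame(
    {
        "int": np.array([2, 3, 3, INT64_MIN, INT64_MIN], dtype=np.int64),
        "float": [2.0, 3.0, 4.0, FLOAT64_LOWEST, 1.0],
        "_i": [1, 3, 2, 4, 0],
    },
    index=[1, 3, 2, 4, 0],
)

result = df.sort_values(["int", "float"])
print(result)
# On a version without the bug, the rows come out as in the 0.17.0 output:
# index (and _i) 4, 0, 1, 3, 2.
print(result["_i"].tolist())
```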

Answer 1

Recent pandas versions no longer have this bug; it was fixed some time ago: https://github.com/pandas-dev/pandas/commit/6bea8275e504a594ac4fee71b5c941fb520c8b1a
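To check whether a particular installation is affected, one option is to compare the order produced by `sort_values` with a plain Python sort of the same rows. `sorts_correctly` below is a hypothetical helper written for this illustration, not a pandas API, and it assumes the key columns contain no NaN (NaN does not compare consistently under Python's `sorted`).

```python
import numpy as np
import pandas as pd


def sorts_correctly(df, by):
    """Hypothetical helper: True if df.sort_values(by) returns rows in
    ascending lexicographic order of the `by` columns (NaN-free data only)."""
    result = df.sort_values(by)
    # Plain tuples of the key columns, in the order pandas produced.
    rows = list(result[by].itertuples(index=False, name=None))
    return rows == sorted(rows)


INT64_MIN = np.iinfo(np.int64).min
df = pd.DataFrame(
    {
        "int": np.array([2, 3, 3, INT64_MIN, INT64_MIN], dtype=np.int64),
        "float": [2.0, 3.0, 4.0, -np.finfo(np.float64).max, 1.0],
    },
    index=[1, 3, 2, 4, 0],
)

# Prints the installed version and whether the two-column sort is correct.
print(pd.__version__, sorts_correctly(df, ["int", "float"]))
```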

Answer 2

I can get the ordering you want by calling it like this:

print(df.sort_values(by=["int", "float"], na_position='first'))

                   int          float  _i
3 -9223372036854775808 -1.797693e+308   4
4 -9223372036854775808   1.000000e+00   0
0                    2   2.000000e+00   1
1                    3   3.000000e+00   3
2                    3   4.000000e+00   2
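`na_position` is documented to control only where NaN values are placed (`'first'` or `'last'`, default `'last'`). A small illustration with a genuine NaN, using hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a real NaN in the first sort key.
df = pd.DataFrame({"a": [2.0, np.nan, 1.0], "b": [0, 1, 2]})

last = df.sort_values(["a", "b"])                         # default na_position="last"
first = df.sort_values(["a", "b"], na_position="first")   # NaN rows moved to the top

print(last["b"].tolist())   # [2, 0, 1]
print(first["b"].tolist())  # [1, 2, 0]
```

The question's frame contains no NaN, so the fact that `na_position='first'` changed the result on the affected version suggests (an inference from the two outputs, not something confirmed in the pandas source) that the int64-minimum rows were being handled like missing values there. On a fixed version this option should not be needed for that data.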

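However, I'm not sure why the sorting behavior differs between the two versions. I checked the source on GitHub and saw no changes to the sort_values function between them, so something deeper in the code may have changed.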

The code that does the sorting:

    if len(by) > 1:
        from pandas.core.groupby import _lexsort_indexer

        def trans(v):
            # datetime-like values are sorted via their int64 ('i8') view
            if com.needs_i8_conversion(v):
                return v.view('i8')
            return v

        # one key array per column named in `by`
        keys = []
        for x in by:
            k = self[x].values
            if k.ndim == 2:
                raise ValueError('Cannot sort by duplicate column %s' % str(x))
            keys.append(trans(k))
        indexer = _lexsort_indexer(keys, orders=ascending,
                                   na_position=na_position)
        indexer = com._ensure_platform_int(indexer)

    # ... (lines between the two excerpts omitted)

    # reorder the underlying data by the computed row positions
    new_data = self._data.take(indexer, axis=self._get_block_manager_axis(axis),
                               convert=False, verify=False)
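The `trans()` helper in the excerpt reinterprets datetime-like values as int64 via `view('i8')`. In that representation NaT is stored as the smallest int64, -9223372036854775808, which is exactly the value in the question's `int` column. A small NumPy sketch with hypothetical dates:

```python
import numpy as np

# Hypothetical example data: two dates and one missing value (NaT).
dates = np.array(["2016-01-02", "NaT", "2016-01-01"], dtype="datetime64[ns]")

# Reinterpret the datetime64 values as int64, as trans() does with view('i8').
as_i8 = dates.view("i8")

print(as_i8[1] == np.iinfo(np.int64).min)  # True: NaT is the smallest int64
print(np.argsort(as_i8).tolist())          # [1, 2, 0]: NaT sorts first
```

This does not by itself prove the cause of the regression, but it shows why that particular integer is a special value inside pandas.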

Something in _lexsort_indexer() or self._data.take() may have changed.
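Until upgrading, one possible workaround is to compute the row order with `numpy.lexsort` directly, which avoids pandas' internal `_lexsort_indexer`. This sketch assumes ascending order on both columns and no NaN values (it does not implement `na_position`); it uses `.values` rather than `.to_numpy()` so it also runs on old pandas versions.

```python
import numpy as np
import pandas as pd

INT64_MIN = np.iinfo(np.int64).min
df = pd.DataFrame(
    {
        "int": np.array([2, 3, 3, INT64_MIN, INT64_MIN], dtype=np.int64),
        "float": [2.0, 3.0, 4.0, -np.finfo(np.float64).max, 1.0],
        "_i": [1, 3, 2, 4, 0],
    },
    index=[1, 3, 2, 4, 0],
)

# np.lexsort treats the LAST key as the primary one, so the columns are
# passed in reverse priority: secondary "float" first, primary "int" last.
order = np.lexsort((df["float"].values, df["int"].values))

# `order` holds row positions, so select with iloc rather than loc.
result = df.iloc[order]
print(result["_i"].tolist())  # [4, 0, 1, 3, 2]
```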

sort_by broken in pandas >= 0.18.0?
