How can I get 2.x-like sorting behaviour in Python 3.x?

Date: 2014-10-26 16:27:10

Tags: python python-3.x sorting python-2.x

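I'm trying to replicate (and, if possible, improve on) Python 2.x's sorting behaviour in 3.x, so that mutually orderable types like int, float etc. are sorted as expected, and mutually unorderable types are grouped within the output.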

Here's an example of what I'm talking about:

>>> sorted([0, 'one', 2.3, 'four', -5])  # Python 2.x
[-5, 0, 2.3, 'four', 'one']
>>> sorted([0, 'one', 2.3, 'four', -5])  # Python 3.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()

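My previous attempt at this, using a class for the key parameter to sorted() (see Why does this key class for sorting heterogeneous sequences behave oddly?), is fundamentally broken, because its approach of

  1. trying to compare values, and
  2. if that fails, falling back to comparing the string representations of their types

can lead to intransitive ordering, as explained in the answers to that question.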

A naïve approach, which I initially rejected without even trying to code it, would be to use a key function that returns a (type, value) tuple:

def motley(value):
    return repr(type(value)), value

However, this doesn't do what I want. In the first place, it breaks the natural ordering of mutually orderable types:

>>> sorted([0, 123.4, 5, -6, 7.89])
[-6, 0, 5, 7.89, 123.4]
>>> sorted([0, 123.4, 5, -6, 7.89], key=motley)
[7.89, 123.4, -6, 0, 5]
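To see where that ordering comes from, it helps to print the keys motley builds. The tuples compare on the type-repr string first, and in CPython 3 "<class 'float'>" sorts before "<class 'int'>", so every float lands ahead of every int regardless of its value. A minimal check:

```python
def motley(value):
    # Key function from above: type repr first, value second
    return repr(type(value)), value

data = [0, 123.4, 5, -6, 7.89]
for key in sorted(map(motley, data)):
    print(key)
# ("<class 'float'>", 7.89)
# ("<class 'float'>", 123.4)
# ("<class 'int'>", -6)
# ("<class 'int'>", 0)
# ("<class 'int'>", 5)
```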
    

Secondly, it raises an exception when the input contains two objects of the same intrinsically unorderable type:

>>> sorted([{1:2}, {3:4}], key=motley)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: dict() < dict()

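... which admittedly is the standard behaviour in both Python 2.x and 3.x, but ideally I'd like such types to be grouped together (I don't especially care about their ordering, but it would seem in keeping with Python's guarantee of stable sorting that they retain their original order).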
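Because Python's sort is guaranteed stable, a key that looks only at the type (never at the value) already covers the grouping half of this: objects of an unorderable type end up together, in their input order. The catch, visible with the strings below, is that orderable values are no longer sorted either. A small sketch (type_only is just an illustrative name):

```python
def type_only(value):
    # Same key for every object of a type, so values are never compared
    return repr(type(value))

data = [{3: 4}, 'b', {1: 2}, 'a']
print(sorted(data, key=type_only))
# [{3: 4}, {1: 2}, 'b', 'a']
```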

I can work around the first of these problems for numeric types by special-casing them:

from numbers import Real
from decimal import Decimal

def motley(value):
    numeric = Real, Decimal
    if isinstance(value, numeric):
        typeinfo = numeric
    else:
        typeinfo = type(value)
    return repr(typeinfo), value

... which works as far as it goes:

>>> sorted([0, 'one', 2.3, 'four', -5], key=motley)
[-5, 0, 2.3, 'four', 'one']

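... but doesn't account for the fact that there may be other distinct (possibly user-defined) types which are mutually orderable, and of course it still fails with intrinsically unorderable types:

>>> sorted([{1:2}, {3:4}], key=motley)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: dict() < dict()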

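Is there another approach that solves both the problem of arbitrary, distinct but mutually orderable types and that of intrinsically unorderable types?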

10 answers:

Answer 0 (score: 32)

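A stupid idea: make a first pass that divides all the different items into groups whose members can be compared with each other, sort the individual groups, and finally concatenate them. I assume that an item is comparable with all members of a group if it is comparable with the first member of that group. Something like this (Python 3):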

import itertools

def python2sort(x):
    groups = []  # start empty, so an empty input doesn't raise StopIteration
    for item in x:
        for group in groups:
            try:
                item < group[0]  # exception if not comparable
                group.append(item)
                break
            except TypeError:
                continue
        else:  # did not break, make new group
            groups.append([item])
    print(groups)  # for debugging
    return itertools.chain.from_iterable(sorted(group) for group in groups)

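This has quadratic running time in the pathological case where none of the items are comparable, but I guess the only way to know that for sure is to check all possible combinations. Consider the quadratic behaviour a deserved punishment for anyone trying to sort a long list of unorderable items, like complex numbers. In the more common case of a mix of some strings and some integers, the speed should be similar to that of a normal sort. Quick test: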

In [19]: x = [0, 'one', 2.3, 'four', -5, 1j, 2j,  -5.5, 13 , 15.3, 'aa', 'zz']

In [20]: list(python2sort(x))
[[0, 2.3, -5, -5.5, 13, 15.3], ['one', 'four', 'aa', 'zz'], [1j], [2j]]
Out[20]: [-5.5, -5, 0, 2.3, 13, 15.3, 'aa', 'four', 'one', 'zz', 1j, 2j]

It seems to be a "stable sort" as well, since the groups are formed in the order in which the incomparable items are encountered.
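To make the quadratic worst case concrete, the sketch below counts comparison attempts. Opaque is a hypothetical class whose instances refuse ordering and count each attempt; with n of them, the i-th item is tried against i existing groups, for n(n-1)/2 attempts in total (45 for n = 10). The function restates the grouping logic above, starting from an empty group list and without the debug print.

```python
import itertools


class Opaque:
    """Hypothetical unorderable type that counts comparison attempts."""
    attempts = 0

    def __lt__(self, other):
        Opaque.attempts += 1
        # object's default __gt__ also declines, so '<' ends in TypeError
        return NotImplemented


def python2sort(x):
    groups = []
    for item in x:
        for group in groups:
            try:
                item < group[0]  # exception if not comparable
            except TypeError:
                continue
            group.append(item)
            break
        else:  # did not break, make new group
            groups.append([item])
    return itertools.chain.from_iterable(sorted(group) for group in groups)


n = 10
result = list(python2sort([Opaque() for _ in range(n)]))
print(Opaque.attempts, n * (n - 1) // 2)  # 45 45
```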

Answer 1 (score: 30)

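This answer aims to faithfully re-create the Python 2 sort order in Python 3.

The actual Python 2 implementation is quite involved, but object.c's default_3way_compare does the final fallback after instances have been given a chance to implement the normal comparison rules. This happens after individual types have had their chance to compare (via the __cmp__ or __lt__ hooks).

Implementing that function as pure Python in a wrapper, and emulating the exceptions to the rules (dict and complex numbers specifically), gives us the same Python 2 sorting semantics in Python 3: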

from numbers import Number


# decorator for type to function mapping special cases
def per_type_cmp(type_):
    try:
        mapping = per_type_cmp.mapping
    except AttributeError:
        mapping = per_type_cmp.mapping = {}
    def decorator(cmpfunc):
        mapping[type_] = cmpfunc
        return cmpfunc
    return decorator


class python2_sort_key(object):
    _unhandled_types = {complex}

    def __init__(self, ob):
        self._ob = ob

    def __lt__(self, other):
        _unhandled_types = self._unhandled_types
        self, other = self._ob, other._ob  # we don't care about the wrapper

        # default_3way_compare is used only if direct comparison failed
        try:
            return self < other
        except TypeError:
            pass

        # hooks to implement special casing for types, dict in Py2 has
        # a dedicated __cmp__ method that is gone in Py3 for example.
        for type_, special_cmp in per_type_cmp.mapping.items():
            if isinstance(self, type_) and isinstance(other, type_):
                return special_cmp(self, other)

        # explicitly raise again for types that won't sort in Python 2 either
        if type(self) in _unhandled_types:
            raise TypeError('no ordering relation is defined for {}'.format(
                type(self).__name__))
        if type(other) in _unhandled_types:
            raise TypeError('no ordering relation is defined for {}'.format(
                type(other).__name__))

        # default_3way_compare from Python 2 as Python code
        # same type but no ordering defined, go by id
        if type(self) is type(other):
            return id(self) < id(other)

        # None always comes first
        if self is None:
            return True
        if other is None:
            return False

        # Sort by typename, but numbers are sorted before other types
        self_tname = '' if isinstance(self, Number) else type(self).__name__
        other_tname = '' if isinstance(other, Number) else type(other).__name__

        if self_tname != other_tname:
            return self_tname < other_tname

        # same typename, or both numbers, but different type objects, order
        # by the id of the type object
        return id(type(self)) < id(type(other))


@per_type_cmp(dict)
def dict_cmp(a, b, _s=object()):
    if len(a) != len(b):
        return len(a) < len(b)
    adiff = min((k for k in a if a[k] != b.get(k, _s)), key=python2_sort_key, default=_s)
    if adiff is _s:
        # All keys in a have a matching value in b, so the dicts are equal
        return False
    bdiff = min((k for k in b if b[k] != a.get(k, _s)), key=python2_sort_key)
    if adiff != bdiff:
        return python2_sort_key(adiff) < python2_sort_key(bdiff)
    return python2_sort_key(a[adiff]) < python2_sort_key(b[bdiff])

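I incorporated handling dictionary sorting as implemented in Python 2, since the type itself supports that through a __cmp__ hook. Naturally, I've also stuck to the Python 2 ordering for the keys and values.

I also added special casing for complex numbers, as Python 2 raises an exception when you try to sort these: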

>>> sorted([0.0, 1, (1+0j), False, (2+3j)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

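You may have to add more special cases if you want to emulate Python 2 behaviour exactly.

If you want to sort complex numbers anyway, you'll need to consistently put them with the non-numbers group; e.g.: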

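# Replacement for the typename block in python2_sort_key.__lt__; complex must
# also be removed from _unhandled_types, or the TypeError is raised first.
# Sort by typename, but numbers (except complex) are sorted before other types
if isinstance(self, Number) and not isinstance(self, complex):
    self_tname = ''
else:
    self_tname = type(self).__name__
if isinstance(other, Number) and not isinstance(other, complex):
    other_tname = ''
else:
    other_tname = type(other).__name__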

Some test cases:

>>> sorted([0, 'one', 2.3, 'four', -5], key=python2_sort_key)
[-5, 0, 2.3, 'four', 'one']
>>> sorted([0, 123.4, 5, -6, 7.89], key=python2_sort_key)
[-6, 0, 5, 7.89, 123.4]
>>> sorted([{1:2}, {3:4}], key=python2_sort_key)
[{1: 2}, {3: 4}]
>>> sorted([{1:2}, None, {3:4}], key=python2_sort_key)
[None, {1: 2}, {3: 4}]
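The type-level rule at the end of __lt__ (None first, then all numbers, then everything else by type name) can be isolated into a plain key function to see which order the type groups come out in. This is a simplified sketch, not a replacement for python2_sort_key: py2_type_rank is an illustrative name, it ignores values, has no id-based tie-breaks, does not special-case complex, and relies on stable sorting to keep each group in input order.

```python
from numbers import Number


def py2_type_rank(obj):
    # Simplified model of default_3way_compare's cross-type ordering
    if obj is None:
        return (0, '')                # None sorts before everything
    if isinstance(obj, Number):
        return (1, '')                # numbers share an empty type name
    return (2, type(obj).__name__)    # everything else: by type name


items = ['x', None, {1: 2}, 3.5, [1], (1,), 7]
print(sorted(items, key=py2_type_rank))
# [None, 3.5, 7, {1: 2}, [1], 'x', (1,)]
```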

Answer 2 (score: 9)

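I don't have Python 3 running here, but maybe something like this would work. Test whether doing a "less than" comparison on "value" raises an exception, and then do "something" to handle that case, like converting it to a string.

You'd still need more special handling if there are other types in your list that are not the same type but are mutually orderable.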

from numbers import Real
from decimal import Decimal

def motley(value):
    numeric = Real, Decimal
    if isinstance(value, numeric):
        typeinfo = numeric
    else:
        typeinfo = type(value)

    try:
        value < value  # raises TypeError if the type has no ordering
    except TypeError:
        value = repr(value)

    return repr(typeinfo), value

>>> print(sorted([0, 'one', 2.3, 'four', -5, (2+3j), (1-3j)], key=motley))
[-5, 0, 2.3, (1-3j), (2+3j), 'four', 'one']
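Under Python 3, this fallback sorts values that cannot be compared with themselves by their repr() text, so two dicts now group together instead of raising. The ordering is purely textual, though: a dict whose repr begins with {10 comes before one that begins with {9. A restatement of the function above to check both points:

```python
from decimal import Decimal
from numbers import Real


def motley(value):
    numeric = Real, Decimal
    typeinfo = numeric if isinstance(value, numeric) else type(value)
    try:
        value < value          # raises TypeError for unorderable types
    except TypeError:
        value = repr(value)    # fall back to comparing the text
    return repr(typeinfo), value


print(sorted([{3: 4}, {1: 2}], key=motley))   # [{1: 2}, {3: 4}]
print(sorted([{9: 1}, {10: 1}], key=motley))  # [{10: 1}, {9: 1}]
```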

Answer 3 (score: 1)

To avoid using exceptions and to go for a type-based solution, I came up with this:

#! /usr/bin/python3

import itertools

def p2Sort(x):
    notImpl = type(0j.__gt__(0j))
    it = iter(x)
    first = next(it)
    groups = [[first]]
    types = {type(first):0}
    for item in it:
        item_type = type(item)
        if item_type in types.keys():
            groups[types[item_type]].append(item)
        else:
            types[item_type] = len(types)
            groups.append([item])

    # debugging
    for group in groups:
        print(group)
        for it in group:
            print(type(it),)
    #

    for i in range(len(groups)):
        if type(groups[i][0].__gt__(groups[i][0])) == notImpl:
            continue
        groups[i] = sorted(groups[i])

    return itertools.chain.from_iterable(group for group in groups)

x = [0j, 'one', 2.3, 'four', -5, 3j, 0j,  -5.5, 13 , 15.3, 'aa', 'zz']
print(list(p2Sort(x)))

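Note that an extra dictionary is needed to hold the different types in the list, as well as a variable holding a type (notImpl). Note further that floats and ints are not mixed here: each exact type gets its own group, as the output shows.

Output: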

[0j, 3j, 0j]
<class 'complex'>
<class 'complex'>
<class 'complex'>
['one', 'four', 'aa', 'zz']
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
[2.3, -5.5, 15.3]
<class 'float'>
<class 'float'>
<class 'float'>
[-5, 13]
<class 'int'>
<class 'int'>
[0j, 3j, 0j, 'aa', 'four', 'one', 'zz', -5.5, 2.3, 15.3, -5, 13]
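The probe in p2Sort relies on a type's __gt__ returning NotImplemented when the type defines no ordering. That works for complex and dict, but probing an element against itself cannot detect containers with mixed contents: a tuple compares fine with itself, yet a group of such tuples can still fail to sort. A short check:

```python
probes = [0j, 1, 'a', {}, (1,)]
for value in probes:
    # NotImplemented means the type itself offers no ordering for the pair
    print(type(value).__name__, value.__gt__(value))
# complex NotImplemented
# int False
# str False
# dict NotImplemented
# tuple False

try:
    sorted([(1,), ('a',)])   # same type, but 'a' vs 1 cannot be ordered
except TypeError as exc:
    print('TypeError:', exc)
```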

Answer 4 (score: 1)

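For Python 3.2+, one way is to use functools.cmp_to_key(). With it, you can quickly implement a solution that tries to compare the values and then falls back on comparing the string representations of the types. You also avoid raising an error when comparing unorderable types, and such items keep their original relative order: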

from functools import cmp_to_key

def cmp(a,b):
    try:
        return (a > b) - (a < b)
    except TypeError:
        s1, s2 = type(a).__name__, type(b).__name__
        return (s1 > s2) - (s1 < s2)

Examples (input lists taken from Martijn Pieters's answer):

sorted([0, 'one', 2.3, 'four', -5], key=cmp_to_key(cmp))
# [-5, 0, 2.3, 'four', 'one']
sorted([0, 123.4, 5, -6, 7.89], key=cmp_to_key(cmp))
# [-6, 0, 5, 7.89, 123.4]
sorted([{1:2}, {3:4}], key=cmp_to_key(cmp))
# [{1: 2}, {3: 4}]
sorted([{1:2}, None, {3:4}], key=cmp_to_key(cmp))
# [None, {1: 2}, {3: 4}]

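The downside is that a three-way comparison is always performed: every comparison the sort makes can cost up to two rich comparisons (a > b and a < b), which adds a constant-factor overhead. However, the solution is short and clean, and I think cmp_to_key() was developed for exactly this kind of Python 2 emulation use case (the functools documentation describes it as primarily a transition tool for programs being converted from Python 2).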
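One caveat: this cmp is the same "compare values, otherwise compare type names" scheme that the question describes as intransitive. Any type whose name falls alphabetically between float and int (frozenset, for example) closes a cycle, and the result then depends on the input order. A quick check using the cmp above:

```python
from functools import cmp_to_key


def cmp(a, b):
    try:
        return (a > b) - (a < b)
    except TypeError:
        s1, s2 = type(a).__name__, type(b).__name__
        return (s1 > s2) - (s1 < s2)


fs = frozenset()
# 1.5 < fs ('float' < 'frozenset'), fs < 1 ('frozenset' < 'int'), yet 1 < 1.5
print(cmp(1.5, fs), cmp(fs, 1), cmp(1, 1.5))  # -1 -1 -1

# The same three elements in two input orders
first = sorted([1.5, fs, 1], key=cmp_to_key(cmp))
second = sorted([1, 1.5, fs], key=cmp_to_key(cmp))
print(first)
print(second)
print(first == second)  # False: the "sorted" order depends on the input order
```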

Answer 5 (score: 1)

We can solve this problem in the following way:

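  1. Group by type.
  2. Find which types are comparable by attempting to compare a single representative of each type.
  3. Merge the groups of comparable types.
  4. Sort the merged groups, if possible.
  5. Yield from the (sorted) merged groups.

We can get a deterministic and orderable key function from types by using repr(type(x)). Note that the "type hierarchy" here is determined by the repr of the types themselves. A flaw in this method is that if two types have the same __repr__ (of the type itself, not of its instances), the types get "confused". This could be solved by using a key function that returns a tuple (repr(type), id(type)), but I have not implemented that in this solution.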
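The repr collision described above is easy to trigger with two classes created by the same factory: their reprs are identical even though the types are distinct. A (repr(type), id(type)) key, as suggested, tells them apart. make_thing is a hypothetical helper used only for this illustration.

```python
def make_thing():
    # Each call creates a brand-new class object with the same qualified name
    class Thing:
        pass
    return Thing()


a, b = make_thing(), make_thing()
print(repr(type(a)) == repr(type(b)))  # True: the reprs collide
print(type(a) is type(b))              # False: they are different types


def type_key(x):
    # The instances keep their classes alive, so the ids cannot be reused here
    return repr(type(x)), id(type(x))


print(type_key(a) != type_key(b))      # True
```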

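The advantage of my method over Bas Swinkel's is cleaner handling of a group of unorderable elements. There is no quadratic behaviour in the number of items; instead, the function gives up on a group after the first failed comparison inside sorted().

My method performs worst when the iterable contains an extremely large number of different types, because a representative of every type is compared with every other one (quadratic in the number of types). This is a rare scenario, but I suppose it could come up.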

from fractions import Fraction
from itertools import groupby


def py2sort(iterable):
    by_type_repr = lambda x: repr(type(x))
    iterable = sorted(iterable, key=by_type_repr)
    types = {type_: list(group) for type_, group in groupby(iterable, by_type_repr)}

    def merge_compatible_types(types):
        representatives = [(type_, items[0]) for (type_, items) in types.items()]

        def mergable_types():
            for i, (type_0, elem_0) in enumerate(representatives, 1):
                for type_1, elem_1 in representatives[i:]:
                    if _comparable(elem_0, elem_1):
                        yield type_0, type_1

        def merge_types(a, b):
            try:
                types[a].extend(types[b])
                del types[b]
            except KeyError:
                pass  # already merged

        for a, b in mergable_types():
            merge_types(a, b)
        return types

    def gen_from_sorted_comparable_groups(types):
        for _, items in types.items():
            try:
                items = sorted(items)
            except TypeError:
                pass  # unorderable type, keep the original order
            yield from items

    types = merge_compatible_types(types)
    return list(gen_from_sorted_comparable_groups(types))


def _comparable(x, y):
    try:
        x < y
    except TypeError:
        return False
    else:
        return True


if __name__ == '__main__':
    print('before py2sort:')
    test = [2, -11.6, 3, 5.0, (1, '5', 3), (object, object()), complex(2, 3), [list, tuple], Fraction(11, 2), '2', type, str, 'foo', object(), 'bar']
    print(test)
    print('after py2sort:')
    print(py2sort(test))

Answer 6 (score: 1)

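I'd suggest starting this kind of task (imitating the behaviour of another system, very much like this one) by spelling out the target system in detail: how it should work in the various corner cases. One of the best ways to do that is to write a bunch of tests that make sure the behaviour is correct. Having such tests gives:

  • a better understanding of which elements should come before which
  • basic documentation
  • robustness against refactoring and added functionality. For example, if one more rule is added, how do you make sure the previous ones aren't broken?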

Such test cases could be written like this:

sort2_test.py

import unittest
from sort2 import sorted2


class TestSortNumbers(unittest.TestCase):
    """
    Verifies that numbers get sorted correctly.
    """

    def test_sort_empty(self):
        self.assertEqual(sorted2([]), [])

    def test_sort_one_element_int(self):
        self.assertEqual(sorted2([1]), [1])

    def test_sort_one_element_real(self):
        self.assertEqual(sorted2([1.0]), [1.0])

    def test_ints(self):
        self.assertEqual(sorted2([1, 2]), [1, 2])

    def test_ints_reverse(self):
        self.assertEqual(sorted2([2, 1]), [1, 2])


class TestSortStrings(unittest.TestCase):
    """
    Verifies that strings get sorted correctly.
    """

    def test_sort_one_element_str(self):
        self.assertEqual(sorted2(["1.0"]), ["1.0"])


class TestSortIntString(unittest.TestCase):
    """
    Verifies that numbers and strings get sorted correctly.
    """

    def test_string_after_int(self):
        self.assertEqual(sorted2([1, "1"]), [1, "1"])
        self.assertEqual(sorted2([0, "1"]), [0, "1"])
        self.assertEqual(sorted2([-1, "1"]), [-1, "1"])
        self.assertEqual(sorted2(["1", 1]), [1, "1"])
        self.assertEqual(sorted2(["0", 1]), [1, "0"])
        self.assertEqual(sorted2(["-1", 1]), [1, "-1"])


class TestSortIntDict(unittest.TestCase):
    """
    Verifies that numbers and dicts get sorted correctly.
    """

    def test_dict_after_int(self):
        self.assertEqual(sorted2([1, {1: 2}]), [1, {1: 2}])
        self.assertEqual(sorted2([0, {1: 2}]), [0, {1: 2}])
        self.assertEqual(sorted2([-1, {1: 2}]), [-1, {1: 2}])
        self.assertEqual(sorted2([{1: 2}, 1]), [1, {1: 2}])
        self.assertEqual(sorted2([{1: 2}, 0]), [0, {1: 2}])
        self.assertEqual(sorted2([{1: 2}, -1]), [-1, {1: 2}])

Next, the sort function could look like this:

sort2.py

from numbers import Real
from decimal import Decimal
from itertools import tee, filterfalse


def sorted2(iterable):
    """
    Sort numbers naturally, followed by all other items ordered by str().

    :param iterable: An iterable (list or similar) whose elements
        should be sorted.
    :return: List with sorted elements.
    """
    def predicate(x):
        return isinstance(x, (Real, Decimal))

    t1, t2 = tee(iterable)
    numbers = filter(predicate, t1)
    non_numbers = filterfalse(predicate, t2)
    sorted_numbers = sorted(numbers)
    sorted_non_numbers = sorted(non_numbers, key=str)
    return sorted_numbers + sorted_non_numbers

Usage is very simple and is documented by the tests:

>>> from sort2 import sorted2
>>> sorted2([1,2,3, "aaa", {3:5}, [1,2,34], {-8:15}])
[1, 2, 3, [1, 2, 34], 'aaa', {-8: 15}, {3: 5}]
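A caveat the tests above do not cover: sorted2 orders every non-number by its str() text, so even mutually orderable non-numbers such as lists lose their natural order, and items of different types interleave by their first characters instead of grouping by type. Restating sorted2 to check:

```python
from decimal import Decimal
from itertools import filterfalse, tee
from numbers import Real


def sorted2(iterable):
    # Same logic as sort2.py: numbers first, then everything else by str()
    def predicate(x):
        return isinstance(x, (Real, Decimal))

    t1, t2 = tee(iterable)
    return sorted(filter(predicate, t1)) + sorted(filterfalse(predicate, t2), key=str)


print(sorted([[10], [9]]))          # [[9], [10]]  natural list order
print(sorted2([[10], [9]]))         # [[10], [9]]  '[10]' < '[9]' as text
print(sorted2(['~', {1: 2}, 'a']))  # ['a', {1: 2}, '~']  dict between strings
```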

Answer 7 (score: 1)

I tried to implement the Python 2 sorting C code in Python 3 as faithfully as possible.

Use it like this: mydata.sort(key=py2key())

or, with your own key function: mydata.sort(key=py2key(lambda x: mykeyfunc(x)))

Answer 8 (score: 0)

Here is one way to do this:

lst = [0, 'one', 2.3, 'four', -5]
a = [x for x in lst if type(x) in (int, float)]
b = [y for y in lst if type(y) is str]
a.sort()
b.sort()
c = a+b
print(c)
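Note that this filtering silently drops anything that is not exactly an int, float or str. That includes dicts and even bools, since type(True) is bool rather than int. A quick check:

```python
lst = [True, 1, 'a', {1: 2}, 2.5]
a = sorted(x for x in lst if type(x) in (int, float))
b = sorted(y for y in lst if type(y) is str)
c = a + b
print(c)                 # [1, 2.5, 'a']
print(len(c), len(lst))  # 3 5
```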

Answer 9 (score: 0)

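@martijn-pieters I don't know whether list in Python 2 also had a __cmp__ for comparing list objects, or how that was handled in Python 2.

In any case, in addition to @martijn-pieters's answer, I used the following list comparator, so that at least it doesn't give different sorted output for different orderings of the same input set:

@per_type_cmp(list)
def list_cmp(a, b):
    for a_item, b_item in zip(a, b):
        if a_item == b_item:
            continue
        return python2_sort_key(a_item) < python2_sort_key(b_item)
    return len(a) < len(b)

So, added to Martijn's original answer: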

from numbers import Number


# decorator for type to function mapping special cases
def per_type_cmp(type_):
    try:
        mapping = per_type_cmp.mapping
    except AttributeError:
        mapping = per_type_cmp.mapping = {}
    def decorator(cmpfunc):
        mapping[type_] = cmpfunc
        return cmpfunc
    return decorator


class python2_sort_key(object):
    _unhandled_types = {complex}

    def __init__(self, ob):
        self._ob = ob

    def __lt__(self, other):
        _unhandled_types = self._unhandled_types
        self, other = self._ob, other._ob  # we don't care about the wrapper

        # default_3way_compare is used only if direct comparison failed
        try:
            return self < other
        except TypeError:
            pass

        # hooks to implement special casing for types, dict in Py2 has
        # a dedicated __cmp__ method that is gone in Py3 for example.
        for type_, special_cmp in per_type_cmp.mapping.items():
            if isinstance(self, type_) and isinstance(other, type_):
                return special_cmp(self, other)

        # explicitly raise again for types that won't sort in Python 2 either
        if type(self) in _unhandled_types:
            raise TypeError('no ordering relation is defined for {}'.format(
                type(self).__name__))
        if type(other) in _unhandled_types:
            raise TypeError('no ordering relation is defined for {}'.format(
                type(other).__name__))

        # default_3way_compare from Python 2 as Python code
        # same type but no ordering defined, go by id
        if type(self) is type(other):
            return id(self) < id(other)

        # None always comes first
        if self is None:
            return True
        if other is None:
            return False

        # Sort by typename, but numbers are sorted before other types
        self_tname = '' if isinstance(self, Number) else type(self).__name__
        other_tname = '' if isinstance(other, Number) else type(other).__name__

        if self_tname != other_tname:
            return self_tname < other_tname

        # same typename, or both numbers, but different type objects, order
        # by the id of the type object
        return id(type(self)) < id(type(other))


@per_type_cmp(dict)
def dict_cmp(a, b, _s=object()):
    if len(a) != len(b):
        return len(a) < len(b)
    adiff = min((k for k in a if a[k] != b.get(k, _s)), key=python2_sort_key, default=_s)
    if adiff is _s:
        # All keys in a have a matching value in b, so the dicts are equal
        return False
    bdiff = min((k for k in b if b[k] != a.get(k, _s)), key=python2_sort_key)
    if adiff != bdiff:
        return python2_sort_key(adiff) < python2_sort_key(bdiff)
    return python2_sort_key(a[adiff]) < python2_sort_key(b[bdiff])

@per_type_cmp(list)
def list_cmp(a, b):
    for a_item, b_item in zip(a, b):
        if a_item == b_item:
            continue
        return python2_sort_key(a_item) < python2_sort_key(b_item)
    return len(a) < len(b)
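For context on when list_cmp is reached: Python 3 already compares lists lexicographically, and it raises only when the first pair of elements that differ cannot be ordered. Because python2_sort_key tries the direct comparison first, list_cmp only handles those failing cases:

```python
print([1, 2] < [1, 3])      # True: decided by 2 < 3
print([1, 'a'] < [2, 'b'])  # True: decided by 1 < 2, the strings are never compared

try:
    [1, 'a'] < [1, 2]       # first difference is 'a' vs 2
except TypeError as exc:
    print('TypeError:', exc)
```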

PS: It would make more sense to post this as a comment, but I don't have enough reputation to comment, so I've posted it as an answer.