Python是否具有用于字符串自然排序的内置函数?

时间:2011-01-29 11:55:53

标签: python sorting python-3.x

使用Python 3.x,我有一个字符串列表,我想对其执行自然的字母排序。

自然排序: Windows中文件的排序顺序。

例如,以下列表是自然排序的(我想要的):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

这是上面列表的“排序”版本(我拥有的):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

我正在寻找一种与第一种行为相似的排序功能。

18 个答案:

答案 0 :(得分:183)

在PyPI上有一个名为natsort的第三方库(完全披露,我是包的作者)。对于您的情况,您可以执行以下任一操作:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

你应该注意natsort使用一般算法,所以它应该适用于你抛出的任何输入。如果您想了解更多有关为什么选择库来执行此操作而不是滚动自己的功能的详细信息,请查看natsort文档的How It Works页面,尤其是Special Cases Everywhere!部分。


如果您需要排序键而不是排序功能,请使用以下任一公式。

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

答案 1 :(得分:161)

试试这个:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(l, key = alphanum_key)

输出:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

看到它在线工作:ideone

从此处改编的代码:Sorting for Humans : Natural Sort Order

答案 2 :(得分:83)

这是Mark Byer回答的更为pythonic的版本:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]    

现在,此函数可用作任何使用它的函数中的键,如list.sortsortedmax等。

作为一个lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

答案 3 :(得分:19)

我写了一个基于http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html的函数,它增加了传递你自己的'key'参数的能力。我需要这个来执行包含更复杂对象(不仅仅是字符串)的自然类型的列表。

import re

def natural_sort(list, key=lambda s:s):
    """
    Sort the list into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text 
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    list.sort(key=sort_key)

例如:

my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]

答案 4 :(得分:10)

假设:

data=['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

与SergO的解决方案类似,没有外部库的 1-liner

data.sort(key=lambda x : int(x[3:]))

sorted_data=sorted(data, key=lambda x : int(x[3:]))

说明:

此解决方案使用 sort key 功能来定义将用于排序的函数。因为我们知道每个数据条目前面都有' elm'排序函数将第三个字符后面的字符串部分(即int(x [3:]))转换为整数。如果数据的数字部分位于不同的位置,则该部分函数必须改变。

干杯

答案 5 :(得分:5)

现在更优雅(pythonic) - 只需触摸

有许多实现,虽然有些已经接近,但没有一个完全捕捉到现代蟒蛇提供的优雅。

  • 使用python(3.5.1)
  • 测试
  • 包含一个额外的列表,以证明它可以在 数字是中间字符串
  • 没有测试,但是,我假设如果你的列表相当大,那么预先编译正则表达式会更有效率
    • 如果这是一个错误的假设,我相信有人会纠正我

Quicky
from re import compile, split    
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
全代码
#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)  
  # ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)  
  # ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']
使用

注意

  • from os.path import split
    • 您需要区分导入
来自

灵感

答案 6 :(得分:4)

一种选择是将字符串转换为元组并使用展开形式http://wiki.answers.com/Q/What_does_expanded_form_mean替换数字

那样a90会变成(“a”,90,0)而a1会变成(“a”,1)

下面是一些示例代码(由于它从数字中删除前导0的方式效率不高)

alist=["something1",
    "something12",
    "something17",
    "something2",
    "something25and_then_33",
    "something25and_then_34",
    "something29",
    "beta1.1",
    "beta2.3.0",
    "beta2.33.1",
    "a001",
    "a2",
    "z002",
    "z1"]

def key(k):
    nums=set(list("0123456789"))
        chars=set(list(k))
    chars=chars-nums
    for i in range(len(k)):
        for c in chars:
            k=k.replace(c+"0",c)
    l=list(k)
    base=10
    j=0
    for i in range(len(l)-1,-1,-1):
        try:
            l[i]=int(l[i])*base**j
            j+=1
        except:
            j=0
    l=tuple(l)
    print l
    return l

print sorted(alist,key=key)

输出:

('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']

答案 7 :(得分:3)

根据这里的答案,我写了一个natural_sorted函数,其行为类似于内置函数sorted

# Copyright (C) 2018, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

我的GitHub代码段存储库中也提供了源代码: https://github.com/bdrung/snippets/blob/master/natural_sorted.py

答案 8 :(得分:3)

此帖的价值

我的观点是提供一般可以应用的非正则表达式解决方案 我将创建三个函数:

    我从@AnuragUniyal借来的
  1. find_first_digit。它会在字符串中找到第一个数字或非数字的位置。
  2. split_digits这是一个将字符串分成数字和非数字块的生成器。当它是一个数字时,它也会yield整数。
  3. natural_key只需将split_digits包裹到tuple中。这是我们用作sortedmaxmin的关键字。
  4. 功能

    def find_first_digit(s, non=False):
        for i, x in enumerate(s):
            if x.isdigit() ^ non:
                return i
        return -1
    
    def split_digits(s, case=False):
        non = True
        while s:
            i = find_first_digit(s, non)
            if i == 0:
                non = not non
            elif i == -1:
                yield int(s) if s.isdigit() else s if case else s.lower()
                s = ''
            else:
                x, s = s[:i], s[i:]
                yield int(x) if x.isdigit() else x if case else x.lower()
    
    def natural_key(s, *args, **kwargs):
        return tuple(split_digits(s, *args, **kwargs))
    

    我们可以看到它是通用的,因为我们可以有多个数字块:

    # Note that the key has lower case letters
    natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')
    
    ('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')
    

    或者保持区分大小写:

    natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)
    
    ('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')
    

    我们可以看到它按照适当的顺序对OP列表进行排序

    sorted(
        ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
        key=natural_key
    )
    
    ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
    

    但它也可以处理更复杂的列表:

    sorted(
        ['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
        key=natural_key
    )
    
    ['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']
    

    我的正则表达式相当于

    def int_maybe(x):
        return int(x) if str(x).isdigit() else x
    
    def split_digits_re(s, case=False):
        parts = re.findall('\d+|\D+', s)
        if not case:
            return map(int_maybe, (x.lower() for x in parts))
        else:
            return map(int_maybe, parts)
    
    def natural_key_re(s, *args, **kwargs):
        return tuple(split_digits_re(s, *args, **kwargs))
    

答案 9 :(得分:2)

以上答案适用于所显示的具体示例,但遗漏了一些有关自然排序的更一般问题的有用案例。我刚刚得到了其中一个案例,因此创建了一个更彻底的解决方案:

def natural_sort_key(string_or_number):
    """
    by Scott S. Lawton <scott@ProductArchitect.com> 2014-12-11; public domain and/or CC0 license

    handles cases where simple 'int' approach fails, e.g.
        ['0.501', '0.55'] floating point with different number of significant digits
        [0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
        ['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
    """

    def try_float(astring):
        try:
            return float(astring)
        except:
            return astring

    if isinstance(string_or_number, basestring):
        string_or_number = string_or_number.lower()

        if len(re.findall('[.]\d', string_or_number)) <= 1:
            # assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
            # '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
            return [try_float(s) for s in re.split(r'([\d.]+)', string_or_number)]
        else:
            # assume distinct fields, e.g. IP address, phone number with '.', etc.
            # caveat: might want to first split by whitespace
            # TBD: for unicode, replace isdigit with isdecimal
            return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_or_number)]
    else:
        # consider: add code to recurse for lists/tuples and perhaps other iterables
        return string_or_number

测试代码和几个链接(StackOverflow的开启和关闭)在这里: http://productarchitect.com/code/better-natural-sort.py

欢迎反馈。这并不意味着是一个明确的解决方案;向前迈出一步。

答案 10 :(得分:2)

最有可能functools.cmp_to_key()与python排序的底层实现密切相关。此外, cmp 参数是遗留的。现代方法是将输入项转换为支持所需丰富比较操作的对象。

在CPython 2.x下,即使各个丰富的比较运营商尚未实施,也可以订购不同类型的对象。在CPython 3.x下,不同类型的对象必须明确支持比较。请参阅指向How does Python compare string and int?official documentation。大多数答案取决于这种隐式排序。切换到Python 3.x将需要一种新类型来实现和统一数字和字符串之间的比较。

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True  
Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < str()

有三种不同的方法。第一个使用嵌套类来利用Python的Iterable比较算法。第二个将这个嵌套展开为一个类。第三个放弃了str的子类,专注于绩效。一切都是定时的;第二个快了两倍,而第三个快了近六倍。不需要对str进行子类化,这可能是一个坏主意,但它确实带来了一定的便利性。

重复排序字符以强制按案例排序,并且大小写交换以强制小写字母排序第一;这是&#34;自然排序&#34;的典型定义。我无法决定分组的类型;有些人可能更喜欢以下内容,这也带来了显着的性能优势:

d = lambda s: s.lower()+s.swapcase()

在使用时,比较运算符设置为object,因此他们不会be ignored by functools.total_ordering

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__
>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

自然分类既复杂又模糊,是一个问题。不要忘记事先运行unicodedata.normalize(...),并考虑使用str.casefold()而不是str.lower()。我可能还没有考虑过微妙的编码问题。所以我暂时推荐natsort库。我快速浏览了一下github存储库;代码维护一直很好。

我见过的所有算法都依赖于复制和降低字符以及交换案例等技巧。虽然这使运行时间加倍,但另一种方法是在输入字符集上需要完全自然的排序。我不认为这是unicode规范的一部分,因为有比[0-9]更多的unicode数字,所以创建这样的排序同样令人生畏。如果您想要进行区域设置感知比较,请按照Python locale.strxfrmSorting HOW TO准备字符串。

答案 11 :(得分:2)

Claudiu对Mark Byers回答的改进;-)

import re

def natural_sort_key(s, _re=re.compile(r'(\d+)')):
    return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

...
my_naturally_sorted_list = sorted(my_list, key=natural_sort_key)

顺便说一句,也许不是每个人都记得函数参数默认值是在def时计算的。

答案 12 :(得分:1)

让我对这种需要提出自己的看法:

from typing import Tuple, Union, Optional, Generator


StrOrInt = Union[str, int]


# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
    last_was_digit: Optional[bool] = None
    cluster: str = ""
    for c in s:
        if last_was_digit is None:
            last_was_digit = c.isdigit()
            cluster += c
            continue
        if c.isdigit() != last_was_digit:
            if last_was_digit:
                yield int(cluster)
            else:
                yield cluster
            last_was_digit = c.isdigit()
            cluster = ""
        cluster += c
    if last_was_digit:
        yield int(cluster)
    else:
        yield cluster
    return


def grouper(s: str) -> Tuple[StrOrInt, ...]:
    return tuple(griter(s))

现在,如果我们有这样的列表:

filelist = [
    'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
    'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]

我们可以简单地使用key= kwarg进行自然排序:

>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8', 
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']

这里的缺点当然是,与现在一样,该函数将大写字母排在小写字母之前。

我将不区分大小写的石斑鱼的实现留给读者:-)

答案 13 :(得分:0)

我使用的算法是padzero_with_lower,定义如下:

import re

def padzero_with_lower(s):
    return re.sub(r'\d+', lambda m: m.group(0).rjust(10, '0'), s).lower()

算法发现:

    查找并填充任意长度的数字,使其长度足够长,例如10
  • 然后,它将字符串转换为小写

这是一个用法示例:

print(padzero_with_lower('file1.txt'))   # file0000000001.txt
print(padzero_with_lower('file12.txt'))  # file0000000012.txt
print(padzero_with_lower('file23.txt'))  # file0000000023.txt
print(padzero_with_lower('file123.txt')) # file0000000123.txt
print(padzero_with_lower('file301.txt')) # file0000000301.txt
print(padzero_with_lower('Dir2/file15.txt'))  # dir0000000002/file0000000015.txt
print(padzero_with_lower('dir2/file123.txt')) # dir0000000002/file0000000123.txt
print(padzero_with_lower('dir15/file2.txt'))  # dir0000000015/file0000000002.txt
print(padzero_with_lower('Dir15/file15.txt')) # dir0000000015/file0000000015.txt
print(padzero_with_lower('elm0'))  # elm0000000000
print(padzero_with_lower('elm1'))  # elm0000000001
print(padzero_with_lower('Elm2'))  # elm0000000002
print(padzero_with_lower('elm9'))  # elm0000000009
print(padzero_with_lower('elm10')) # elm0000000010
print(padzero_with_lower('Elm11')) # elm0000000011 
print(padzero_with_lower('Elm12')) # elm0000000012
print(padzero_with_lower('elm13')) # elm0000000013

在测试了此功能之后,我们现在可以将其用作我们的密钥,即

lis = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
lis.sort(key=padzero_with_lower)
print(lis)
# Output: ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

答案 14 :(得分:0)

在@Mark Byers回答之后,这是一个接受key参数的改编,并且更符合PEP8。

def natsorted(seq, key=None):
    def convert(text):
        return int(text) if text.isdigit() else text

    def alphanum(obj):
        if key is not None:
            return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
        return [convert(c) for c in re.split(r'([0-9]+)', obj)]

    return sorted(seq, key=alphanum)

我也做了一个Gist

答案 15 :(得分:0)

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

<强>致谢

Bubble Sort Homework

How to read a string one letter at a time in python

答案 16 :(得分:0)

我建议您只使用key的{​​{1}}关键字参数来获得所需的列表 例如:

sorted

答案 17 :(得分:-2)

>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']