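Python heapq sorting a list wrong?

Time: 2016-09-24 10:00:59

Tags: python sorting

I am trying to merge several lists into one list. The lists contain the numbers and names of sections, subsections and sub-subsections. The program looks like this: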

import heapq

sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']
subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']

sorted_list = list(heapq.merge(sections, subsections, subsubsections))

print(sorted_list)

What I get is this:

['1. Section', '1.1 Subsection', '1.2 Subsection', '1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.3 Subsection', '1.4 Subsection', '1.4.1 Subsubsection', '2. Section', '2.1 Subsection', '2.1.1 Subsubsection', '3. Section', '4. Section', '4.1 My subsection', '5. Section', '6. Section', '7. Section', '7.1 Subsection', '7.1.1 Subsubsection', '8. Section', '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection', '9. Section', '10. Section', '11. Section', '12. Section']

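The subsection and sub-subsection of section 12 end up inside section 8, not section 12.

Why does this happen? The original lists are sorted, and everything goes well, apparently up to number 10.

I'm not sure why this happens. Is there a better way to sort this into a "tree" based on the numbers in the lists? I'm building a table of contents of sorts, and this will return (once I filter the list):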

1. Section
    1.1 Subsection
    1.2 Subsection
        1.2.1 Subsubsection
        1.2.2 Subsubsection
    1.3 Subsection
    1.4 Subsection
        1.4.1 Subsubsection
2. Section
    2.1 Subsection
        2.1.1 Subsubsection
3. Section
4. Section
    4.1 My subsection
5. Section
6. Section
7. Section
    7.1 Subsection
        7.1.1 Subsubsection
8. Section
    8.1 Subsection
    12.1 Subsection
        8.1.1 Subsubsection
        12.1.1 Subsubsection
9. Section
10. Section
11. Section
12. Section

Notice the 12.1 Subsection after the 8.1 Subsection, and the 12.1.1 Subsubsection after the 8.1.1 Subsubsection.

2 Answers:

Answer 0: (score: 4)

Your lists may look sorted to the human eye. To Python, however, your inputs are not fully sorted, because it compares strings lexicographically. That means '12.1' comes before '8.1' in sorted order, because the first characters ('1' and '8') already decide the comparison.

As such, the merge is completely correct: the string starting with '12.1' was encountered after the '8.1' string had been seen, but the one starting with '8.1.1' sorts after it.

You'll have to use a key function that extracts a sequence of integers from each string to merge correctly:

section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
heapq.merge(sections, subsections, subsubsections, key=section)

Note that the key argument is only available in Python 3.5 and newer; in earlier versions you have to do a manual decorate-merge-undecorate dance.

Demo (using Python 3.6):

>>> section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
>>> sorted_list = list(heapq.merge(sections, subsections, subsubsections, key=section))
>>> from pprint import pprint
>>> pprint(sorted_list)
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']
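To make the lexicographic point concrete, here is a minimal sketch that checks whether the question's subsections list counts as sorted under plain string comparison and under the numeric key. The names lexicographic_order_ok and numeric_order_ok are illustrative only, and section is written as a def rather than a lambda purely for readability.

```python
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection',
               '1.4 Subsection', '2.1 Subsection', '4.1 My subsection',
               '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']

def section(s):
    """Numeric key: '12.1 Subsection' -> [12, 1]."""
    return [int(d) for d in s.partition(' ')[0].split('.') if d]

# Strings are compared character by character, so '1' < '8' decides it.
print('12.1 Subsection' < '8.1 Subsection')  # True

# Is the input already in the order each comparison would produce?
lexicographic_order_ok = subsections == sorted(subsections)
numeric_order_ok = subsections == sorted(subsections, key=section)

print(lexicographic_order_ok)  # False: '12.1 ...' belongs before '2.1 ...'
print(numeric_order_ok)        # True
```

heapq.merge assumes each input is already sorted under the comparison it uses, which is why the unkeyed merge emits '12.1 Subsection' late.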

The keyed merge is easily backported to Python 3.3 and 3.4:

import heapq

# Note: _siftup_max and _heapify_max are private helpers of the heapq module.

def _heappop_max(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def _heapreplace_max(heap, item):
    returnitem = heap[0]
    heap[0] = item
    heapq._siftup_max(heap, 0)
    return returnitem

def merge(*iterables, key=None, reverse=False):
    h = []
    h_append = h.append

    if reverse:
        _heapify = heapq._heapify_max
        _heappop = _heappop_max
        _heapreplace = _heapreplace_max
        direction = -1
    else:
        _heapify = heapq.heapify
        _heappop = heapq.heappop
        _heapreplace = heapq.heapreplace
        direction = 1

    if key is None:
        for order, it in enumerate(map(iter, iterables)):
            try:
                next = it.__next__
                h_append([next(), order * direction, next])
            except StopIteration:
                pass
        _heapify(h)
        while len(h) > 1:
            try:
                while True:
                    value, order, next = s = h[0]
                    yield value
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            value, order, next = h[0]
            yield value
            yield from next.__self__
        return

    # keyed variant: heap entries are [key(value), order, value, next]
    for order, it in enumerate(map(iter, iterables)):
        try:
            next = it.__next__
            value = next()
            h_append([key(value), order * direction, value, next])
        except StopIteration:
            pass
    _heapify(h)
    while len(h) > 1:
        try:
            while True:
                key_value, order, value, next = s = h[0]
                yield value
                value = next()
                s[0] = key(value)
                s[2] = value
                _heapreplace(h, s)
        except StopIteration:
            _heappop(h)
    if h:
        key_value, order, value, next = h[0]
        yield value
        yield from next.__self__

A decorate-sort-undecorate merge is as simple as:

def decorate(iterable, key):
    for elem in iterable:
        yield key(elem), elem

# named sorted_list so the built-in sorted() is not shadowed
sorted_list = [v for k, v in heapq.merge(
    decorate(sections, section), decorate(subsections, section),
    decorate(subsubsections, section))]

Since your input is already sorted, using a merge is more efficient. As a last resort, you could just use sorted(), however:

from itertools import chain
result = sorted(chain(sections, subsections, subsubsections), key=section)
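Both routes should give the same ordering. The following is a small check on shortened, hypothetical sample lists (not the full lists from the question), assuming Python 3.5 or newer for the key argument of heapq.merge:

```python
import heapq
from itertools import chain

def section(s):
    """Numeric key: '8.1.1 Subsubsection' -> [8, 1, 1]."""
    return [int(d) for d in s.partition(' ')[0].split('.') if d]

# Shortened sample inputs, each already sorted under the numeric key.
sections = ['8. Section', '12. Section']
subsections = ['8.1 Subsection', '12.1 Subsection']
subsubsections = ['8.1.1 Subsubsection', '12.1.1 Subsubsection']

# Lazy k-way merge of the presorted inputs.
merged = list(heapq.merge(sections, subsections, subsubsections, key=section))

# Full sort of the concatenated inputs.
resorted = sorted(chain(sections, subsections, subsubsections), key=section)

print(merged == resorted)  # True
```

The merge yields items lazily and never needs the whole result in memory, whereas sorted() builds the complete list before returning.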

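Answer 1: (score: 4)

As explained in the other answer, you have to specify a sort key, otherwise Python compares the strings lexicographically. If you are using Python 3.5+, you can use the key argument of the merge function; in earlier versions you can use itertools.chain with sorted. As a general approach, you can use a regular expression to find the numbers and convert them to int: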

In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
                key = lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]: 
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']
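For the indented "tree" layout asked for in the question, one possible approach is to derive the nesting depth from the length of the numeric key. This is only a sketch: format_toc and its indent parameter are hypothetical names introduced here, and the key is the partition-based one, which reads only the leading number and so ignores any digits inside a title.

```python
def section(s):
    """Numeric key: '12.1 Subsection' -> [12, 1]."""
    return [int(d) for d in s.partition(' ')[0].split('.') if d]

def format_toc(entries, indent='    '):
    """Sort entries numerically and indent each by its nesting depth."""
    lines = []
    for entry in sorted(entries, key=section):
        depth = len(section(entry)) - 1   # '8.' -> 0, '8.1' -> 1, '8.1.1' -> 2
        lines.append(indent * depth + entry)
    return '\n'.join(lines)

# Deliberately unordered, shortened sample input.
entries = ['12.1.1 Subsubsection', '8. Section', '12. Section',
           '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection']

print(format_toc(entries))
```

With this sample input, the 12.x entries should end up nested under '12. Section' rather than under section 8.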