Identifying varying consecutive multiples in a sorted list

Time: 2014-02-05 17:15:26

Tags: python

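I have a sorted list and want to identify runs of consecutive multiples in it. The list can contain runs with different steps (multiples of 1, 2, or n), which makes this harder.

Some test cases: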

[1,3,4,5] -> [[1], [3,4,5]]
[1,3,5,6,7] -> [[1], [3], [5,6,7]]
# consecutive multiples of 1 and 2 (or n)
[1,2,3,7,9,11] -> [[1,2,3], [7,9,11]]
[1,2,3,7,10,12,14,25] -> [[1,2,3], [7], [10,12,14], [25]]
# overlapping consecutives !!!
[1,2,3,4,6,8,10] -> [[1,2,3,4], [6,8,10]]

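Right now I don't really know what I'm doing. What I have so far groups the numbers into pairs by the distance between them, which seemed like a good start, but I'm having a lot of trouble deciding which group each element of a pair belongs to, i.e.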

 # initial list    
 [1,3,4,5]
 # pairs of same distance
 [[[1,3]], [[3,4], [4,5]]]
 # algo to get the final result ?
 [[1], [3,4,5]]

Any help is greatly appreciated.

Edit: Maybe it is clearer if I state what I actually want.

I would like to transform:

[1,5,10,11,12,13,14,15,17,20,22,24,26,28,30]

into:

1, 5, 10 to 15 by 1, 17, 20 to 30 by 2

4 answers:

Answer 0: (score: 1)

I would start with a list of differences.

length_a = len(list1)
diff_v  = [list1[j+1] - list1[j] for j in range(length_a-1)]

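So [1,2,3,7,11,13,15,17] becomes [1,1,4,4,2,2,2].

From here it is easy: a run of k equal differences corresponds to k+1 elements with a constant step.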
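One way to finish the idea is sketched below. It is only a sketch under assumptions: the name `split_runs` and the `min_len=3` threshold are mine, not part of the answer, and the scan is greedy from the left, so [1,3,5,6,7] comes out as [[1, 3, 5], [6], [7]] rather than the grouping listed in the question.

```python
def split_runs(values, min_len=3):
    """Greedily split a sorted list into runs with a constant step.

    Hypothetical helper: `min_len` is the assumed minimum number of
    elements for a run; shorter candidates are emitted one element at a time.
    """
    # Differences between adjacent elements, as in the answer above.
    diffs = [b - a for a, b in zip(values, values[1:])]
    runs = []
    start = 0
    while start < len(values):
        end = start + 1  # exclusive end of the current candidate run
        # Extend the run while the next difference equals the first one.
        while end < len(values) and diffs[end - 1] == diffs[start]:
            end += 1
        if end - start >= min_len:
            runs.append(values[start:end])
            start = end
        else:
            # Too short to count as a run: emit a single element.
            runs.append([values[start]])
            start += 1
    return runs


print(split_runs([1, 2, 3, 7, 10, 12, 14, 25]))
```

Because a leftover element is retried as the start of the next run, the overlapping case [1,2,3,4,6,8,10] still splits into [1, 2, 3, 4] and [6, 8, 10].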

Answer 1: (score: 1)

Here is a version that includes @Bakuriu's optimization:

MINIMAL_MATCH = 3

def find_some_sort_of_weird_consecutiveness(data):
    """
    >>> find_some_sort_of_weird_consecutiveness([1,3,4,5])
    [[1], [3, 4, 5]]
    >>> find_some_sort_of_weird_consecutiveness([1,3,5,6,7])
    [[1, 3, 5], [6], [7]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,9,11])
    [[1, 2, 3], [7, 9, 11]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,10,12,14,25])
    [[1, 2, 3], [7], [10, 12, 14], [25]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,4,6,8,10])
    [[1, 2, 3, 4], [6, 8, 10]]
    >>> find_some_sort_of_weird_consecutiveness([1,5,10,11,12,13,14,15,17,20,22,24,26,28,30])
    [[1], [5], [10, 11, 12, 13, 14, 15], [17], [20, 22, 24, 26, 28, 30]]
    """
    def pair_iter(series):
        from itertools import tee
        _first, _next = tee(series)
        next(_next, None)
        for i, (f, n) in enumerate(zip(_first, _next), start=MINIMAL_MATCH - 1):
            yield i, f, n

    result = []
    while len(data) >= MINIMAL_MATCH:
        test = data[1] - data[0]
        if (data[2] - data[1]) == test:
            for i, f, n in pair_iter(data):
                if (n - f) != test:
                    i -= 1
                    break
        else:
            i = 1
        data, match = data[i:], data[:i]
        result.append(match)
    for d in data:
        result.append([d])
    return result

if __name__ == '__main__':
    from doctest import testmod
    testmod()

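It handles your current test cases, with one difference: for [1,3,5,6,7] it returns [[1, 3, 5], [6], [7]] (see the doctest), because 1, 3, 5 is itself a run with step 2. If you find inputs that fail, please send me the new failing test cases.

As noted in the comments below, I now assume that the shortest sequence has three elements, since a sequence of two is trivial.

For an explanation of the pairwise iterator, see http://docs.python.org/2/library/itertools.html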
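To get from the grouped result to the "10 to 15 by 1" form requested in the question, something like the following could work. This is a minimal sketch: `summarize` is a hypothetical helper name, and it assumes every group with more than one element has a constant step, as the function above produces.

```python
def summarize(groups):
    """Render grouped runs as e.g. '1, 5, 10 to 15 by 1'.

    Assumes each multi-element group has a constant step.
    """
    parts = []
    for group in groups:
        if len(group) == 1:
            # A lone value is printed as-is.
            parts.append(str(group[0]))
        else:
            step = group[1] - group[0]
            parts.append("%d to %d by %d" % (group[0], group[-1], step))
    return ", ".join(parts)


groups = [[1], [5], [10, 11, 12, 13, 14, 15], [17], [20, 22, 24, 26, 28, 30]]
print(summarize(groups))  # 1, 5, 10 to 15 by 1, 17, 20 to 30 by 2
```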

Answer 2: (score: 0)

You can simply keep track of the last value you output:

in_ = [1, 2, 3, 4, 5]
out = [[in_[0]]]  # start the first group with the first element
for item in in_[1:]:
    if out[-1][-1] != item - 1:
        out.append([])
    out[-1].append(item)

Answer 3: (score: 0)

I would group the list by the difference between index and value:

from itertools import groupby
lst = [1,3,4,5]
result = []
for key, group in groupby(enumerate(lst), key = lambda (i, value): value - i):
    result.append([value for i, value in group])
print result
[[1], [3, 4, 5]]

What did I do?

# at first I enumerate every item of list:
print list(enumerate(lst))
[(0, 1), (1, 3), (2, 4), (3, 5)]

# Then I subtract the index of each item from the item itself:
print [ value - i for i, value in enumerate(lst)]
[1, 2, 2, 2]

# As you see, consecutive numbers turn out to have the same difference between index and value
# We can use this feature and group the list by the difference of value minus index
print list( groupby(enumerate(lst), key = lambda (i, value): value - i) )
[(1, <itertools._grouper object at 0x104bff050>), (2, <itertools._grouper object at 0x104bff410>)]

# Now you can see how it works. Now I just want to add how to write this in one logical line:
result = [ [value for i, value in group]
    for key, group in groupby(enumerate(lst), key = lambda (i, value): value - i)]
print result
[[1], [3, 4, 5]]
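The snippets above use Python 2 syntax (the `print` statement and a tuple-unpacking lambda, which Python 3 removed). A rough equivalent that should also run on Python 3 is sketched here; the function name `group_consecutive` is my own, and like the original it only detects runs with a step of 1.

```python
from itertools import groupby


def group_consecutive(lst):
    """Group runs of step 1 by the constant value of (value - index)."""
    return [
        [value for _, value in group]
        # pair is an (index, value) tuple produced by enumerate
        for _, group in groupby(enumerate(lst), key=lambda pair: pair[1] - pair[0])
    ]


print(group_consecutive([1, 3, 4, 5]))  # [[1], [3, 4, 5]]
```

With a step other than 1, for example [7, 9, 11], value minus index is different for every element, so each number ends up in its own group. That is what the second part of this answer addresses.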

A way to identify consecutive multiples of n

Let's look at this list,

lst = [1,5,10,11,12,13,14,15,17,21,24,26,28,30]

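in particular the differences between adjacent elements (second row) and the differences of those differences, taken over three consecutive elements (third row):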

  1,   5,  10,  11,  12,  13,  14,  15,  17,  21,  24,  26,  28,  30
     4,   5,   1,   1,   1,   1,   1,   2,   4,   3,   2,   2,   2
       1,  -4,   0,   0,   0,   0,   1,   2,  -1,  -1,   0,   0

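We see that wherever the first row contains consecutive multiples, the third row contains zeros. Mathematically speaking, the second derivative of a linear section of a function is also zero. So let's use this property...

The "second derivative" of the list lst can be computed like this: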

lst[i+2]-2*lst[i+1]+lst[i]
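As a quick check of that formula against the third row of the table, here is a small sketch (the function name `second_differences` is just illustrative):

```python
def second_differences(lst):
    """Second-order differences: lst[i+2] - 2*lst[i+1] + lst[i] for each valid i."""
    return [lst[i + 2] - 2 * lst[i + 1] + lst[i] for i in range(len(lst) - 2)]


lst = [1, 5, 10, 11, 12, 13, 14, 15, 17, 21, 24, 26, 28, 30]
print(second_differences(lst))
# [1, -4, 0, 0, 0, 0, 1, 2, -1, -1, 0, 0]
```

The result has two fewer entries than the list, and entry i describes the elements at indexes i, i+1 and i+2.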

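Note that this definition of the second-order difference "looks ahead" two indexes. Now here is the code that detects the consecutive multiples: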

from itertools import groupby
# We have to keep track of the  indexes in the list, that have already been used
available_indexes = set(range(len(lst)))
for second_order_diff, grouper in groupby(range(len(lst)-2), key = lambda i: lst[i+2]-2*lst[i+1]+lst[i]):
    # store all not-consumed indexes in a list
    grp_indexes = [i for i in grouper if i in available_indexes]

    if grp_indexes  and second_order_diff == 0:
        # There are consecutive multiples
        min_index, max_index = grp_indexes[0], grp_indexes[-1] + 2
        print "Group from ", lst[min_index], "to", lst[max_index], "by", lst[min_index+1]-lst[min_index]
        available_indexes -= set(range(min_index, max_index+1))
    else:
        # The not "consumed" indexes in this group are not consecutive
        for i in grp_indexes:
            print lst[i]
            available_indexes.discard(i)
# The last two elements could be lost without the following two lines
for i in sorted(available_indexes):
    print lst[i]

Output:

1
5
Group from  10 to 15 by 1
17
21
Group from  24 to 30 by 2