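I have a sorted list and want to identify the consecutive multiples in it. The list can contain consecutive multiples of different orders (step sizes), which makes it harder.
Some test cases: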
[1,3,4,5] -> [[1], [3,4,5]]
[1,3,5,6,7] -> [[1], [3], [5,6,7]]
# consecutive multiples of 1 and 2 (or n)
[1,2,3,7,9,11] -> [[1,2,3], [7,9,11]]
[1,2,3,7,10,12,14,25] -> [[1,2,3], [7], [10,12,14], [25]]
# overlapping consecutives !!!
[1,2,3,4,6,8,10] -> [[1,2,3,4], [6,8,10]]
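Right now I don't know what I'm doing. What I have done so far is group the numbers into pairs by the distance between them, which is a good start, but I'm having a lot of trouble deciding which group each element of a pair belongs to, i.e.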
# initial list
[1,3,4,5]
# pairs of same distance
[[[1,3]], [[3,4], [4,5]]]
# algo to get the final result ?
[[1], [3,4,5]]
Any help is much appreciated.
Edit: maybe it is clearer if I state what I actually want.
I want to turn:
[1,5,10,11,12,13,14,15,17,20,22,24,26,28,30]
into
1, 5, 10 to 15 by 1, 17, 20 to 30 by 2
Answer 0 (score: 1)
I would start with the list of differences.
length_a = len(list1)
diff_v = [list1[j+1] - list1[j] for j in range(length_a-1)]
So [1,2,3,7,11,13,15,17] becomes [1,1,4,4,2,2,2]
From here it is easy.
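One way to finish from the difference list, as a minimal sketch rather than a definitive implementation: walk `diff_v` and cut a run wherever the difference changes. The function name `group_by_step` and the `min_len` parameter are made up for illustration, and the threshold of three elements per run is an assumption, not something this answer states.

```python
def group_by_step(list1, min_len=3):
    """Greedily split a sorted list into runs that share a constant step.

    Runs shorter than min_len (an assumed threshold) become singletons.
    """
    diff_v = [list1[j + 1] - list1[j] for j in range(len(list1) - 1)]
    result = []
    i = 0
    while i < len(list1):
        # count how many equal differences follow position i
        j = i
        while j < len(diff_v) and diff_v[j] == diff_v[i]:
            j += 1
        # differences i..j-1 are equal, so values i..j form one run
        if j - i + 1 >= min_len:
            result.append(list1[i:j + 1])
            i = j + 1
        else:
            result.append([list1[i]])
            i += 1
    return result


print(group_by_step([1, 2, 3, 7, 10, 12, 14, 25]))
# [[1, 2, 3], [7], [10, 12, 14], [25]]
```

Because the scan is greedy from the left, [1,3,5,6,7] comes out as [[1, 3, 5], [6], [7]] rather than the grouping listed in the question.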
Answer 1 (score: 1)
Here is a version that includes @Bakuriu's optimization:
MINIMAL_MATCH = 3

def find_some_sort_of_weird_consecutiveness(data):
    """
    >>> find_some_sort_of_weird_consecutiveness([1,3,4,5])
    [[1], [3, 4, 5]]
    >>> find_some_sort_of_weird_consecutiveness([1,3,5,6,7])
    [[1, 3, 5], [6], [7]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,9,11])
    [[1, 2, 3], [7, 9, 11]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,7,10,12,14,25])
    [[1, 2, 3], [7], [10, 12, 14], [25]]
    >>> find_some_sort_of_weird_consecutiveness([1,2,3,4,6,8,10])
    [[1, 2, 3, 4], [6, 8, 10]]
    >>> find_some_sort_of_weird_consecutiveness([1,5,10,11,12,13,14,15,17,20,22,24,26,28,30])
    [[1], [5], [10, 11, 12, 13, 14, 15], [17], [20, 22, 24, 26, 28, 30]]
    """
    def pair_iter(series):
        # yield (index, current, next) for each adjacent pair
        from itertools import tee
        _first, _next = tee(series)
        next(_next, None)
        for i, (f, n) in enumerate(zip(_first, _next), start=MINIMAL_MATCH - 1):
            yield i, f, n

    result = []
    while len(data) >= MINIMAL_MATCH:
        test = data[1] - data[0]
        if (data[2] - data[1]) == test:
            # at least three elements share the step; find where the run ends
            for i, f, n in pair_iter(data):
                if (n - f) != test:
                    i -= 1
                    break
        else:
            i = 1
        data, match = data[i:], data[:i]
        result.append(match)
    # fewer than MINIMAL_MATCH elements left: emit them one by one
    for d in data:
        result.append([d])
    return result

if __name__ == '__main__':
    from doctest import testmod
    testmod()
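It handles all of your current test cases. If you find new test cases that fail, please send them to me.
As noted in the comments below, I now assume that the shortest sequence has three elements, since a sequence of two is trivial.
For an explanation of the pairwise iterator, see http://docs.python.org/2/library/itertools.html.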
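For readers who don't want to follow the link, the `pair_iter` helper above is a variation of the `pairwise` recipe from the itertools documentation. A minimal sketch of that recipe on its own, without the extra `enumerate` index that `pair_iter` adds:

```python
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one element
    return zip(first, second)


print(list(pairwise([1, 3, 4, 5])))
# [(1, 3), (3, 4), (4, 5)]
```

On Python 3.10 and later the same function is available directly as `itertools.pairwise`.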
Answer 2 (score: 0)
You can keep track of the last output value as you go:
in_ = [1, 2, 3, 4, 5]
out = [[in_[0]]]
for item in in_[1:]:
    # start a new group whenever the item does not follow the previous one
    if out[-1][-1] != item - 1:
        out.append([])
    out[-1].append(item)
Answer 3 (score: 0)
I would group the list by the difference between each value and its index:
from itertools import groupby
lst = [1,3,4,5]
result = []
for key, group in groupby(enumerate(lst), key=lambda pair: pair[1] - pair[0]):
    result.append([value for i, value in group])
print(result)
[[1], [3, 4, 5]]
What did I do?
# at first I enumerate every item of list:
print(list(enumerate(lst)))
[(0, 1), (1, 3), (2, 4), (3, 5)]
# Then I subtract the index of each item from the item itself:
print([value - i for i, value in enumerate(lst)])
[1, 2, 2, 2]
# As you see, consecutive numbers turn out to have the same difference between index and value
# We can use this feature and group the list by the difference of value minus index
print(list(groupby(enumerate(lst), key=lambda pair: pair[1] - pair[0])))
[(1, <itertools._grouper object at 0x104bff050>), (2, <itertools._grouper object at 0x104bff410>)]
# Now you can see how it works. Now I just want to add how to write this in one logical line:
result = [[value for i, value in group]
          for key, group in groupby(enumerate(lst), key=lambda pair: pair[1] - pair[0])]
print(result)
[[1], [3, 4, 5]]
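Note that this trick relies on the step being exactly 1, because only then does `value - i` stay constant across a run. As a quick check, here is a sketch that wraps the one-liner in a function (the name `group_by_index_offset` is hypothetical) and feeds it a run with step 2:

```python
from itertools import groupby


def group_by_index_offset(lst):
    # group values whose (value - index) is identical, i.e. runs with step 1
    return [[value for _, value in group]
            for _, group in groupby(enumerate(lst),
                                    key=lambda pair: pair[1] - pair[0])]


print(group_by_index_offset([1, 3, 4, 5]))  # [[1], [3, 4, 5]]
print(group_by_index_offset([7, 9, 11]))    # [[7], [9], [11]]
```

For [7, 9, 11] the keys are 7, 8 and 9, so every element lands in its own group.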
Let's look at this list,
lst = [1,5,10,11,12,13,14,15,17,21,24,26,28,30]
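in particular at the differences between adjacent elements, and at the differences of those differences (which span three consecutive elements):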
1, 5, 10, 11, 12, 13, 14, 15, 17, 21, 24, 26, 28, 30
4, 5, 1, 1, 1, 1, 1, 2, 4, 3, 2, 2, 2
1, -4, 0, 0, 0, 0, 1, 2, -1, -1, 0, 0
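We see that wherever the first row contains consecutive multiples, the third row contains zeros. Thinking about it mathematically, the second derivative of a linear part of a function is also zero. So let's use this property...
The "second derivative" of the list lst can be computed like this: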
lst[i+2]-2*lst[i+1]+lst[i]
Note that this definition of the second-order difference "looks" two indexes ahead. Now here is the code that detects consecutive multiples:
from itertools import groupby
# We have to keep track of the indexes in the list that have already been used
available_indexes = set(range(len(lst)))
for second_order_diff, grouper in groupby(range(len(lst)-2), key=lambda i: lst[i+2]-2*lst[i+1]+lst[i]):
    # store all not-consumed indexes in a list
    grp_indexes = [i for i in grouper if i in available_indexes]
    if grp_indexes and second_order_diff == 0:
        # There are consecutive multiples
        min_index, max_index = grp_indexes[0], grp_indexes[-1] + 2
        print("Group from {} to {} by {}".format(lst[min_index], lst[max_index], lst[min_index+1]-lst[min_index]))
        available_indexes -= set(range(min_index, max_index+1))
    else:
        # The not "consumed" indexes in this group are not consecutive
        for i in grp_indexes:
            print(lst[i])
            available_indexes.discard(i)
# The last two elements could be lost without the following two lines
for i in sorted(available_indexes):
    print(lst[i])
Output:
1
5
Group from 10 to 15 by 1
17
21
Group from 24 to 30 by 2
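If the goal is the exact string from the question's edit rather than printed lines, the same second-order-difference idea can be wrapped in a function. This is only a sketch under assumptions: the name `describe_runs` and the "a to b by step" wording are taken from the example in the question, not from any stated specification.

```python
from itertools import groupby


def describe_runs(lst):
    """Summarise a sorted list as e.g. '1, 5, 10 to 15 by 1'."""
    def second_diff(i):
        return lst[i + 2] - 2 * lst[i + 1] + lst[i]

    parts = {}  # start index -> text for that element or run
    available = set(range(len(lst)))
    for diff, grouper in groupby(range(len(lst) - 2), key=second_diff):
        idx = [i for i in grouper if i in available]
        if idx and diff == 0:
            lo, hi = idx[0], idx[-1] + 2
            parts[lo] = "{} to {} by {}".format(
                lst[lo], lst[hi], lst[lo + 1] - lst[lo])
            available -= set(range(lo, hi + 1))
    # every index not consumed by a run is reported on its own
    for i in available:
        parts[i] = str(lst[i])
    return ", ".join(parts[i] for i in sorted(parts))


print(describe_runs([1, 5, 10, 11, 12, 13, 14, 15, 17, 20, 22, 24, 26, 28, 30]))
# 1, 5, 10 to 15 by 1, 17, 20 to 30 by 2
```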