Question

I have a list of integers as follows:

my_list = [2,2,2,2,3,4,2,2,4,4,3]

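What I want is to 'compress' it into a list of strings that combine index and value: each element is identified by its position in the list, and each run of consecutive repeated elements is expressed as a range, like this: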

my_new_list = ['0-3,2', '4,3', '5,4', '6-7,2', '8-9,4', '10,3']

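Edit: the expected output should indicate that list elements 0 to 3 hold the number 2, element 4 the number 3, element 5 the number 4, elements 6 and 7 the number 2, elements 8 and 9 the number 4, and element 10 the number 3.

Edit 2: the output list does not need to be (and in fact cannot be) a list of integers; it is a list of strings.

I can find many examples of finding (and removing) duplicate elements in a list, but nothing that matches what I need.

Can someone point me to a relevant example or suggest an algorithm for this problem?

Thanks in advance!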

Answer 1

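As with most problems that involve collapsing consecutive duplicates, you can still use groupby(). Just group the indices by the value found at each index.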

import itertools

values = [2,2,2,2,3,4,2,2,4,4,3]
result = []

for key, group in itertools.groupby(range(len(values)), values.__getitem__):
    indices = list(group)

    if len(indices) > 1:
        result.append('{}-{},{}'.format(indices[0], indices[-1], key))
    else:
        result.append('{},{}'.format(indices[0], key))

print(result)

Output:

['0-3,2', '4,3', '5,4', '6-7,2', '8-9,4', '10,3']
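To make the grouping step easier to follow, here is a small sketch of what groupby() produces before any formatting is applied. The name `runs` is introduced only for this illustration.

```python
import itertools

values = [2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]

# Key each index by the value stored at that index, so groupby() splits
# range(len(values)) into runs of indices that share the same value.
runs = [
    (key, list(group))
    for key, group in itertools.groupby(range(len(values)), values.__getitem__)
]

for key, indices in runs:
    print(key, indices)
```

Each run keeps its first and last index, which is all the formatting loop needs.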

Answer 2

You can use enumerate with a generator function:

def seq(l):
    it = iter(l)
    # get the first element and set the start index to 0;
    # an empty input yields nothing.
    try:
        start, prev = 0, next(it)
    except StopIteration:
        return
    # index of the last element seen so far (stays 0 for a one-element input)
    ind = 0
    # use enumerate to track the rest of the indexes
    for ind, ele in enumerate(it, 1):
        # if the last seen element is not the same, the run is over;
        # if start == ind - 1 the run had just a single element.
        if prev != ele:
            yield ("{}-{}, {}".format(start, ind - 1, prev)) \
                if start != ind - 1 else ("{}, {}".format(start, prev))

            start = ind
        prev = ele
    # emit the final run, which spans start..ind inclusive
    yield ("{}-{}, {}".format(start, ind, prev)) \
        if start != ind else ("{}, {}".format(start, prev))

Output:

In [3]: my_list = [2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]

In [4]: list(seq(my_list))
Out[4]: ['0-3, 2', '4, 3', '5, 4', '6-7, 2', '8-9, 4', '10, 3']

I was going to use groupby, but this is faster:

In [11]: timeit list(seq(my_list))
100000 loops, best of 3: 4.38 µs per loop

In [12]: timeit itools()

100000 loops, best of 3: 9.23 µs per loop
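The itools() function timed above is not shown in the answer. The sketch below is one way to reproduce a comparison of this kind with the standard timeit module. It assumes that itools() was a groupby-based version; `groupby_version` is a hypothetical stand-in for it, and the `seq` variant here emits the final run from start to ind so that trailing runs of any length are reported correctly. Timings will differ by machine and Python version.

```python
import itertools
import timeit


def seq(l):
    """Generator approach: one pass, tracking where the current run started."""
    it = iter(l)
    try:
        start, prev = 0, next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    ind = 0
    for ind, ele in enumerate(it, 1):
        if prev != ele:
            yield ("{}-{}, {}".format(start, ind - 1, prev)
                   if start != ind - 1 else "{}, {}".format(start, prev))
            start = ind
        prev = ele
    yield ("{}-{}, {}".format(start, ind, prev)
           if start != ind else "{}, {}".format(start, prev))


def groupby_version(l):
    """Assumed stand-in for the unshown itools(): groupby over the indices."""
    result = []
    for key, group in itertools.groupby(range(len(l)), l.__getitem__):
        indices = list(group)
        if len(indices) > 1:
            result.append("{}-{}, {}".format(indices[0], indices[-1], key))
        else:
            result.append("{}, {}".format(indices[0], key))
    return result


my_list = [2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]

# Both implementations must agree before their timings are worth comparing.
assert list(seq(my_list)) == groupby_version(my_list)

number = 10000
for name, func in [("seq", lambda: list(seq(my_list))),
                   ("groupby", lambda: groupby_version(my_list))]:
    seconds = timeit.timeit(func, number=number)
    print("{}: {:.2f} usec per call".format(name, seconds / number * 1e6))
```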

Answer 3

Here is a lazy version that works on any sequence and yields slices. It is therefore both generic and memory-efficient.

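def compress(seq):
    start_index = 0
    # Sentinel meaning "no element seen yet". Unlike a truthiness test on
    # None, it cannot be confused with falsy values such as 0 in seq.
    missing = object()
    previous = missing
    n = 0
    for i, x in enumerate(seq):
        if previous is not missing and x != previous:
            yield previous, slice(start_index, i)
            start_index = i

        previous = x
        n += 1
    if previous is not missing:
        yield previous, slice(start_index, n)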

Usage:

assert list(compress([2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3])) == [
    (2, slice(0, 4)),
    (3, slice(4, 5)),
    (4, slice(5, 6)),
    (2, slice(6, 8)),
    (4, slice(8, 10)),
    (3, slice(10, 11)),
]

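Why slices? Because they are convenient (they can be used as-is for indexing) and their semantics (the upper bound is excluded) are more "standard". Changing this to tuples or strings with an inclusive upper bound is easy.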
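As a rough illustration of that last point, the sketch below turns (value, slice) pairs into the strings requested in the question. `slice_to_string` is a hypothetical helper name, and the pairs are written out by hand so the example runs on its own.

```python
def slice_to_string(value, run):
    """Format one (value, slice) pair as 'start-end,value' with an inclusive end."""
    last = run.stop - 1  # slices exclude the upper bound, so step back by one
    if run.start == last:
        return "{},{}".format(run.start, value)
    return "{}-{},{}".format(run.start, last, value)


my_list = [2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]
pairs = [(2, slice(0, 4)), (3, slice(4, 5)), (4, slice(5, 6)),
         (2, slice(6, 8)), (4, slice(8, 10)), (3, slice(10, 11))]

strings = [slice_to_string(value, run) for value, run in pairs]
print(strings)
# ['0-3,2', '4,3', '5,4', '6-7,2', '8-9,4', '10,3']

# The slices also index the original list directly.
print(my_list[pairs[0][1]])
# [2, 2, 2, 2]
```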

Answer 4

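Build a list that pairs each item with its number of consecutive occurrences. Then iterate over that list to work out the index range for each item.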

from itertools import groupby

new_list = []

for k, g in groupby([2,2,2,2,3,4,2,2,4,4,3]):
   sum_each = 0
   for i in g:
      sum_each += 1
   # Build (count, item) pairs, e.g. [(4, 2), (1, 3), (1, 4), (2, 2), (2, 4), (1, 3)]
   new_list.append((sum_each, k))

x = 0
for (n, item) in enumerate(new_list):
   if item[0] > 1:
      new_list[n] = str(x) + '-' + str(x+item[0]-1) + ',' + str(item[1])
   else:
      new_list[n] = str(x) + ',' + str(item[1])
   x += item[0]

print(new_list)
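The answer does not show its output, so here is a compact sketch of the same two steps wrapped in a function. The name `compress_counts` is made up for this example.

```python
from itertools import groupby


def compress_counts(values):
    """Count each run, then convert the counts into index ranges."""
    # Step 1: (run length, item) pairs for each run of equal items.
    counts = [(sum(1 for _ in group), key) for key, group in groupby(values)]

    # Step 2: walk the pairs, tracking the index where each run starts.
    result = []
    start = 0
    for length, item in counts:
        if length > 1:
            result.append("{}-{},{}".format(start, start + length - 1, item))
        else:
            result.append("{},{}".format(start, item))
        start += length
    return result


print(compress_counts([2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]))
# ['0-3,2', '4,3', '5,4', '6-7,2', '8-9,4', '10,3']
```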

Answer 5

First, the result you asked for is not valid Python. I will assume the following format works for you:

my_new_list = [ ((0,3),2), ((4,4),3), ((5,5),4), ((6,7),2), ((8,9),4), ((10,10),3) ]

Given that, you can first convert my_list into a list of ((index,index),value) tuples, then use reduce to gather them into ranges:

from functools import reduce  # reduce is not a builtin in Python 3

my_new_list = reduce(
        lambda new_list,item:
            new_list[:-1] + [((new_list[-1][0][0],item[0][1]),item[1])]
                if len(new_list) > 0 and new_list[-1][1] == item[1]
            else new_list + [item]
        , [((index,index),value) for (index,value) in enumerate(my_list)]
        , []
)

This does the following:

Converts the list into ((index,index),value) tuples:

[((index,index),value) for (index,value) in enumerate(my_list)]

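Uses reduce to merge adjacent items with the same value: if the list being built has at least one item, and its last item has the same value as the item being processed, reduce to the list minus its last item, plus a new item made of the first index of that last item, the second index of the current item, and the current item's value. If the list being built is empty, or its last item has a different value from the item being processed, simply append the current item to the list.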
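The same merging rule can be written as a plain loop, which may be easier to read. This is a sketch with a made-up function name, `merge_runs`, not the answer's own code. It updates the last entry in place instead of building a new list at every step.

```python
def merge_runs(my_list):
    """Merge adjacent equal values into ((first_index, last_index), value) tuples."""
    merged = []
    for index, value in enumerate(my_list):
        if merged and merged[-1][1] == value:
            # Same value as the last run: keep its first index, extend its end.
            (first, _), _ = merged[-1]
            merged[-1] = ((first, index), value)
        else:
            # Empty result or a different value: start a new one-element run.
            merged.append(((index, index), value))
    return merged


print(merge_runs([2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]))
# [((0, 3), 2), ((4, 4), 3), ((5, 5), 4), ((6, 7), 2), ((8, 9), 4), ((10, 10), 3)]
```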

Edit: used new_list instead of list as my lambda parameter; using list as a parameter or variable name is bad form.

Answer 6

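Here is a generator-based solution similar to Padraic's. However, it avoids enumerate()-based index tracking, so it may be faster for large lists. I also did not worry about your desired output format.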

def compress_list(ilist):
    """Compresses a list of integers"""
    left, right = 0, 0
    length = len(ilist)
    while right < length:
        if ilist[left] == ilist[right]:
            right += 1
            continue
        yield (ilist[left], (left, right-1))
        left = right
    # at the end of the list, yield the last run (an empty list has none)
    if length:
        yield (ilist[left], (left, right-1))

It would be used like this:

my_list = [2,2,2,2,3,4,2,2,4,4,3]
my_compressed_list = [i for i in compress_list(my_list)]
my_compressed_list

Which results in the output:

[(2, (0, 3)),
 (3, (4, 4)),
 (4, (5, 5)),
 (2, (6, 7)),
 (4, (8, 9)),
 (3, (10, 10))]

Answer 7

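There are some good answers here, and I thought I would offer another alternative. We iterate over the list of numbers and keep an updated current value, together with current_indicies, the list of indices for that value. We then look ahead one element to see whether the next number differs from current; if it does, we add the run to the result as a "compressed number".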

def compress_numbers(l):
    result = []
    current = None
    current_indicies = None
    for i, item in enumerate(l):
        if current != item:
            current = item
            current_indicies = [i]
        elif current == item:
            current_indicies.append(i)
        try:
            if l[i+1] != current:
                result.append(format_entry(current_indicies, current))
        except IndexError:  # no next element: this is the last run
            result.append(format_entry(current_indicies, current))
    return result

# Helper method to format entry in the list.
def format_entry(indicies, value):
    i_range = None
    if len(indicies) > 1:
        i_range = '{}-{}'.format(indicies[0], indicies[-1])
    else:
        i_range = indicies[0]
    return '{},{}'.format(i_range, value)

Example output:

>>> print(compress_numbers([2, 2, 2, 2, 3, 4, 2, 2, 4, 4, 3]))
['0-3,2', '4,3', '5,4', '6-7,2', '8-9,4', '10,3']

"Compressing" a list of integers

7 answers: