Remove certain consecutive duplicates in a list

Time: 2011-03-26 15:24:46

Tags: python list duplicates

I have a list of strings like this:

['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

I want to replace the '**', '**' with a single '**', but leave 'bar', 'bar' intact. That is, replace any number of consecutive '**' with a single one. My current code looks like this:

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
np = [p[0]]
for pi in range(1,len(p)):
    if p[pi] == '**' and np[-1] == '**':
        continue
    np.append(p[pi])

Is there a more pythonic way to do this?

6 answers:

Answer 0: (score: 5)

Not sure about pythonic, but this should work and is more terse:

star_list = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
star_list = [i for i, next_i in zip(star_list, star_list[1:] + [None]) 
             if (i, next_i) != ('**', '**')]

The above copies the list twice; if you want to avoid that, consider Tom Zych's approach. Alternatively, you could do the following:

from itertools import islice, izip, chain

star_list = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
sl_shift = chain(islice(star_list, 1, None), [None])
star_list = [i for i, next_i in izip(star_list, sl_shift) 
             if (i, next_i) != ('**', '**')]

This can be generalized and made more iterator-friendly, not to mention more readable, using a variation of the pairwise recipe from the itertools docs:

from itertools import islice, izip, chain, tee
def compress(seq, x):
    seq, shift = tee(seq)
    shift = chain(islice(shift, 1, None), (object(),))
    return (i for i, j in izip(seq, shift) if (i, j) != (x, x))

Test:

>>> list(compress(star_list, '**'))
['**', 'foo', '*', 'bar', 'bar', '**', 'baz']
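
The snippets above rely on itertools.izip, which exists only in Python 2. As a rough sketch of the same pairwise idea on Python 3, the built-in zip (already lazy there) can stand in for it. The name compress_runs is made up for this illustration and is not part of the original answer.

```python
from itertools import chain, islice, tee

def compress_runs(seq, x):
    """Yield items of seq, collapsing each run of consecutive x into one x."""
    seq, shift = tee(seq)
    # shift is seq advanced by one item, padded with a unique sentinel
    # so that the last item always has a "next" that is not x.
    shift = chain(islice(shift, 1, None), (object(),))
    return (i for i, j in zip(seq, shift) if (i, j) != (x, x))

star_list = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
result = list(compress_runs(star_list, '**'))
print(result)  # ['**', 'foo', '*', 'bar', 'bar', '**', 'baz']
```

Because the input goes through tee, this also accepts one-shot iterators, not only lists.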

Answer 1: (score: 3)

This is pythonic in my view

result = [v for i, v in enumerate(L) if L[i:i+2] != ["**", "**"]]

The only "trick" being used is that L[i:i+2] is a list of one element when i == len(L)-1.
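
To see why no explicit bounds check is needed, here is a small illustration of how slicing behaves at the end of a list. The sample list is an assumption chosen for the demonstration.

```python
L = ['**', 'foo', '**', '**']
last = len(L) - 1

# Slices never raise IndexError; they are simply cut off at the end.
tail = L[last:last + 2]
print(tail)  # ['**']

# A one-element list can never equal ["**", "**"], so the final item is always kept.
result = [v for i, v in enumerate(L) if L[i:i+2] != ["**", "**"]]
print(result)  # ['**', 'foo', '**']
```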

Note that, of course, the very same expression could also be used as a generator
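
As a sketch of that remark, swapping the square brackets for parentheses gives a generator expression that yields items lazily. L must still be a list (or another sliceable sequence), since the expression slices it.

```python
L = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

# Same filter as the list comprehension, but evaluated lazily.
lazy = (v for i, v in enumerate(L) if L[i:i+2] != ["**", "**"])

first = next(lazy)   # items are produced one at a time
rest = list(lazy)    # consume whatever is left
print(first, rest)
```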

Answer 2: (score: 1)

This works. Not sure how Pythonic it is.

import itertools

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

q = []
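for key, group in itertools.groupby(p):
    # Emit '**' once per run; emit any other key once per occurrence.
    # (Named "group" rather than "iter" to avoid shadowing the built-in.)
    q.extend([key] * (1 if key == '**' else len(list(group))))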

print(q)
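
For readers unfamiliar with itertools.groupby, the following sketch shows the runs it produces for the sample list, which is what the loop above relies on. Without a key function, groupby groups consecutive equal items only; it does not sort or merge non-adjacent duplicates.

```python
from itertools import groupby

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']

# Each group is a run of equal, adjacent items; record (key, run length).
runs = [(key, len(list(group))) for key, group in groupby(p)]
print(runs)
# [('**', 1), ('foo', 1), ('*', 1), ('bar', 2), ('**', 2), ('baz', 1)]
```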

Answer 3: (score: 1)

from itertools import groupby

p = ['**', 'foo', '*', 'bar', 'bar', '**', '**', 'baz']
keep = set(['foo',  'bar', 'baz'])
result = []

for k, g in groupby(p):
    if k in keep:
        result.extend(list(g))
    else:
        result.append(k)
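
Note that this version works from a whitelist: every run whose key is not in keep is collapsed, not only runs of '**'. The sketch below wraps the same loop in a function (the name collapse_except is hypothetical) to make that difference visible on an input with a repeated '*'.

```python
from itertools import groupby

def collapse_except(items, keep):
    """Collapse every run of equal items unless the item is in keep."""
    result = []
    for k, g in groupby(items):
        if k in keep:
            result.extend(g)   # keep the whole run
        else:
            result.append(k)   # keep a single representative
    return result

keep = {'foo', 'bar', 'baz'}
out = collapse_except(['**', '**', 'bar', 'bar', '*', '*'], keep)
print(out)  # ['**', 'bar', 'bar', '*']
```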

Answer 4: (score: 1)

Answer 5: (score: 1)

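A generalised "Pythonic" solution that works with any iterable (no backing up, no copying, no indexing, no slicing, and it doesn't fail if the iterable is empty) and with any thing-to-squeeze (including None):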

>>> test = ['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**',
...      'foo', '*','*', 'bar', 'bar','bar', '**', '**','foo','bar',]
>>>
>>> def squeeze(iterable, victim, _dummy=object()):
...     previous = _dummy
...     for item in iterable:
...         if item == victim == previous: continue
...         previous = item
...         yield item
...
>>> print test
['**', 'foo', '*', 'bar', 'bar', '**', '**', '**', 'baz', '**', '**', 'foo', '*', '*', 'bar', 'bar', 'bar', '**', '**', 'foo', 'bar']
>>> print list(squeeze(test, "**"))
['**', 'foo', '*', 'bar', 'bar', '**', 'baz', '**', 'foo', '*', '*', 'bar', 'bar', 'bar', '**', 'foo', 'bar']
>>> print list(squeeze(["**"], "**"))
['**']
>>> print list(squeeze(["**", "**"], "**"))
['**']
>>> print list(squeeze([], "**"))
[]
>>>
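
The session above uses Python 2 print statements. As a hedged sketch, the same generator runs unchanged on Python 3, and the checks below exercise two of the claims made for it: it accepts a one-shot iterator, and the victim may be None. The sample inputs are made up for illustration.

```python
def squeeze(iterable, victim, _dummy=object()):
    # _dummy is a unique sentinel, so the first item never matches "previous".
    previous = _dummy
    for item in iterable:
        if item == victim == previous:
            continue
        previous = item
        yield item

# Works on an iterator that can only be consumed once.
from_iter = list(squeeze(iter(['**', '**', 'foo', '**']), '**'))
print(from_iter)  # ['**', 'foo', '**']

# The victim can be None, because the sentinel is not None.
with_none = list(squeeze([None, None, 1, 1, None], None))
print(with_none)  # [None, 1, 1, None]
```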

Update, for the enlightenment of @eyquem, who asserted that victim could not be a sequence (or, presumably, a set).

Having a container of victims means that there are two possible semantics:

>>> def squeeze2(iterable, victims, _dummy=object()):
...     previous = _dummy
...     for item in iterable:
...         if item == previous in victims: continue
...         previous = item
...         yield item
...
>>> def squeeze3(iterable, victims, _dummy=object()):
...     previous = _dummy
...     for item in iterable:
...         if item in victims and previous in victims: continue
...         previous = item
...         yield item
...
>>> guff = "c...d..e.f,,,g,,h,i.,.,.,.j"
>>> print "".join(squeeze2(guff, ".,"))
c.d.e.f,g,h,i.,.,.,.j
>>> print "".join(squeeze3(guff, ".,"))
c.d.e.f,g,h,i.j
>>>
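
The condition in squeeze2 relies on Python's comparison chaining: item == previous in victims is evaluated as item == previous and previous in victims. Here is a minimal sketch of the two conditions side by side, using the made-up helper names is_same_victim and both_victims:

```python
def is_same_victim(item, previous, victims):
    # squeeze2's test: the same victim repeated.
    return item == previous in victims

def both_victims(item, previous, victims):
    # squeeze3's test: any victim following any victim.
    return item in victims and previous in victims

victims = ".,"
print(is_same_victim(".", ".", victims))  # True
print(is_same_victim(",", ".", victims))  # False
print(both_victims(",", ".", victims))    # True
print(both_victims("c", ".", victims))    # False
```

So squeeze2 collapses only runs of one identical victim, while squeeze3 collapses any mixed run of victims down to its first item.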