For example, the list to_be contains: 3 of "a", 4 of "b", 3 of "c", 5 of "d"...
to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d", ...]
Now I want it to look like this:
done = ["a", "b", "c", "d", ... , "a", "b", "c", "d", ... , "b", "d", ...] (note: some items occur more often than others, but they still need to appear in a predefined order, alphabetical for example)
What is the fastest way to do this?
Answer 0 (score: 12)
Assuming I understand what you want, this can be done relatively easily by combining itertools.zip_longest, itertools.groupby and itertools.chain.from_iterable():
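We first group the items (the "a"s, the "b"s, and so on), zip the groups together to get the order you want (one item from each group per round), use chain to flatten the result into a single list, and then remove the None values introduced by the zipping.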
>>> import itertools
>>> [item for item in itertools.chain.from_iterable(itertools.zip_longest(*[list(x) for _, x in itertools.groupby(to_be)])) if item]
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
You may want to split up some of the list comprehensions to make it more readable, though:
>>> groups = itertools.zip_longest(*[list(x) for _, x in itertools.groupby(to_be)])
>>> [item for item in itertools.chain.from_iterable(groups) if item]
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
(This is written for 3.x; on 2.x, you will want izip_longest() instead.)
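As always, if you expect empty strings, 0, and so on, then you will want to filter with if item is not None, and if you need to keep None values, create a sentinel object and check for identity against it.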
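To make the sentinel idea concrete, here is a minimal sketch. The names _MISSING and interleave_groups are made up for illustration; the only library feature it relies on is the fillvalue argument of itertools.zip_longest.

```python
import itertools

# Hypothetical sentinel: a unique object that cannot collide with real data.
_MISSING = object()

def interleave_groups(items):
    """Interleave consecutive runs of equal items, keeping falsy values and None."""
    groups = [list(group) for _, group in itertools.groupby(items)]
    # Pad shorter groups with the sentinel instead of the default None.
    zipped = itertools.zip_longest(*groups, fillvalue=_MISSING)
    # Compare by identity so 0, "" and None all survive the filter.
    return [item for item in itertools.chain.from_iterable(zipped)
            if item is not _MISSING]
```

With this, interleave_groups([0, 0, None, None, None, "", ""]) keeps every 0, None and empty string rather than dropping them.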
You could also use the roundrobin() recipe given in the docs as an alternative to zipping, which makes it as simple as:
>>> list(roundrobin(*[list(x) for _, x in itertools.groupby(to_be)]))
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
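Note that roundrobin() is not importable from itertools; it has to be defined first. The sketch below follows the general shape of the recipe that has appeared in the itertools documentation, written from memory, so treat it as an approximation and check the docs for the current version.

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    """Yield one item from each iterable in turn until all are exhausted."""
    num_active = len(iterables)
    # Cycle over the bound __next__ methods of each iterator.
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # One iterator ran dry: rebuild the cycle from the remaining ones.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))
```

For example, list(roundrobin("ABC", "D", "EF")) gives ['A', 'D', 'E', 'B', 'F', 'C'].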
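As a final note, the observant might notice that I make lists from the groupby() generators, which may seem wasteful. The reason comes from the docs:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.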
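A small sketch of what goes wrong without the list() call. The variable names are illustrative only; the point is that a group iterator saved for later is already spent once groupby() has moved on.

```python
import itertools

data = ["a", "a", "b", "b", "b"]

# Saving the group iterators without consuming them: by the time we read the
# first one, groupby() has already advanced past it.
lazy_groups = [group for _, group in itertools.groupby(data)]
first_lazy = list(lazy_groups[0])

# Materialising each group immediately keeps its contents.
eager_groups = [list(group) for _, group in itertools.groupby(data)]
```

Here first_lazy comes out empty, while eager_groups holds both runs in full.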
Answer 1 (score: 2)
import collections

to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d"]
counts = collections.Counter(to_be)
answer = []
while counts:
    # emit one of each remaining item, in sorted order
    answer.extend(sorted(counts))
    for k in counts:
        counts[k] -= 1
    # keep only the items that still have occurrences left
    counts = {k: v for k, v in counts.items() if v > 0}
Now, answer looks like this:
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
Hope this helps.
Answer 2 (score: 1)
I'm not sure whether this is the fastest, but here's my stab at it:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> def sort_key(a):
...     d[a] += 1
...     return d[a], a
...
>>> sorted(to_be, key=sort_key)
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'b', 'd', 'd']
Wrapped in a function:
from collections import defaultdict

def weird_sort(x):
    d = defaultdict(int)
    def sort_key(a):
        # count how many times this element has been seen so far
        d[a] += 1
        return (d[a], a)
    return sorted(x, key=sort_key)
Of course, this requires that the elements in your iterable be hashable.
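If the side effect inside the key function feels too implicit, the same idea can be spelled out by building the (occurrence number, value) pairs first and sorting them afterwards. This is only a sketch of that variant; occurrence_sort is a made-up name, not something from the answer.

```python
from collections import defaultdict

def occurrence_sort(items):
    """Sort by (n-th occurrence, value) using explicitly built keys."""
    seen = defaultdict(int)
    keyed = []
    for item in items:
        seen[item] += 1
        # The first "a" gets (1, "a"), the second (2, "a"), and so on.
        keyed.append((seen[item], item))
    # Tuples compare by occurrence number first, then by the value itself.
    return [item for _, item in sorted(keyed)]
```

Since this counts occurrences rather than consecutive runs, it should not matter whether the input is already sorted.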
Answer 3 (score: 0)
A bit more elegant than Lattyware's:
import collections

def rearrange(l):
    counts = collections.Counter(l)
    output = []
    # keep going while any item still has occurrences left
    while sum(counts.values()) > 0:
        output.extend(sorted(k for k, v in counts.items() if v > 0))
        for k in counts:
            counts[k] = counts[k] - 1 if counts[k] > 0 else 0
    return output
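Answer 4 (score: 0)
Doing this "manually, state-machine style" should be much more efficient, but for relatively small lists (< 5000 items) you should have no problem using Python's built-in goodies for it: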
to_be = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "d", "e", "e"]

def do_it(lst):
    lst = lst[:]
    result = []
    while True:
        group = set(lst)
        result.extend(sorted(group))
        for element in group:
            del lst[lst.index(element)]
        if not lst:
            break
    return result

done = do_it(to_be)
The "big O" complexity of the above function should be really large. I haven't tried to work it out.
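For comparison, here is one way the cost might be brought down, as a sketch and not a benchmarked result. It counts the items once, sorts the distinct values once, and then shrinks the list of remaining values each round, so the work per round is proportional to the number of items emitted. The name round_robin_sort is invented for this example.

```python
from collections import Counter

def round_robin_sort(items):
    """Emit one of each remaining distinct value per round, in sorted order."""
    counts = Counter(items)
    remaining = sorted(counts)  # distinct values, sorted only once
    result = []
    while remaining:
        result.extend(remaining)
        for key in remaining:
            counts[key] -= 1
        # Filtering preserves order, so no re-sorting is needed.
        remaining = [key for key in remaining if counts[key] > 0]
    return result
```

On the to_be list above (with the two extra "e" items) this should return the same result as do_it(to_be), without the repeated lst.index() scans and per-round set() rebuilds.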