Dynamically generating elements of a list within a list

Date: 2017-07-17 22:20:56

Tags: python generator sequence-generators

I have a list made up of the following elements:

list1 = [a1,a2,a3]

Each element of this list can itself be a list of variable size, for example:

a1 = [x1,y1,z1], a2 = [w2,x2,y2,z2], a3 = [p3,r3,t3,n3]

I can easily set up a generator that loops through list1 and yields the components of each element:

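def gen(list1):  # wrapped in a function so that yield is valid
    array = []
    for i in list1:
        for j in i:
            array.append(j)  # append is a method call, not a subscript
            yield array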

But is there a way to do this so that I can specify the size of the array?

For example, with a batch size of 2:

1st yield : [x1,y1]
2nd yield : [z1,w2]
3rd yield : [x2,y2]
4th yield : [z2,p3]
5th yield : [r3,t3]
6th yield : [n3]
7th yield : repeat 1st

Or with a batch size of 4:

1st yield : [x1,y1,z1,w2]
2nd yield : [x2,y2,z2,p3]
3rd yield : [r3,t3,n3]
4th yield : repeat first

This seems non-trivial for lists of different sizes, each of which contains further lists of different sizes.

4 answers:

Answer 0: (score: 6)

This is actually pretty simple with itertools:

>>> a1 = ['x1','y1','z1']; a2 = ['w2','x2','y2','z2']; a3 = ['p3','r3','t3','n3']
>>> list1 = [a1,a2,a3]
>>> from itertools import chain, islice
>>> flatten = chain.from_iterable
>>> def slicer(seq, n):
...     it = iter(seq)
...     return lambda: list(islice(it,n))
...
>>> def my_gen(seq_seq, batchsize):
...     for batch in iter(slicer(flatten(seq_seq), batchsize), []):
...         yield batch
...
>>> list(my_gen(list1, 2))
[['x1', 'y1'], ['z1', 'w2'], ['x2', 'y2'], ['z2', 'p3'], ['r3', 't3'], ['n3']]
>>> list(my_gen(list1, 4))
[['x1', 'y1', 'z1', 'w2'], ['x2', 'y2', 'z2', 'p3'], ['r3', 't3', 'n3']]
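The loop above relies on the two-argument form of the built-in iter(callable, sentinel), which keeps calling the callable until it returns a value equal to the sentinel. A minimal sketch of that mechanism on its own (the name take_two and the range(5) input are only illustrative):

```python
from itertools import islice

it = iter(range(5))

def take_two():
    # Pull up to two items from the shared iterator;
    # returns [] once the iterator is exhausted.
    return list(islice(it, 2))

# iter(callable, sentinel) calls take_two() until it returns [].
batches = list(iter(take_two, []))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Because the sentinel is the empty list, iteration stops exactly when islice has nothing left to return.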

Note that in Python 3.3+ we can use yield from:

>>> def my_gen(seq_seq, batchsize):
...   yield from iter(slicer(flatten(seq_seq), batchsize), [])
...
>>> list(my_gen(list1,2))
[['x1', 'y1'], ['z1', 'w2'], ['x2', 'y2'], ['z2', 'p3'], ['r3', 't3'], ['n3']]
>>> list(my_gen(list1,3))
[['x1', 'y1', 'z1'], ['w2', 'x2', 'y2'], ['z2', 'p3', 'r3'], ['t3', 'n3']]
>>> list(my_gen(list1,4))
[['x1', 'y1', 'z1', 'w2'], ['x2', 'y2', 'z2', 'p3'], ['r3', 't3', 'n3']]
>>>
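The generators above stop once the data runs out, whereas the question's "repeat 1st" line suggests starting over. A minimal sketch of that behaviour, assuming each pass should restart from the first batch and keep the short final batch; cycling_batches is a hypothetical name, not part of the original answer:

```python
from itertools import chain, islice

def cycling_batches(seq_seq, batchsize):
    """Yield batches of up to `batchsize` items, restarting after each full pass."""
    while True:
        it = chain.from_iterable(seq_seq)
        produced = False
        for batch in iter(lambda: list(islice(it, batchsize)), []):
            produced = True
            yield batch
        if not produced:
            return  # empty input: stop instead of looping forever

list1 = [['x1', 'y1', 'z1'], ['w2', 'x2', 'y2', 'z2'], ['p3', 'r3', 't3', 'n3']]

# The generator is infinite, so take a finite number of batches with islice.
first_seven = list(islice(cycling_batches(list1, 2), 7))
print(first_seven)
# [['x1', 'y1'], ['z1', 'w2'], ['x2', 'y2'], ['z2', 'p3'], ['r3', 't3'], ['n3'], ['x1', 'y1']]
```

This assumes seq_seq can be iterated more than once (a list of lists, as in the question); a one-shot iterator would produce only a single pass.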

Answer 1: (score: 5)

You can use itertools here; in your case I would use chain and islice:

import itertools
a1 = ['x1','y1','z1']
a2 = ['w2','x2','y2','z2'] 
a3 = ['p3','r3','t3','n3']
list1 = [a1,a2,a3]

def flatten_and_batch(lst, size):
    it = itertools.chain.from_iterable(lst)
    while True:
        res = list(itertools.islice(it, size))
        if not res:
            break
        else:
            yield res

list(flatten_and_batch(list1, 2))
# [['x1', 'y1'], ['z1', 'w2'], ['x2', 'y2'], ['z2', 'p3'], ['r3', 't3'], ['n3']]

list(flatten_and_batch(list1, 3))
# [['x1', 'y1', 'z1'], ['w2', 'x2', 'y2'], ['z2', 'p3', 'r3'], ['t3', 'n3']]
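On Python 3.12 and later, the standard library also provides itertools.batched, which yields tuples rather than lists. A hedged sketch follows; the fallback definition is only an illustrative stand-in for older Python versions, not the standard-library implementation:

```python
from itertools import chain, islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        # Illustrative stand-in for older Pythons: tuples of up to n items.
        it = iter(iterable)
        while True:
            chunk = tuple(islice(it, n))
            if not chunk:
                return
            yield chunk

list1 = [['x1', 'y1', 'z1'], ['w2', 'x2', 'y2', 'z2'], ['p3', 'r3', 't3', 'n3']]

# Convert each tuple to a list to match the output format used above.
result = [list(b) for b in batched(chain.from_iterable(list1), 3)]
print(result)
# [['x1', 'y1', 'z1'], ['w2', 'x2', 'y2'], ['z2', 'p3', 'r3'], ['t3', 'n3']]
```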

If you don't mind an additional dependency, you could also use iteration_utilities.grouper (although it returns tuples instead of lists) [1]:

from iteration_utilities import flatten, grouper, Iterable

>>> list(grouper(flatten(list1), 2))
[('x1', 'y1'), ('z1', 'w2'), ('x2', 'y2'), ('z2', 'p3'), ('r3', 't3'), ('n3',)]

>>> list(grouper(flatten(list1), 3))
[('x1', 'y1', 'z1'), ('w2', 'x2', 'y2'), ('z2', 'p3', 'r3'), ('t3', 'n3')]

Or, using iteration_utilities.Iterable:

>>> Iterable(list1).flatten().grouper(3).as_list()
[('x1', 'y1', 'z1'), ('w2', 'x2', 'y2'), ('z2', 'p3', 'r3'), ('t3', 'n3')]

>>> Iterable(list1).flatten().grouper(4).map(list).as_list()
[['x1', 'y1', 'z1', 'w2'], ['x2', 'y2', 'z2', 'p3'], ['r3', 't3', 'n3']]

[1] Disclaimer: I'm the author of that library.

Timings:

[Figure: log-log plot of time in seconds versus input size for each approach]

from itertools import chain, islice
flatten = chain.from_iterable
from iteration_utilities import flatten, grouper, Iterable

def slicer(seq, n):
    it = iter(seq)
    return lambda: list(islice(it,n))

def my_gen(seq_seq, batchsize):
    for batch in iter(slicer(flatten(seq_seq), batchsize), []):
        yield batch

def flatten_and_batch(lst, size):
    it = flatten(lst)
    while True:
        res = list(islice(it, size))
        if not res:
            break
        else:
            yield res

def iteration_utilities_approach(seq, size):
    return grouper(flatten(seq), size)

def partition(lst, c):
    all_elem = list(chain.from_iterable(lst))
    for k in range(0, len(all_elem), c):
        yield all_elem[k:k+c]


def juanpa(seq, size):
    return list(my_gen(seq, size))    
def mseifert1(seq, size):
    return list(flatten_and_batch(seq, size))   
def mseifert2(seq, size):
    return list(iteration_utilities_approach(seq, size))   
def JoelCornett(seq, size):
    return list(partition(seq, size))       

# Timing setup
timings = {juanpa: [], 
           mseifert1: [], 
           mseifert2: [], 
           JoelCornett: []}

sizes = [2**i for i in range(1, 18, 2)]

# Timing
for size in sizes:
    print(size)
    func_input = [['x1','y1','z1']]*size
    for func in timings:
        print(str(func))
        res = %timeit -o func(func_input, 3)
        timings[func].append(res)

%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

for func in timings:
    ax.plot(sizes, 
            [time.best for time in timings[func]], 
            label=str(func.__name__))
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
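The %timeit and %matplotlib lines above are IPython magics, so that script only runs inside IPython or Jupyter. Outside of it, the timing loop could be approximated with the standard-library timeit module. This is a rough sketch: the sizes and repeat counts are assumptions chosen to keep the run short, and only one of the functions is timed:

```python
import timeit
from itertools import chain, islice

def flatten_and_batch(lst, size):
    it = chain.from_iterable(lst)
    while True:
        res = list(islice(it, size))
        if not res:
            break
        yield res

sizes = [2**i for i in range(1, 10, 2)]
best_times = []
for size in sizes:
    func_input = [['x1', 'y1', 'z1']] * size
    # Best of 3 repeats of 10 calls each, reported as seconds per call.
    runs = timeit.repeat(lambda: list(flatten_and_batch(func_input, 3)),
                         repeat=3, number=10)
    best_times.append(min(runs) / 10)

for size, t in zip(sizes, best_times):
    print(size, t)
```

The resulting best_times list can be plotted against sizes in the same way as the timings above.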

Answer 2: (score: 2)

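This is relatively trivial if you break the task into two steps:

  1. Flatten the list.
  2. Emit chunks according to the batch size.

Here is an example implementation: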

    from itertools import chain
    
    def break_into_batches(items, batch_size):
        flattened = list(chain(*items))
        for i in range(0, len(flattened), batch_size):
            yield flattened[i:i+batch_size]
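
As a quick check, this approach can be run against the question's data (list1 is reproduced here from the question so the snippet stands alone):

```python
from itertools import chain

def break_into_batches(items, batch_size):
    flattened = list(chain(*items))
    for i in range(0, len(flattened), batch_size):
        yield flattened[i:i+batch_size]

list1 = [['x1', 'y1', 'z1'], ['w2', 'x2', 'y2', 'z2'], ['p3', 'r3', 't3', 'n3']]

batches_of_4 = list(break_into_batches(list1, 4))
print(batches_of_4)
# [['x1', 'y1', 'z1', 'w2'], ['x2', 'y2', 'z2', 'p3'], ['r3', 't3', 'n3']]
```

Note that this version builds the whole flattened list in memory before slicing, unlike the islice-based generators, which consume the input lazily.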
    

Answer 3: (score: 0)

Given the following goals for a list:

  1. Yield batches, each of a given size
  2. Repeat the process for some number of cycles

more_itertools can accomplish both:

    import more_itertools as mit
    
    
    def batch(iterable, size=2, cycles=1):
        """Yield resized batches of an iterable."""
        iterable = mit.ncycles(iterable, cycles)
        return mit.chunked(mit.flatten(iterable), size)
    
    list(batch(list1, 3))
    # [["x1", "y1", "z1"], ["w2", "x2", "y2"], ["z2", "p3", "r3"], ["t3", "n3"]]
    
    
    list(batch(list1, size=3, cycles=2))
    # [["x1", "y1", "z1"], ["w2", "x2", "y2"], ["z2", "p3", "r3"],
    #  ["t3", "n3", "x1"], ["y1", "z1", "w2"], ["x2", "y2", "z2"],
    #  ["p3", "r3", "t3"], ["n3"]]
    

See the documentation for details on each tool: ncycles, flatten, and chunked.