Prevent a generator splitter from producing None values

Time: 2016-12-20 13:56:51

Tags: python python-3.x split generator

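I have extracted thousands of IDs from a CSV file (they are now a generator of IDs) in order to iterate over and process them.

To optimize the code, I group these IDs and process a whole batch at a time.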

The following code partitions the generator into batches of size n:

from itertools import zip_longest

def grouper(n, iterable):
    """Group an iterable into tuples of n items, padding the last one with None.

    :n: number of values in a group
    :iterable: string or iterable to be grouped
    :return: iterator of n-tuples

    grouper(3, 'abcdefg') --> ('a','b','c'), ('d','e','f'), ('g',None,None)
    """
    return zip_longest(*[iter(iterable)]*n)

For example:

>>> acc_ids = ['ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47', 'ID54', 'ID58']
# As an iterator
>>> id_generator = (i for i in acc_ids)
>>> batches = grouper(7, id_generator)
>>> batches
<itertools.zip_longest object at 0x7f3beb3313b8>
# This iterator is equivalent to the list below; note the padded None values at the end of the last batch:
# [('ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47'), ('ID54', 'ID58', None, None, None, None, None)]
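
The `zip_longest(*[iter(iterable)]*n)` idiom works because the list holds n references to one and the same iterator, so every output tuple pulls n consecutive items from it. A small sketch of that mechanism (the names `it`, `refs` and `chunks` are illustrative only):

```python
from itertools import zip_longest

it = iter("abcdefg")
# Three references to the SAME iterator object, not three copies.
refs = [it] * 3
assert refs[0] is refs[1] is refs[2]

# zip_longest takes one item from each reference in turn, which amounts to
# three consecutive items from the single underlying iterator per tuple.
# Once the iterator runs dry, the remaining slots are filled with None.
chunks = list(zip_longest(*refs))
print(chunks)
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', None, None)]
```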

The problem is that, to remove the padded None values from each batch, I am using filter:

for batch in batches:
    batch = list(filter(None, batch))

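This filter removes the None values from the list. I am wondering whether, instead of adding an extra filter, we could prevent the padded None values from being produced when the generator is split in the first place.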
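
One caveat with `filter(None, batch)`: it drops every falsy item, not only the padding, so an ID such as `0` or an empty string would disappear as well. A minimal sketch of a stricter variant, assuming a private sentinel object is used as the fill value (the names `_PAD` and `grouper_no_pad` are hypothetical and not part of the original code):

```python
from itertools import zip_longest

_PAD = object()  # hypothetical sentinel; it cannot collide with real data


def grouper_no_pad(n, iterable):
    """Yield tuples of up to n items; the last tuple may be shorter."""
    for batch in zip_longest(*[iter(iterable)] * n, fillvalue=_PAD):
        # Only the final batch can contain padding; strip it by identity,
        # so falsy but legitimate values (0, '', None) are preserved.
        yield tuple(item for item in batch if item is not _PAD)


print(list(grouper_no_pad(3, [0, "", "ID21", None, "ID24"])))
# [(0, '', 'ID21'), (None, 'ID24')]
```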

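Questions:

  • Is there another way to split a large generator into batches without adding None/empty values at the end of the last batch? OR
  • Can we change the grouper function above so that it does not produce the padded None values?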

2 Answers:

Answer 0: (score: 3)

This might work for you:

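from itertools import islice

def grouper(n, iterable):
    iter_ = iter(iterable)
    while True:
        # islice stops cleanly when the iterator is exhausted; calling
        # next() inside a generator expression would instead raise
        # RuntimeError on Python 3.7+ (PEP 479).
        res = tuple(islice(iter_, n))
        if not res:
            return
        yield res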


acc_ids = ['ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47', 'ID54', 'ID58']
id_generator = iter(acc_ids)
batches = grouper(7, id_generator)
print(list(batches))

Output:

[('ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47'), ('ID54', 'ID58')]

Answer 1: (score: 1)

One possibility is to use an external library that already includes this functionality:

  • iteration_utilities.grouper

    >>> from iteration_utilities import grouper
    >>> list(grouper(acc_ids, 7))
    [('ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47'), ('ID54', 'ID58')]

  • more-itertools.chunked

    >>> from more_itertools import chunked
    >>> list(chunked(acc_ids, 7))
    [['ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47'], ['ID54', 'ID58']]

  • pytoolz.partition_all (or cytoolz.partition_all)

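These libraries all have permissive licenses (Apache, MIT and BSD), so even if you do not want the dependency, you may be able to simply reuse their code (you may need to include their license in your code; see the respective licenses for details).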
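
If the code can target Python 3.12 or later, the standard library also offers `itertools.batched`, which yields tuples and leaves the last one short instead of padding it. A hedged sketch that falls back to an `islice`-based helper on older interpreters (the fallback is written for this example only and is not the code of any library named above):

```python
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Fallback sketch: yield tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while True:
            batch = tuple(islice(it, n))
            if not batch:
                return
            yield batch

acc_ids = ['ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47', 'ID54', 'ID58']
id_generator = (i for i in acc_ids)
print(list(batched(id_generator, 7)))
# [('ID21', 'ID24', 'ID38', 'ID40', 'ID42', 'ID43', 'ID47'), ('ID54', 'ID58')]
```

Either branch consumes the generator lazily, one batch at a time, so the full list of IDs never has to be held in memory.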