Possible duplicate:
How do you split a list into evenly sized chunks in Python?
I'm surprised I can't find a "batch" function that would take an iterable as input and return an iterable of iterables.
For example:
for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]
Or:
for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]
Now, I wrote what I thought was a pretty simple generator:
def batch(iterable, n = 1):
    current_batch = []
    for item in iterable:
        current_batch.append(item)
        if len(current_batch) == n:
            yield current_batch
            current_batch = []
    if current_batch:
        yield current_batch
But the above does not give me what I expected:
for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]
So I have missed something, which probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?
[Edit: I eventually realized that the behavior above only happens when I run this within ipython rather than python itself]
Answer 0 (score: 83)
This is probably more efficient (faster):
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print x
It avoids building up new lists element by element.
Answer 1 (score: 31)
FWIW, the recipes in the itertools module provide this example:
from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
It works like this:
>>> list(grouper(3, range(10)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
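For Python 3, a sketch of the same recipe would look like the following (izip_longest was renamed to zip_longest there; the fill values are kept, exactly as in the output above):

from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    # zip_longest pads the last group with fillvalue instead of shortening it
    return zip_longest(fillvalue=fillvalue, *args)

print(list(grouper(3, range(10))))
# expected: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]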
Answer 2 (score: 25)
As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, you could look at an example of the following recipe:
from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([batchiter.next()], batchiter)
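Note that the recipe above is Python 2 (batchiter.next()). Under Python 3, PEP 479 also forbids letting a StopIteration escape a generator, so a sketch of an adapted version (hypothetical name batch_py3) might look like this; as with the original recipe, each yielded chain must be fully consumed before asking for the next batch:

from itertools import islice, chain

def batch_py3(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            first = next(batchiter)   # raises StopIteration when sourceiter is empty
        except StopIteration:
            return                    # PEP 479: end the generator explicitly
        yield chain([first], batchiter)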
Answer 3 (score: 7)
Strange, it seems to work fine for me in Python 2.x:
>>> def batch(iterable, n = 1):
...     current_batch = []
...     for item in iterable:
...         current_batch.append(item)
...         if len(current_batch) == n:
...             yield current_batch
...             current_batch = []
...     if current_batch:
...         yield current_batch
...
>>> for x in batch(range(0, 10), 3):
...     print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
Answer 4 (score: 6)
A solution for Python 3.8+, for iterables that don't define a len function and that get exhausted:
from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)  # works for lists as well as already-exhaustible iterators
    while batch := list(islice(iterator, batch_size)):
        yield batch
Usage example:
def my_gen():
    yield from range(10)

for batch in batcher(my_gen(), 3):
    print(batch)
>>> [0, 1, 2]
>>> [3, 4, 5]
>>> [6, 7, 8]
>>> [9]
Of course, it also works without the walrus operator.
Answer 5 (score: 3)
Pushing as much of the work as possible into CPython, by leveraging islice and the iter(callable, sentinel) behavior:
from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    # islice returns results until 'size',
    # make_chunk gets repeatedly called by iter(callable).
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    return iter(make_chunk, [])
Inspired by more-itertools, and shortened to the essence of that code.
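For illustration, a minimal usage sketch of the chunked() helper above (the sample generator here is made up):

# iter(make_chunk, []) keeps calling make_chunk() until it returns the
# sentinel value [], i.e. until the underlying generator is exhausted.
squares = (n * n for n in range(10))
for part in chunked(squares, 4):
    print(part)
# expected output: [0, 1, 4, 9], then [16, 25, 36, 49], then [64, 81]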
Answer 6 (score: 1)
Here is an approach using the reduce function.
One-liner:
from functools import reduce
reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])
Or a more readable version:
from functools import reduce

def batch(input_list, batch_size):
    def reducer(cumulator, item):
        if len(cumulator[-1]) < batch_size:
            cumulator[-1].append(item)
            return cumulator
        else:
            cumulator.append([item])
            return cumulator
    return reduce(reducer, input_list, [[]])
Tests:
>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch([1,2,3,4,5,6,7], 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]
Answer 7 (score: 1)
A working version without the new Python 3.8 features, adapted from @Atra Azami's answer:
import itertools

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)
    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)
Output:
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
Answer 8 (score: 0)
I like this one:
def batch(x, bs):
    return [x[i:i+bs] for i in range(0, len(x), bs)]
This returns a list of batches of size bs; you can of course turn it into a generator by swapping the list comprehension for a generator expression, as sketched just below.
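A sketch of that generator variant (hypothetical name batch_gen), under the same assumption that x supports len and slicing:

def batch_gen(x, bs):
    # Same slicing logic as above, wrapped in a generator expression so
    # the batches are produced lazily instead of built up front.
    return (x[i:i + bs] for i in range(0, len(x), bs))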
Answer 9 (score: 0)
This code has the following features: it works on lists as well as generators (it never calls len), and it yields the last, possibly partial, batch.
def batch_generator(items, batch_size):
    itemid = 0  # Keeps track of current position in items generator/list
    batch = []  # Empty batch
    for item in items:
        batch.append(item)  # Append items to batch
        if len(batch) == batch_size:
            yield batch
            itemid += batch_size  # Increment the position in items
            batch = []
    if batch:
        yield batch  # yield last partial batch (skip it if it is empty)
Answer 10 (score: 0)
Keep taking (at most) n elements until the iterable runs out:
def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk

def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return
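A quick usage sketch of chop() above (Python 3.8+ because of the walrus operator; take() is the helper defined alongside it):

print(list(chop(4, "abcdefghij")))
# expected: [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]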
Answer 11 (score: 0)
I use:
import math

def batchify(arr, batch_size):
    num_batches = math.ceil(len(arr) / batch_size)
    return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
Answer 12 (score: 0)
from itertools import *

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(batch(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]
Answer 13 (score: 0)
A related function you may need:
def batch(size, i):
    """ Get the i'th batch of the given size """
    return slice(size * i, size * i + size)
Usage:
>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)]
>>> [4, 5, 6]
It gets the i'th batch from the sequence, and it works with other data structures as well, such as pandas dataframes (df.iloc[batch(100,0)]) or numpy arrays (array[batch(100,0)]).
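As an illustrative sketch, the same slice helper applied to a NumPy array (this assumes numpy is installed; batch() is the function defined above):

import numpy as np

array = np.arange(10)
print(array[batch(3, 1)])  # the second batch of three items; expected: [3 4 5]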
Answer 14 (score: 0)
You can just group iterable items by their batch index:
import itertools
from typing import Any, Callable, Iterable

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
Usually you will want to collect the inner iterables, so here is a more advanced version:
def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches
Examples:
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]
Answer 15 (score: 0)
def batch(iterable, n):
    iterable = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                if chunk:  # don't yield an empty trailing chunk
                    yield chunk
                return
        yield chunk
list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Answer 16 (score: 0)
I gave an answer above. However, I now think the best solution may be not to write any new functions at all. more-itertools includes plenty of additional tools, and chunked is one of them.
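For reference, a minimal sketch of that approach, assuming the more-itertools package is installed:

from more_itertools import chunked

# chunked() yields lists of at most n items, including a shorter final batch.
print(list(chunked(range(10), 3)))
# expected: [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]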
Answer 17 (score: 0)
Here is a pretty short code snippet I know of (not my creation) that doesn't use len and works under both Python 2 and 3:
def chunks(iterable, size):
    from itertools import chain, islice
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))
Answer 18 (score: 0)
This works with any iterable:
from itertools import zip_longest, filterfalse

def batch_iterable(iterable, batch_size=2):
    args = [iter(iterable)] * batch_size
    return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))
It works like this:
>>> list(batch_iterable(range(0, 5), 2))
[(0, 1), (2, 3), (4,)]
PS: It would not work if the iterable contains None values.
Answer 19 (score: 0)
This is what I use in my project. It handles iterables and lists as efficiently as possible:
from itertools import islice

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return
    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]

def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:
        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more
        yield [k for k in chunk()]
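A brief usage sketch of chunker() above, exercising both the len-based path and the generator fallback:

print(list(chunker([1, 2, 3, 4, 5], 2)))        # list path; expected: [[1, 2], [3, 4], [5]]
print(list(chunker((n for n in range(5)), 2)))  # generator path; expected: [[0, 1], [2, 3], [4]]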