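I have a list of arbitrary length, and I need to split it up into equal-size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, adding it to the first list and emptying the second list for the next round of data, but this is potentially extremely expensive.
I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.
I was looking for something useful in itertools but I couldn't find anything obviously useful. Might've missed it, though.
Related question: What is the most “pythonic” way to iterate over a list in chunks?
Answer 0 (score: 2590)
Here's a generator that yields the chunks you want: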
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
If you're using Python 2, you should use xrange() instead of range():
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i + n]
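Also, you can simply use a list comprehension instead of writing a function, though it's a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3: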
[l[i:i + n] for i in range(0, len(l), n)]
Python 2 version:
[l[i:i + n] for i in xrange(0, len(l), n)]
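On Python 3.12 and later, the standard library covers this case directly with itertools.batched(iterable, n), which yields tuples of length n (the last one may be shorter). Below is a minimal sketch that uses it when it is available and otherwise falls back to an islice loop; the helper name `chunked` is hypothetical and only for illustration.

```python
import itertools
from itertools import islice


def chunked(iterable, n):
    """Yield successive tuples of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least 1")
    batched = getattr(itertools, "batched", None)  # only exists on Python 3.12+
    if batched is not None:
        yield from batched(iterable, n)
        return
    # Fallback for older Pythons: pull n items at a time until nothing is left.
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk


print(list(chunked(range(10, 17), 3)))
# [(10, 11, 12), (13, 14, 15), (16,)]
```

Unlike the slicing version above, this works on any iterable (generators, files), not only on sequences.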
Answer 1 (score: 511)
If you want something super simple:
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in xrange(0, len(l), n))
Use range() instead of xrange() in the case of Python 3.x.
Answer 2 (score: 266)
Directly from the (old) Python documentation (recipes for itertools):
from itertools import izip, chain, repeat
def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
The current version, as suggested by J.F. Sebastian:
#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
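I guess Guido's time machine works, worked, will work, will have worked, was working again.
These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip round-robin generating one tuple of n items.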
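To make the mechanism concrete: multiplying a one-element list copies the reference, so every slot holds the same iterator object. The Python documentation for zip() states that the left-to-right evaluation order of the iterables is guaranteed, which is exactly what this idiom relies on. A small sketch:

```python
from itertools import zip_longest

shared = [iter(range(7))] * 3  # three references to ONE iterator
assert shared[0] is shared[1] is shared[2]

# zip() calls next() on slot 0, 1, 2 in turn; because every slot is the same
# iterator, consecutive items end up in the same tuple.
print(list(zip(*shared)))
# [(0, 1, 2), (3, 4, 5)]   (plain zip drops the leftover 6)

print(list(zip_longest(*[iter(range(7))] * 3)))
# [(0, 1, 2), (3, 4, 5), (6, None, None)]
```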
Answer 3 (score: 147)
I know this is kind of old, but I don't know why nobody has mentioned numpy.array_split:
import numpy as np
lst = range(50)
In [26]: np.array_split(lst,5)
Out[26]:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
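Note that the second argument of numpy.array_split is the number of sections, not the chunk size; it happens to look like a chunk size here only because 50 / 5 = 10. Per the NumPy documentation, an array of length l split into n sections yields l % n sub-arrays of size l//n + 1 and the rest of size l//n. (For fixed-size chunks with NumPy you can pass split indices instead, e.g. np.array_split(arr, list(range(size, len(arr), size))).) A pure-Python sketch of the same sizing rule; the helper name `split_into` is hypothetical:

```python
def split_into(seq, n):
    """Split seq into n contiguous parts whose sizes differ by at most one,
    larger parts first (the sizing rule documented for numpy.array_split)."""
    q, r = divmod(len(seq), n)
    parts, start = [], 0
    for i in range(n):
        size = q + 1 if i < r else q  # the first r parts get one extra item
        parts.append(seq[start:start + size])
        start += size
    return parts


print(split_into(list(range(11)), 3))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
```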
Answer 4 (score: 100)
I'm surprised nobody has thought of using iter's two-argument form:
from itertools import islice
def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
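This works on any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice: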
from itertools import islice, chain, repeat
def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)
Demo:
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
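Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
    if padval is _no_padding:  # identity check against the unique sentinel object
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)
Demo: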
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
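I believe this is the shortest chunker proposed that offers optional padding.
As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:
from itertools import islice
_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval is _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
Demo: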
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
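The key to this answer is iter(callable, sentinel): the callable is invoked with no arguments on every step, and iteration stops as soon as it returns a value equal to the sentinel. A tiny sketch of that protocol on its own, which also shows why the padded variants could stop early:

```python
from itertools import islice

source = iter(range(7))
# iter(callable, sentinel) keeps calling the lambda until it returns ().
print(list(iter(lambda: tuple(islice(source, 3)), ())))
# [(0, 1, 2), (3, 4, 5), (6,)]

# With a padded sentinel, a real chunk that happens to equal the sentinel
# ends the iteration, which is the problem Tomasz Gandor pointed out.
data = iter([1, 2, None, None, 5])
print(list(iter(lambda: tuple(islice(data, 2)), (None, None))))
# [(1, 2)]
```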
Answer 5 (score: 87)
Here is a generator that works on arbitrary iterables:
import itertools
def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))
Example:
>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
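Answer 6 (score: 49)
def chunk(input, size):
    # Python 2 only: map(None, ...) zips like izip_longest, padding with None
    return map(None, *([iter(input)] * size))
Answer 7 (score: 43)
Simple yet elegant: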
l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]
Or if you prefer:
chunks = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)
Answer 8 (score: 33)
I saw the most awesome Python-ish answer in a duplicate of this question:
from itertools import zip_longest
a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
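You can create n-tuples for any n. If a = range(1, 15), then the result will be:
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
If the list is divided evenly, then you can replace zip_longest with zip; otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.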
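A quick check of the difference on the same 14-item input: zip() drops the incomplete last group, while zip_longest() pads it (here with fillvalue=0 instead of the default None). On Python 3.10+, zip(..., strict=True) raises ValueError instead of silently dropping the data.

```python
from itertools import zip_longest

a = range(1, 15)
i = iter(a)
print(list(zip(i, i, i)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]

j = iter(a)
print(list(zip_longest(j, j, j, fillvalue=0)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 0)]
```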
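Answer 9 (score: 32)
None of these answers produce evenly sized chunks; they all leave a runt chunk at the end, so they're not completely balanced. If you were using these functions to distribute work, you've built in the prospect of one worker likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.
For example, the current top answer ends with:
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
I just hate that runt at the end!
Others, like list(grouper(3, xrange(7))) and chunk(xrange(7), 3), both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The Nones are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.
Why can't we divide these better?
Here's a balanced solution, adapted from a function I've used in production (note: in Python 3, replace xrange with range):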
def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets)  # in Python 3 use list(filter(None, baskets))
And I created a generator that does the same, if you wrap it in list():
def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]
Finally, since the functions above do not return elements in contiguous order (as they were given), here is one that does:
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing an iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in xrange(length)]
Test them:
print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))
Which prints:
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]
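Note that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as a list of discrete elements can be.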
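The contiguous balancing above boils down to divmod: with item_count items and `baskets` baskets, the first item_count % baskets baskets get one extra item. Here is a Python 3 sketch of the same idea; it mirrors iter_baskets_contiguous but is a rewrite under that assumption, not the author's code, and the name `balanced_contiguous` is hypothetical.

```python
from itertools import islice


def balanced_contiguous(items, maxbaskets=3, item_count=None):
    """Yield up to maxbaskets contiguous lists whose lengths differ by at most one."""
    if item_count is None:
        item_count = len(items)  # pass item_count for iterators without len()
    baskets = min(item_count, maxbaskets)
    if baskets == 0:
        return
    floor, extra = divmod(item_count, baskets)
    it = iter(items)
    for i in range(baskets):
        # The first `extra` baskets take floor + 1 items, the rest take floor.
        yield list(islice(it, floor + 1 if i < extra else floor))


print(list(balanced_contiguous('ABCDEFG', 3)))
# [['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
```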
Answer 10 (score: 23)
If you know the list size:
def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
If you don't (an iterator):
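def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res # yield the last, incomplete, portion
In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).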
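For the whole-number-of-chunks case just described, one possible shorter form is the shared-iterator zip idiom. Keep in mind that it silently drops a trailing partial chunk, so it is only safe under that assumption. A sketch (the function name `iter_full_chunks` is hypothetical):

```python
def iter_full_chunks(sequence, chunk_size):
    """Yield lists of exactly chunk_size items.

    Assumes the length is a multiple of chunk_size; otherwise the
    leftover items are silently dropped by zip().
    """
    it = iter(sequence)
    # zip pulls chunk_size items from the same iterator for each tuple.
    for chunk in zip(*[it] * chunk_size):
        yield list(chunk)


print(list(iter_full_chunks(range(6), 3)))
# [[0, 1, 2], [3, 4, 5]]
```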
Answer 11 (score: 16)
If you had a chunk size of 3, for example, you could do:
zip(*[iterable[i::3] for i in range(3)])
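Source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I use this when my chunk size is a fixed number I can type, e.g. '3', that would never change. Note that it needs a sliceable sequence, and zip() drops any trailing partial chunk.
Answer 12 (score: 16)
The toolz library has a partition function for this:
from toolz import partition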
list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
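Answer 13 (score: 14)
I like the Python doc's version proposed by tzot and J.F. Sebastian a lot, but it has two shortcomings:
I use this one quite a lot in my code:
from itertools import islice
def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        chunk = tuple(islice(iterable, n))
        if not chunk:  # the original `or iterable.next()` trick raises RuntimeError on Python 3.7+ (PEP 479)
            return
        yield chunk
Update: a lazy chunks version:
from itertools import chain, islice
def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:  # return explicitly instead of leaking StopIteration (PEP 479)
            return
        yield chain([first], islice(iterable, n-1))
Answer 14 (score: 12)
At this point, I think we need a recursive generator, just in case...
In Python 2: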
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
In Python 3:
def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)
Also, in case of a massive alien invasion, a decorated recursive generator might come in handy:
def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen
@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
Answer 15 (score: 11)
You can also use the get_chunks function of the utilspie library:
You can install utilspie via pip (pip install utilspie) and then use it like this:
>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]
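Disclaimer: I am the creator of the utilspie library.
Answer 16 (score: 11)
I was curious about the performance of the different approaches, so here they are:
Tested on Python 3.5.1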
import time
batch_size = 7
arr_len = 298937
#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break
    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]  # note: the -1 also drops the last element on each pass; arr[batch_size:] was presumably intended
print(time.time() - start)
#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)
#----------batches 1------------
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]
print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)
#----------batches 2------------
from itertools import islice, chain
def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        # on Python 3.7+ (PEP 479) the StopIteration from next() here becomes a RuntimeError
        yield chain([next(batchiter)], batchiter)
print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x  # the chain is never iterated, so each batch actually reads only its first item
print(time.time() - start)
#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)
#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)
Results:
slice
31.18285083770752
index
0.02184295654296875
batches 1
0.03503894805908203
batches 2
0.22681021690368652
chunks
0.019841909408569336
grouper
0.006506919860839844
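Single time.time() measurements are noisy, and some of the variants above do different amounts of work (for example, "batches 2" never consumes its chained sub-iterators, and "slice" rebuilds the remaining list on every step). Below is a hedged sketch of a fairer comparison with timeit that first checks that every variant produces the same chunks; the three helper names are illustrative, and the timings you get will depend on your machine and Python version.

```python
import timeit
from itertools import islice, zip_longest


def by_slicing(lst, n):
    return [lst[i:i + n] for i in range(0, len(lst), n)]


def by_islice(lst, n):
    it = iter(lst)
    out = []
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return out
        out.append(chunk)


def by_grouper(lst, n):
    # Pad with a unique object, then strip it so the last chunk is not padded.
    pad = object()
    return [[x for x in grp if x is not pad]
            for grp in zip_longest(*[iter(lst)] * n, fillvalue=pad)]


data = list(range(298937))
n = 7
# All variants must agree before their timings are worth comparing.
assert by_slicing(data, n) == by_islice(data, n) == by_grouper(data, n)

for func in (by_slicing, by_islice, by_grouper):
    seconds = timeit.timeit(lambda: func(data, n), number=3)
    print(f"{func.__name__}: {seconds / 3:.4f} s per run")
```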
Answer 17 (score: 10)
[AA[i:i+SS] for i in range(len(AA))[::SS]]
AA is the array and SS is the chunk size. For example:
>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
Answer 18 (score: 10)
Code:
def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print split_list(a_list, 3)
Result:
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
Answer 19 (score: 7)
Without calling len(), which is good for large lists:
def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]
This works for iterables:
from itertools import islice
def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))
A functional flavor of the above:
from itertools import islice, repeat, takewhile
def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                      for start in repeat(iter(l))))
OR:
def chunks_gen_sentinel(n, seq):
    # Python 2 only (imap, .next); needs: from itertools import imap, islice, repeat
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())
OR:
def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))
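imap and the .next method exist only in Python 2. On Python 3 the built-in map is already lazy and the bound method is spelled __next__, so a hand-ported sketch of the same two functional variants (not taken from the original answer) looks like this:

```python
from itertools import islice, repeat, takewhile


def chunks_gen_sentinel(n, seq):
    # repeat() hands the SAME iterator to islice every time, so each
    # islice(it, 0, n) continues where the previous one stopped.
    continuous_slices = map(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(map(tuple, continuous_slices).__next__, ())


def chunks_gen_filter(n, seq):
    continuous_slices = map(islice, repeat(iter(seq)), repeat(0), repeat(n))
    # takewhile(bool, ...) stops at the first empty tuple.
    return takewhile(bool, map(tuple, continuous_slices))


print(list(chunks_gen_sentinel(3, range(8))))
# [(0, 1, 2), (3, 4, 5), (6, 7)]
```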
Answer 20 (score: 7)
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop
Usage:
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for chunk in split_seq(seq, 3):
    print chunk
Answer 21 (score: 7)
Another, more explicit version.
def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    that have a length equal to chunkSize.
    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3))
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
Answer 22 (score: 7)
In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))
In [49]: chunk(range(1,100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
[41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
[61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
[71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
Answer 23 (score: 6)
Yet another solution:
def make_chunks(data, chunk_size):
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk
>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
... print chunk
...
[1, 2]
[3, 4]
[5, 6]
[7]
>>>
Answer 24 (score: 5)
Since everyone here is talking about iterators: boltons has a perfect method for that, called iterutils.chunked_iter.
from boltons import iterutils
list(iterutils.chunked_iter(list(range(50)), 11))
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49]]
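However, if you don't need to be careful with memory, you can use the old way and get the full list up front with iterutils.chunked.
Answer 25 (score: 5)
At this point, I think we need the obligatory anonymous-recursive function.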
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
Answer 26 (score: 5)
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
Answer 27 (score: 5)
Consider using matplotlib.cbook.pieces.
For example:
import numpy as np
import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
    print s
Answer 28 (score: 5)
>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>
Python3
Answer 29 (score: 4)
Using list comprehensions:
l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]
Answer 30 (score: 4)
You can use numpy's array_split function, e.g. np.array_split(np.array(data), 20), to split into 20 nearly equal-sized chunks.
To make sure the chunks are exactly equal in size, use np.split (it raises an error if the array cannot be divided equally).
Answer 31 (score: 4)
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break
g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
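Answer 32 (score: 4)
As per this answer, the top-voted answer leaves a "runt" at the end. Here's my solution to really get as evenly sized chunks as you can, with no runts. It basically tries to pick exactly the fractional spot where it should split the list, but just rounds it off to the nearest integer:
from __future__ import division # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur
Demo: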
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
[67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
[78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
[89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63],
[64, 65, 66, 67, 68, 69, 70, 71, 72],
[73, 74, 75, 76, 77, 78, 79, 80, 81],
[82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
Compare with the top-voted chunks answer:
>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
[55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
[66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
[77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
[88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98],
[99]]
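Answer 33 (score: 4)
I realize this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge complex suggestions, and it only uses slicing:
def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]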
>>> for chunk in chunker(range(0,100), 10):
... print list(chunk)
...
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...
Answer 34 (score: 4)
Here is a list of additional approaches:
Given
import itertools as it
import collections as ct
import more_itertools as mit
iterable = range(11)
n = 3
Code
Standard library
list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)
list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)
list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
more_itertools+
list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]
list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
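References
zip_longest (related post, related post)
setdefault (ordered results require Python 3.6+)
collections.defaultdict (ordered results require Python 3.6+)
more_itertools.chunked (related post)
more_itertools.sliced
more_itertools.grouper (related post)
more_itertools.windowed (see also stagger, zip_offset)
+ A third-party library that implements itertools recipes and more. > pip install more_itertools
Answer 35 (score: 3)
I wrote a small library expressly for this purpose, available here. The library's chunked function is particularly efficient because it's implemented as a generator, so a substantial amount of memory can be saved in certain situations. It also doesn't rely on slice notation, so any arbitrary iterator can be used.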
import iterlib
print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]
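Answer 36 (score: 3)
I have a solution below that does work, but more important than the solution are a few comments on other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run
g = paged_iter(list(range(50)), 11)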
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)
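the appropriate output for the last command is
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and not
[]
as most of the itertools-based solutions here return. This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up badly entered data that reversed the appropriate order of blocks of 5, i.e. the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements, not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like
i = 0
out = []
for it in paged_iter(data, 5):
    if (i % 2 == 0):
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i = i + 1
This produces mysteriously wrong results if the grouping function silently assumes that sub-iterators are always fully consumed in order. It gets even worse if you want to interleave elements from the chunks.
Second, a good number of the proposed solutions implicitly rely on the fact that iterables have a deterministic order (which some, e.g. sets, don't), and while some of the solutions using islice may be fine, it worries me.
Third, the itertools grouper approach works, but the recipe relies on internal behavior of the zip_longest (or zip) function that is not part of its published behavior. In particular, grouper only works because zip_longest(i0...in) always calls next(i0), next(i1), ... next(in) in that order before starting over. Since grouper passes n copies of the same iterator object, it relies on this behavior.
Finally, the solution below can be improved if you make the assumption criticized above, that sub-iterators are accessed in order and read completely. Without that assumption, elements must be stored for each sub-iterator, either implicitly (via call chains) or explicitly (via deques or other data structures). So don't waste time (as I did) assuming one could get around this with some clever trick.
import collections
def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while(True):
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        if deq:  # avoid yielding an empty final chunk when the length is a multiple of n
            yield (i for i in deq)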
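The ordering problem described above is easy to reproduce. In this sketch, `lazy_chunks` is a hypothetical stand-in for the chain/islice-style answers, and `eager_chunks` materializes each chunk before handing it out (the property paged_iter gets from its deque). Reading the second chunk before the first gives the wrong items with the lazy version and the right ones with the eager version:

```python
from itertools import chain, islice


def lazy_chunks(iterable, n):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            return
        # The rest of the chunk is read from the shared iterator only when consumed.
        yield chain([first], islice(it, n - 1))


def eager_chunks(iterable, n):
    it = iter(iterable)
    while chunk := list(islice(it, n)):  # requires Python 3.8+
        yield iter(chunk)


for make in (lazy_chunks, eager_chunks):
    g = make(range(6), 3)
    c0, c1 = next(g), next(g)
    print(make.__name__, list(c1), list(c0))
# lazy_chunks [1, 2, 3] [0, 4, 5]
# eager_chunks [3, 4, 5] [0, 1, 2]
```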
Answer 37 (score: 3)
Here's an idea using itertools.groupby:
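import itertools
def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))
This returns a generator of generators. If you want a list of lists, just replace the last line with:
    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]
(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)
Answer 38 (score: 3)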
>>> f = lambda x, n, acc=[]: f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>>
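If you're into brackets: I picked up a book on Erlang :)
Answer 39 (score: 3)
The answer above (by koffein) has a small problem: the list is always split into an equal number of splits, not an equal number of items per partition. This is my version. The "// chs + 1" takes into account that the number of items may not be divisible by the partition size, so the last partition will only be partially filled. (If len(l) is an exact multiple of chs, this yields one extra empty list at the end.)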
# Given 'l' is your list
chs = 12 # Your chunksize
partitioned = [ l[i*chs:(i*chs)+chs] for i in range((len(l) // chs)+1) ]
Answer 40 (score: 3)
In Python 3.8 you can get really nice with assignment expressions:
import itertools
def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item
This works on arbitrary iterables, not just lists.
>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
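Because it only uses iter() and islice(), the same batch() works on sources that have no len() and cannot be sliced, such as lines read from a file. A sketch using io.StringIO as a stand-in for a real file:

```python
import io
import itertools


def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item


fake_file = io.StringIO("a\nb\nc\nd\ne\n")  # stands in for open("some_file.txt")
for lines in batch(fake_file, 2):
    print([line.rstrip("\n") for line in lines])
# ['a', 'b']
# ['c', 'd']
# ['e']
```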
Answer 41 (score: 3)
I don't think I saw this option, so just to add another one :)
def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i+chunk_size]
        i += chunk_size
Answer 42 (score: 3)
Let r be the chunk size and L the initial list; then you can do:
chunkL = [ [i for i in L[r*k:r*(k+1)] ] for k in range((len(L) + r - 1) // r)]  # ceiling division keeps the last partial chunk
Answer 43 (score: 3)
def chunk(lst):
    out = []
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) / x
            break
    while lst:
        out.append([lst.pop(0) for x in xrange(factor)])
    return out
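Answer 44 (score: 2)
Like @AaronHall, I got here looking for roughly evenly sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would like each group to be of size >= N. Thus, the orphans created by most of the above should be redistributed to other groups.
This can be done using:
def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists, pandas dataframes, etc
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
(from Splitting a list into N parts of approximately equal length) Simply call it as nChunks(l, len(l) // N), where N is the desired chunk size.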
Answer 45 (score: 2)
No magic, but simple and correct:
def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values
Answer 46 (score: 2)
Since I had to do something like this, here's my solution, given a generator and a batch size:
def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in xrange(0, n):
            elems.append(next(g))
        return elems
    except StopIteration:
        return elems
Answer 47 (score: 2)
I came up with the following solution, which doesn't create a temporary list object and should work with any iterable. Note that this version is for Python 2.x:
def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return
    while not stop:
        yield _next_chunk()
for it in chunked(xrange(16), 4):
    print list(it)
Output:
[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[]
As you can see, if len(iterable) % size == 0, we get an additional empty iterator object. But I don't think that is a big problem.
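One way to avoid that extra empty iterator on Python 3 is to pull the first item of each chunk before yielding anything, so nothing is yielded once the source is exhausted. A sketch of that variation (not the original author's code); like every lazy chunker, each chunk must be fully consumed before the next one is requested:

```python
from itertools import chain, islice


def chunked(iterable, size):
    it = iter(iterable)
    for first in it:
        # `first` was already pulled from `it`, so take only size - 1 more lazily.
        yield chain((first,), islice(it, size - 1))


for chunk in chunked(range(16), 4):
    print(list(chunk))
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9, 10, 11]
# [12, 13, 14, 15]
```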
Answer 48 (score: 2)
In [259]: get_in_chunks = lambda itr,n: ( (v for _,v in g) for _,g in itertools.groupby(enumerate(itr),lambda (ind,_): ind/n))
In [260]: list(list(x) for x in get_in_chunks(range(30),7))
Out[260]: [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13], [14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27], [28, 29]]
(Python 2 only: tuple parameters such as lambda (ind,_) were removed in Python 3.)
Answer 49 (score: 1)
I don't like the idea of splitting elements by chunk size; for example, a script could divide 101 items into 3 chunks as [50, 50, 1]. For my needs I had to split proportionally and keep the order the same. First I wrote my own script, which works fine and is very simple. But later I saw this answer, whose script is better than mine, and I recommend it. Here's my script:
def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr of n chunk sizes that sum to N, divided as evenly as possible
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))
    last_n = arr[-1]
    # the last number will always satisfy r <= last_n < r + n
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if the difference is too big (bigger than 1), then
        if abs(r-last_n) > 1:
            #[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give the surplus back to the first elements
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr
def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted
items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)
And the output:
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]
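Answer 50 (score: 1)
The abstraction would be:
l = [1,2,3,4,5,6,7,8,9]
n = 3
outList = []
for i in range(n, len(l) + n, n):
    outList.append(l[i-n:i])
print(outList)
This will print:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Answer 51 (score: 1)
This works in v2/v3, is inlineable, generator-based, and uses only the standard library: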
import itertools
def split_groups(iter_in, group_size):
return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
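Answer 52 (score: 0)
from itertools import islice
l=[1,2,3,4,5,6]
chunksize=int(input("Enter chunk size: "))
m=[]
obj=iter(l)
# slice the shared iterator (not the list) so each call continues where the last one stopped
m.append(list(islice(obj, chunksize)))
m.append(list(islice(obj, chunksize)))
print(m)
Answer 53 (score: 0)
You can use Dask to split a list into evenly sized chunks. Dask has the added benefit of memory conservation, which makes it well suited to very large data. For best results, if your list is very large, load it directly into a Dask dataframe to save memory. Depending on what exactly you want to do with the lists, Dask has an entire API of functions you can use: http://docs.dask.org/en/latest/dataframe-api.html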
import pandas as pd
import dask.dataframe as dd
split = 4
my_list = range(100)
df = dd.from_pandas(pd.DataFrame(my_list), npartitions = split)
my_list = [ df.get_partition(n).compute().iloc[:,0].tolist() for n in range(split) ]
# [[0, 1, 2, ...], [25, 26, 27, ...], [50, 51, 52, ...], [75, 76, 77, ...]]
Answer 54 (score: 0)
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[range(10, 20), range(20, 30), range(30, 40), range(40, 50), range(50, 60), range(60, 70), range(70, 75)]
The output above is what this implementation gives for the example usage from the accepted answer.
Many of the functions above assume that the length of the whole iterable is known up front, or at least is cheap to calculate.
For some streamed objects, that would mean loading the full data into memory first (e.g. downloading the whole file) just to get the length.
If you don't know the full size yet, however, you can use the following code instead (it only needs the object to support slicing):
def chunks(iterable, size):
    """
    Yield successive chunks from iterable, being `size` long.
    https://stackoverflow.com/a/55776536/3423324
    :param iterable: The object you want to split into pieces.
    :param size: The size each of the resulting pieces should have.
    """
    i = 0
    while True:
        sliced = iterable[i:i + size]
        if len(sliced) == 0:
            # to suppress stuff like `range(max, max)`.
            break
        # end if
        yield sliced
        if len(sliced) < size:
            # our slice is not the full length, so we must have passed the end of the iterator
            break
        # end if
        i += size  # so we start the next chunk at the right place.
    # end while
# end def
This works because the slice operation returns fewer elements (or none) once you pass the end of the iterable:
"abc"[0:2] == 'ab'
"abc"[2:4] == 'c'
"abc"[4:6] == ''
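We then take the result of the slice and check the length of the resulting chunk. If it is shorter than expected, we know we can end the iteration.
That way, the iterator will not be executed unless it is accessed.
Answer 55 (score: 0)
If you don't care about the order: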
from itertools import groupby
batch_no = 3
data = 'abcdefgh'
[
    [x[1] for x in x[1]]
    for x in
    groupby(
        sorted(
            (x[0] % batch_no, x[1])
            for x in
            enumerate(data)
        ),
        key=lambda x: x[0]
    )
]
[['a', 'd', 'g'], ['b', 'e', 'h'], ['c', 'f']]
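This solution doesn't generate sets of the same size; instead, it distributes the values so that the batches are as large as possible while keeping the given number of batches.
Answer 56 (score: 0)
The Python pydash package could be a good choice.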
from pydash.arrays import chunk
ids = ['22', '89', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '1']
chunk_ids = chunk(ids,5)
print(chunk_ids)
# output: [['22', '89', '2', '3', '4'], ['5', '6', '7', '8', '9'], ['10', '11', '1']]
答案 57 :(得分:0)
这个问题使我想起了Perl 6 .comb(n)
方法。它将字符串分成n
个大小的块。 (不止如此,但我将省略细节。)
很容易在Python3中实现类似的功能作为lambda表达式:
comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))
然后您可以这样称呼它:
some_list = list(range(0, 20)) # creates a list of 20 elements
generator = comb(some_list, 4) # creates a generator that will generate lists of 4 elements
for sublist in generator:
print(sublist) # prints a sublist of four elements, as it's generated
当然,您不必将生成器分配给变量。您可以像这样直接将其循环:
for sublist in comb(some_list, 4):
print(sublist) # prints a sublist of four elements, as it's generated
作为奖励,此comb()
函数还可以对字符串进行操作:
list( comb('catdogant', 3) ) # returns ['cat', 'dog', 'ant']
答案 58 :(得分:0)
一种古老的方法,不需要itertools,但仍然可以与任意生成器一起使用:
def chunks(g, n):
    """divide a generator 'g' into small chunks
    Yields:
        a chunk that has 'n' or less items
    """
    n = max(1, n)
    buff = []
    for item in g:
        buff.append(item)
        if len(buff) == n:
            yield buff
            buff = []
    if buff:
        yield buff
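Because chunks() only pulls items from g as it needs them, it even works on an infinite generator, as long as you take only a finite number of chunks. A usage sketch (the function is repeated so the snippet runs on its own):

```python
from itertools import count, islice

def chunks(g, n):
    """Divide an iterable/generator g into lists of n items (the last may be shorter)."""
    n = max(1, n)
    buff = []
    for item in g:
        buff.append(item)
        if len(buff) == n:
            yield buff
            buff = []
    if buff:
        yield buff

# count() never ends, so only a lazy chunker can be used on it
print(list(islice(chunks(count(), 4), 3)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(list(chunks(iter('abcde'), 2)))       # [['a', 'b'], ['c', 'd'], ['e']]
```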
Answer 59 (score: 0)
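The OP asked for "equal-sized chunks". I read "equal-sized" as "balanced" sizes: we are looking for groups of items of roughly the same size, not necessarily exactly equal.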
The inputs here are input_list (e.g., a list of 23 numbers) and n_groups (e.g., 5).
Input:
input_list = list(range(23))
n_groups = 5

approx_sizes = len(input_list) / n_groups

groups_cont = [input_list[int(i * approx_sizes):int((i + 1) * approx_sizes)]
               for i in range(n_groups)]
groups_leap = [input_list[i::n_groups]
               for i in range(n_groups)]

print(len(input_list))
print('Contiguous elements lists:')
print(groups_cont)
print('Leap every "N" items lists:')
print(groups_leap)
Output:
23
Contiguous elements lists:
[[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]
Leap every "N" items lists:
[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18], [4, 9, 14, 19]]
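The contiguous version relies on float multiplication followed by int(). A sketch of the same idea using only integer arithmetic computes each boundary as i * len(seq) // n_groups, so no floating-point rounding is involved (split_balanced is a hypothetical name). Neighbouring boundaries differ by either the floor or the ceiling of len(seq) / n_groups, so group lengths differ by at most one; for the 23-item example it produces the same groups as groups_cont.

```python
def split_balanced(seq, n_groups):
    """Split seq into n_groups contiguous slices whose lengths differ by at most 1."""
    length = len(seq)
    # Boundary i is floor(i * length / n_groups), computed exactly with //.
    return [seq[i * length // n_groups:(i + 1) * length // n_groups]
            for i in range(n_groups)]

print(split_balanced(list(range(23)), 5))
# [[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]
```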
Answer 60 (score: 0)
A generic chunker for any iterable that lets the user choose how to handle a partial chunk at the end.
Tested on Python 3.
chunker.py
from enum import Enum


class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3


class PartialChunkException(Exception):
    pass


def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
        include the partial chunk as a short (<n) element list
    on_partial=PartialChunkOptions.EXCLUDE
        do not include the partial chunk
    on_partial=PartialChunkOptions.PAD
        pad to an n-element list
        (also pass pad=<pad_value>, default None)
    on_partial=PartialChunkOptions.ERROR
        raise a PartialChunkException if a partial chunk is encountered
    """
    on_partial = PartialChunkOptions(on_partial)
    iterator = iter(iterable)
    while True:
        vals = []
        for _ in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                # Input exhausted: handle a partial chunk (if any), then stop.
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals
test.py
import chunker

chunk_size = 3

for it in (range(100, 107),
           range(100, 109)):
    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))
    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")
Output of test.py:
ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3
option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]
option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]
option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised
ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3
option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]
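For comparison, three of the four options map onto standard-library building blocks. This is a sketch (include_partial, exclude_partial and pad_partial are hypothetical names) and, unlike the generator above, each function returns a list eagerly; there is no comparable one-liner for the ERROR option here.

```python
from itertools import islice, zip_longest

def include_partial(iterable, n):
    # islice keeps taking up to n items; the final short tuple is kept
    it = iter(iterable)
    return [list(c) for c in iter(lambda: tuple(islice(it, n)), ())]

def exclude_partial(iterable, n):
    # n references to ONE iterator; zip stops as soon as one of them is
    # exhausted, so a trailing partial chunk is silently dropped
    return [list(c) for c in zip(*[iter(iterable)] * n)]

def pad_partial(iterable, n, pad=None):
    # zip_longest keeps going and fills the missing slots with `pad`
    return [list(c) for c in zip_longest(*[iter(iterable)] * n, fillvalue=pad)]

data = range(100, 107)
print(include_partial(data, 3))  # [[100, 101, 102], [103, 104, 105], [106]]
print(exclude_partial(data, 3))  # [[100, 101, 102], [103, 104, 105]]
print(pad_partial(data, 3))      # [[100, 101, 102], [103, 104, 105], [106, None, None]]
```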
Answer 61 (score: 0)
Although there are already many answers, here is a very simple approach:
x = list(range(10, 75))
indices = list(range(0, len(x), 10))  # start position of each chunk
print("indices: ", indices)
xx = [x[i:i + 10] for i in indices]
print("x= ", x)
print("xx= ", xx)
The result will be:
indices: [0, 10, 20, 30, 40, 50, 60]
x= [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]
xx= [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
     [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
     [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
     [70, 71, 72, 73, 74]]
Answer 62 (score: 0)
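I created these two fancy one-liners, which are efficient and lazy. Both their input and output are iterables, and they don't depend on any modules: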
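The first one-liner is completely lazy: it returns an iterator that yields iterators (i.e., each chunk it produces is itself an iterator over that chunk's elements). Use this version when chunks are very large, or when elements are produced slowly and should be usable as soon as each one is available: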
chunk_iters = lambda it, n: ((e for i, g in enumerate(((f,), cit)) for j, e in zip(range((1, n - 1)[i]), g)) for cit in (iter(it),) for f in cit)
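The second one-liner returns an iterator that yields lists. Each list is produced once all of a chunk's elements are available from the input iterator, or once the last element of the last chunk is reached. Use this version if the input elements are produced quickly or are all available immediately; otherwise, prefer the lazier first one-liner.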
chunk_lists = lambda it, n: (l for l in ([],) for i, g in enumerate((it, ((),))) for e in g for l in (l[:len(l) % n] + [e][:1 - i],) if (len(l) % n == 0) != i)
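I also provide a multi-line version of the first one-liner, chunk_iters, which returns an iterator that yields other iterators (each one iterating over the elements of one chunk):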
def chunk_iters(it, n):
    cit = iter(it)

    def one_chunk(f):
        yield f
        for i, e in zip(range(n - 1), cit):
            yield e

    for f in cit:
        yield one_chunk(f)
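One consequence of this design is that every chunk shares the same underlying iterator, so each chunk has to be fully consumed before you move on to the next one. The sketch below (the function is repeated so it runs on its own) shows both the intended use and what happens if you take only the first item of each chunk: the outer loop resumes wherever the shared iterator stopped, so every item starts a new "chunk".

```python
def chunk_iters(it, n):
    cit = iter(it)

    def one_chunk(f):
        yield f
        # zip asks range() first, so it stops without pulling an extra item from cit
        for i, e in zip(range(n - 1), cit):
            yield e

    for f in cit:
        yield one_chunk(f)

# Consuming each chunk completely gives the expected grouping:
print([list(c) for c in chunk_iters(range(7), 3)])  # [[0, 1, 2], [3, 4, 5], [6]]

# Consuming only the first item of each chunk does not:
print([next(c) for c in chunk_iters(range(7), 3)])  # [0, 1, 2, 3, 4, 5, 6]
```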
Answer 63 (score: -1)
Using Python's list comprehension:
[list(range(t, t + 10)) for t in range(1, 1000, 10)]
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30],....
....[981, 982, 983, 984, 985, 986, 987, 988, 989, 990],
[991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
Visit this link to learn about list comprehensions.
Answer 64 (score: -1)
def main():
    print(chunkify([1, 2, 3, 4, 5, 6], 2))


def chunkify(lst, n):  # parameter named lst to avoid shadowing the built-in list
    chunks = []
    for i in range(0, len(lst), n):
        chunks.append(lst[i:i + n])
    return chunks


main()
I think this is simple, and it gives you the array split into chunks.
Answer 65 (score: -1)
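Here is some Python 3 code that, like np.array_split, splits the items into n nearly equal-sized sublists (a must be an iterator, e.g. iter(my_list), and n is the number of sublists). Unlike np.array_split, the items are dealt out round-robin rather than kept in contiguous runs.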
list(map(list, map(functools.partial(filter, None), itertools.zip_longest(*iter(lambda: tuple(itertools.islice(a, n)), ())))))
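It's a long one-liner, but it does distribute the items evenly among the resulting sublists. Be aware that filter(None, ...) removes not only the None padding but also any falsy items such as 0 or ''.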
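A multi-line sketch of the same transpose trick, assuming you want to keep falsy items: pad with a private sentinel object instead of None and filter on identity. The names deal and _FILL are hypothetical. The rows are consecutive n-item tuples; transposing them with zip_longest turns column i into items i, i+n, i+2n, and so on.

```python
from itertools import islice, zip_longest

_FILL = object()  # private sentinel: never equal to (or identical with) a real item

def deal(iterable, n):
    """Distribute items round-robin into (at most) n lists."""
    it = iter(iterable)
    # Rows of n consecutive items: (x0 .. x(n-1)), (xn .. x(2n-1)), ...
    rows = iter(lambda: tuple(islice(it, n)), ())
    # Transposing the rows gives the columns x0, xn, x2n, ... etc.
    return [[x for x in col if x is not _FILL]
            for col in zip_longest(*rows, fillvalue=_FILL)]

print(deal(range(7), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```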
Answer 66 (score: -2)
def chunked(iterable, size):
    chunk = ()
    for item in iterable:
        chunk += (item,)
        if len(chunk) % size == 0:
            yield chunk
            chunk = ()
    if chunk:
        yield chunk
Answer 67 (score: -2)
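Yes, this is an old question, but I had to post this because it's even shorter than the similar answers. Yes, the result looks jumbled, but if you only need roughly equal groups...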
>>> n = 3 # number of groups
>>> biglist = range(30)
>>>
>>> [ biglist[i::n] for i in xrange(n) ]
[[0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
[1, 4, 7, 10, 13, 16, 19, 22, 25, 28],
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29]]
Answer 68 (score: -2)
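def split(arr, size):
    # Note: `size` is the number of groups to produce, not the chunk length.
    L = len(arr)
    assert 0 < size <= L
    s, r = divmod(L, size)
    t = s + 1
    # The first r groups get s + 1 items; the remaining size - r groups get s items.
    a = ([arr[p:p + t] for p in range(0, r * t, t)] +
         [arr[p:p + s] for p in range(r * t, L, s)])
    return a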
Inspired by http://wordaligned.org/articles/slicing-a-list-evenly-with-python
Answer 69 (score: -5)
Has nobody used the tee() function from itertools?
http://docs.python.org/2/library/itertools.html#itertools.tee
>>> import itertools
>>> itertools.tee([1,2,3,4,5,6],3)
(<itertools.tee object at 0x02932DF0>, <itertools.tee object at 0x02932EB8>, <itertools.tee object at 0x02932EE0>)
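This splits the list into 3 iterators. Note that each of them independently yields every element of the list, so to get equal-length sublists each iterator still has to be advanced at its own offset and step while looping.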
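Each copy returned by tee() replays the whole input, so looping over the copies as-is yields the full list three times. One way to turn tee() into an n-way split, sketched below, is to give copy i a starting offset of i and a step of n with islice, which deals the items out round-robin (tee_split is a hypothetical name). Keep in mind that tee() buffers items that one copy has consumed but others have not, so this uses memory proportional to the input length.

```python
from itertools import islice, tee

# Every tee() copy yields all the elements:
print([list(c) for c in tee([1, 2, 3, 4, 5, 6], 3)])
# [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]

def tee_split(iterable, n):
    """Split iterable round-robin into n lists using n tee() copies."""
    copies = tee(iterable, n)
    # copy i starts at offset i and then takes every n-th item
    return [list(islice(copy, i, None, n)) for i, copy in enumerate(copies)]

print(tee_split([1, 2, 3, 4, 5, 6], 3))  # [[1, 4], [2, 5], [3, 6]]
```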