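What is the best way to divide a list into roughly equal parts? For example, if the list has 7 elements and is split into 2 parts, we want 3 elements in one part and 4 in the other.
I'm looking for something like even_split(L, n) that breaks L into n parts.
def chunks(L, n):
    """ Yield successive n-sized chunks from L.
    """
    for i in range(0, len(L), n):
        yield L[i:i+n]
The code above gives chunks of 3, rather than 3 chunks. I could simply transpose (iterate over the result, take the first element of each column and call that part one, then take the second and put it in part two, and so on), but that destroys the ordering of the items.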
Answer 0 (score: 140)
You can write it fairly simply as a list generator:
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))
Example:
>>> list(split(list(range(11)), 3))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
Answer 1 (score: 101)
As long as you don't want anything silly like contiguous chunks:
>>> def chunkify(lst, n):
...     return [lst[i::n] for i in range(n)]
...
>>> chunkify(list(range(13)), 3)
[[0, 3, 6, 9, 12], [1, 4, 7, 10], [2, 5, 8, 11]]
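To make the trade-off concrete, here is a small sketch comparing the strided split with a contiguous one. The helper names `strided` and `contiguous` are made up for this illustration, and `contiguous` simply reuses the divmod slicing shown in the answer above. Both give balanced sizes, but only the contiguous version preserves the original order when the parts are concatenated.

```python
def strided(lst, n):
    # Every n-th element goes into the same part (order is interleaved).
    return [lst[i::n] for i in range(n)]

def contiguous(lst, n):
    # Consecutive slices; the first (len(lst) % n) parts get one extra element.
    k, m = divmod(len(lst), n)
    return [lst[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

data = list(range(7))
print(strided(data, 2))     # [[0, 2, 4, 6], [1, 3, 5]]
print(contiguous(data, 2))  # [[0, 1, 2, 3], [4, 5, 6]]
```

If the order of items within and across parts does not matter, the strided version is the shorter of the two.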
Answer 2 (score: 83)
This is the raison d'être for numpy.array_split*:
>>> L
[0, 1, 2, 3, 4, 5, 6, 7]
>>> print(*np.array_split(L, 3))
[0 1 2] [3 4 5] [6 7]
>>> print(*np.array_split(range(10), 4))
[0 1 2] [3 4 5] [6 7] [8 9]
*credit to Zero Piraeus in room 6
Answer 3 (score: 59)
Here is one that could work:
def chunkIt(seq, num):
    avg = len(seq) / float(num)
    out = []
    last = 0.0
    while last < len(seq):
        out.append(seq[int(last):int(last + avg)])
        last += avg
    return out
Testing:
>>> chunkIt(list(range(10)), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
>>> chunkIt(list(range(11)), 3)
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9, 10]]
>>> chunkIt(list(range(12)), 3)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Answer 4 (score: 15)
Changing the code to yield n chunks rather than chunks of n:
def chunks(l, n):
    """ Yield n successive chunks from l.
    """
    newn = len(l) // n
    for i in range(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
l = list(range(56))
three_chunks = chunks(l, 3)
print(next(three_chunks))
print(next(three_chunks))
print(next(three_chunks))
gives:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
This assigns the extra elements to the final group, which is not perfect but well within your specification of "roughly N equal parts" :-) By that, I mean 56 elements would be better split as (19, 19, 18), whereas this gives (18, 18, 20).
You can get the more balanced output with the following code:
#!/usr/bin/python
def chunks(l, n):
    """ Yield n successive chunks from l.
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in range(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]
l = list(range(56))
three_chunks = chunks(l, 3)
print(next(three_chunks))
print(next(three_chunks))
print(next(three_chunks))
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
[19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]
[38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
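As a rough check, the sketch below computes only the chunk sizes that the rounding variant produces and compares them with sizes derived from divmod. The helper names `rounded_sizes` and `balanced_sizes` are invented for this illustration. It suggests that rounding the chunk length works for 56 elements in 3 chunks but is not balanced for every input.

```python
def rounded_sizes(length, n):
    """Chunk sizes produced by the rounding variant, int(length / n + 0.5)."""
    items = list(range(length))
    newn = int(1.0 * length / n + 0.5)
    parts = [items[i * newn:i * newn + newn] for i in range(n - 1)]
    parts.append(items[n * newn - newn:])  # last chunk takes whatever is left
    return [len(p) for p in parts]

def balanced_sizes(length, n):
    """Sizes that differ by at most one: the first (length % n) chunks get an extra item."""
    k, m = divmod(length, n)
    return [k + 1 if i < m else k for i in range(n)]

print(rounded_sizes(56, 3))   # [19, 19, 18]
print(rounded_sizes(10, 4))   # [3, 3, 3, 1]
print(balanced_sizes(10, 4))  # [3, 3, 2, 2]
```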
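Answer 5 (score: 7)
If you divide n elements into roughly k chunks, you can make n % k chunks 1 element bigger than the other chunks to distribute the extra elements.
The following code will give you the lengths of the chunks:
[(n // k) + (1 if i < (n % k) else 0) for i in range(k)]
Example: n=11, k=3 results in [4, 4, 3]
You can then easily calculate the start indices of the chunks:
[i * (n // k) + min(i, n % k) for i in range(k)]
Example: n=11, k=3 results in [0, 4, 8]
Using the start of the (i+1)-th chunk as the boundary, the i-th chunk of a list l of length n is
l[i * (n // k) + min(i, n % k):(i+1) * (n // k) + min(i+1, n % k)]
As a final step, create a list of all the chunks using a list comprehension:
[l[i * (n // k) + min(i, n % k):(i+1) * (n // k) + min(i+1, n % k)] for i in range(k)]
Example: n=11, k=3, l=range(n) results in [range(0, 4), range(4, 8), range(8, 11)]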
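The length and start-index formulas above can be wrapped into one function. This is a minimal sketch, and the name `even_chunks` is an assumption made for the example rather than anything from the answer itself.

```python
def even_chunks(l, k):
    """Split l into k contiguous chunks whose lengths differ by at most one."""
    q, r = divmod(len(l), k)  # q = n // k, r = n % k
    # Chunk i starts at i*q + min(i, r); the next chunk's start is its end.
    return [l[i * q + min(i, r):(i + 1) * q + min(i + 1, r)] for i in range(k)]

print(even_chunks(list(range(11)), 3))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
print(even_chunks([0, 1], 3))
# [[0], [1], []]
```

When k is larger than the list length, the trailing chunks are simply empty.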
Answer 6 (score: 3)
Here is one that adds None to make the lists equal length:
>>> from itertools import zip_longest  # named izip_longest on Python 2
>>> def chunks(l, n):
...     """ Return n chunks from l. Pads extra spaces with None
...     """
...     return list(zip(*zip_longest(*[iter(l)]*n)))
...
>>> l=range(54)
>>> chunks(l,3)
[(0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51), (1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52), (2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53)]
>>> chunks(l,4)
[(0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52), (1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53), (2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, None), (3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, None)]
>>> chunks(l,5)
[(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50), (1, 6, 11, 16, 21, 26, 31, 36, 41, 46, 51), (2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52), (3, 8, 13, 18, 23, 28, 33, 38, 43, 48, 53), (4, 9, 14, 19, 24, 29, 34, 39, 44, 49, None)]
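Answer 7 (score: 3)
import more_itertools as mit  # third-party package; the snippet's `mit` alias presumably refers to it
n = 2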
[list(x) for x in mit.divide(n, range(5, 11))]
# [[5, 6, 7], [8, 9, 10]]
[list(x) for x in mit.divide(n, range(5, 12))]
# [[5, 6, 7, 8], [9, 10, 11]]
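Answer 8 (score: 2)
This will do the split with a single expression:
>>> myList = list(range(18))
>>> parts = 5
>>> [myList[(i*len(myList))//parts:((i+1)*len(myList))//parts] for i in range(parts)]
[[0, 1, 2], [3, 4, 5, 6], [7, 8, 9], [10, 11, 12, 13], [14, 15, 16, 17]]
The list in this example has 18 elements and is divided into 5 parts. The sizes of the parts differ by no more than one element.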
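Answer 9 (score: 2)
Say you want to split into 5 parts:
p1, p2, p3, p4, p5 = np.split(df, 5)  # np.split raises ValueError unless the length divides evenly by 5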
Answer 10 (score: 2)
This is a generator that can handle any positive (integer) number of chunks. If the number of chunks is greater than the input list length, some chunks will be empty. This algorithm alternates between short and long chunks rather than segregating them.
I've also included some code for testing the ragged_chunks function.
''' Split a list into "ragged" chunks
    The size of each chunk is either the floor or ceiling of len(seq) / chunks
    chunks can be > len(seq), in which case there will be empty chunks
    Written by PM 2Ring 2017.03.30
'''
def ragged_chunks(seq, chunks):
    size = len(seq)
    start = 0
    for i in range(1, chunks + 1):
        stop = i * size // chunks
        yield seq[start:stop]
        start = stop
# test
def test_ragged_chunks(maxsize):
    for size in range(0, maxsize):
        seq = list(range(size))
        for chunks in range(1, size + 1):
            minwidth = size // chunks
            #ceiling division
            maxwidth = -(-size // chunks)
            a = list(ragged_chunks(seq, chunks))
            sizes = [len(u) for u in a]
            deltas = all(minwidth <= u <= maxwidth for u in sizes)
            assert all((sum(a, []) == seq, sum(sizes) == size, deltas))
    return True
if test_ragged_chunks(100):
    print('ok')
We can make this slightly more efficient by moving the multiplication into the range call, but I think the previous version is more readable (and DRYer).
def ragged_chunks(seq, chunks):
    size = len(seq)
    start = 0
    for i in range(size, size * chunks + 1, size):
        stop = i // chunks
        yield seq[start:stop]
        start = stop
Answer 11 (score: 2)
Implemented using the numpy.linspace method.
Just specify the number of parts you want the array divided into. The parts will be of nearly equal size.
Example:
import numpy as np
a = np.arange(10)
print("Input array:", a)
parts = 3
# Evenly spaced boundaries over the index range 0..len(a)
i = np.linspace(0, a.size, parts + 1)
i = np.array(i, dtype='uint16')  # Indices must be integers
split_arr = []
for ind in range(i.size - 1):
    split_arr.append(a[i[ind]:i[ind + 1]])
print("Array split in to %d parts : " % parts, split_arr)
Gives:
Input array: [0 1 2 3 4 5 6 7 8 9]
Array split in to 3 parts : [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8, 9])]
Answer 12 (score: 2)
Here's my solution:
def chunks(l, amount):
    if amount < 1:
        raise ValueError('amount must be positive integer')
    chunk_len = len(l) // amount
    leap_parts = len(l) % amount
    remainder = amount // 2  # make it symmetrical
    i = 0
    while i < len(l):
        remainder += leap_parts
        end_index = i + chunk_len
        if remainder >= amount:
            remainder -= amount
            end_index += 1
        yield l[i:end_index]
        i = end_index
Produces:
>>> list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2], [3, 4, 5], [6, 7]]
Answer 13 (score: 2)
Check out numpy.split:
>>> a = numpy.array([1,2,3,4])
>>> numpy.split(a, 2)
[array([1, 2]), array([3, 4])]
Answer 14 (score: 1)
Using a list comprehension:
def divide_list_to_chunks(list_, n):
    return [list_[start::n] for start in range(n)]
Answer 15 (score: 1)
Elegantly:
[x.tolist() for x in np.array_split(range(10), 3)]
Answer 16 (score: 1)
1> Using np.split on the evenly divisible prefix, then appending the leftover to the last part:
import numpy as np
data  # your array
total_length = len(data)
separate = 10
sub_array_size = total_length // separate
safe_separate = sub_array_size * separate
splited_lists = np.split(np.array(data[:safe_separate]), separate)
# np.concatenate takes a sequence of arrays as its first argument
splited_lists[separate - 1] = np.concatenate((splited_lists[separate - 1],
                                              np.array(data[safe_separate:total_length])))
splited_lists  # your output
2> Using np.array_split directly:
splited_lists = np.array_split(np.array(data), separate)
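Answer 17 (score: 1)
Suppose you want to split the list [1, 2, 3, 4, 5, 6, 7, 8] into lists of 3 elements,
like [[1, 2, 3], [4, 5, 6], [7, 8]], where any final leftover of fewer than 3 elements is grouped together.
my_list = [1, 2, 3, 4, 5, 6, 7, 8]
my_list2 = [my_list[i:i+3] for i in range(0, len(my_list), 3)]
print(my_list2)
Output: [[1, 2, 3], [4, 5, 6], [7, 8]]
Here 3 is the length of each part. Replace 3 with your own chunk size.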
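Answer 18 (score: 1)
My solution, easy to understand:
def split_list(lst, n):
    splitted = []
    for i in reversed(range(1, n + 1)):
        split_point = len(lst)//i
        splitted.append(lst[:split_point])
        lst = lst[split_point:]
    return splitted
The shortest one-liner on this page (written by my girl):
def split(l, n):
    # The slice end is exclusive, so no "-1" is needed; with it, each chunk would lose its last element.
    return [l[i*len(l)//n:(i+1)*len(l)//n] for i in range(n)]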
Answer 19 (score: 0)
Another way would be something like this. The idea here is to use grouper, but get rid of None. In this case all the "small_parts" are formed from elements at the start of the list, and the "larger_parts" from the later part of the list. The length of a "larger part" is len(small_part) + 1. We need to treat x as two different sub-parts.
from itertools import zip_longest  # named izip_longest on Python 2
def grouper(n, iterable, fillvalue=None):  # This is grouper from itertools
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
def another_chunk(x, num):
    extra_ele = len(x) % num  # gives number of parts that will have an extra element
    small_part = len(x) // num  # gives number of elements in a small part
    new_x = list(grouper(small_part, x[:small_part*(num-extra_ele)]))
    new_x.extend(list(grouper(small_part+1, x[small_part*(num-extra_ele):])))
    return new_x
The way I have it set up, it returns a list of tuples:
>>> x = range(14)
>>> another_chunk(x,3)
[(0, 1, 2, 3), (4, 5, 6, 7, 8), (9, 10, 11, 12, 13)]
>>> another_chunk(x,4)
[(0, 1, 2), (3, 4, 5), (6, 7, 8, 9), (10, 11, 12, 13)]
>>> another_chunk(x,5)
[(0, 1), (2, 3, 4), (5, 6, 7), (8, 9, 10), (11, 12, 13)]
>>>
Answer 20 (score: 0)
#!/usr/bin/python
first_names = ['Steve', 'Jane', 'Sara', 'Mary','Jack','Bob', 'Bily', 'Boni', 'Chris','Sori', 'Will', 'Won','Li']
def chunks(l, n):
    for i in range(0, len(l), n):
        # Create an index range for l of n items:
        yield l[i:i+n]
result = list(chunks(first_names, 5))
print(result)
Picked up from a link, and this is what helped me. I have a predefined list.
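Answer 21 (score: 0)
Here's another variant that spreads the "remaining" elements evenly among all the chunks, one at a time, until there are none left. In this implementation, the larger chunks occur at the beginning of the process.
def chunks(l, k):
    """ Yield k successive chunks from l."""
    if k < 1:
        yield []
        return  # raising StopIteration inside a generator is an error on Python 3.7+
    n = len(l)
    avg = n // k
    remainders = n % k
    start, end = 0, avg
    while start < n:
        if remainders > 0:
            end = end + 1
            remainders = remainders - 1
        yield l[start:end]
        start, end = end, end+avg
For example, generating 4 chunks from a list of 14 elements:
>>> list(chunks(list(range(14)), 4))
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10], [11, 12, 13]]
>>> list(map(len, chunks(list(range(14)), 4)))
[4, 4, 3, 3]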
Answer 22 (score: 0)
For this case I wrote the code myself:
def chunk_ports(port_start, port_end, portions):
    if port_end < port_start:
        return None
    total = port_end - port_start + 1
    fractions = total // portions
    results = []
    # Not enough to chunk.
    if fractions < 1:
        return None
    # Reverse, so any additional items would be in the first range.
    _e = port_end
    for i in range(portions, 0, -1):
        if i == 1:
            _s = port_start
        else:
            _s = _e - fractions + 1
        results.append((_s, _e))
        _e = _s - 1
    results.reverse()
    return results
chunk_ports(1, 10, 9) will return
[(1, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10)]
Answer 23 (score: 0)
This code works for me (Python 3 compatible):
def chunkify(tab, num):
    return [tab[i*num: i*num+num] for i in range(len(tab)//num+(1 if len(tab)%num else 0))]
Example (for the bytearray type, but it works for list as well):
>>> b = bytearray(b'\x01\x02\x03\x04\x05\x06\x07\x08')
>>> chunkify(b,3)
[bytearray(b'\x01\x02\x03'), bytearray(b'\x04\x05\x06'), bytearray(b'\x07\x08')]
>>> chunkify(b,4)
[bytearray(b'\x01\x02\x03\x04'), bytearray(b'\x05\x06\x07\x08')]
Answer 24 (score: 0)
This gives chunks of length <= n, >= 0
import math
def chunkify(lst, n):
    num_chunks = int(math.ceil(len(lst) / float(n))) if n < len(lst) else 1
    return [lst[n*i:n*(i+1)] for i in range(num_chunks)]
For example:
>>> chunkify(list(range(11)), 3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
>>> chunkify(list(range(11)), 8)
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10]]
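Answer 25 (score: 0)
I tried most of the solutions, but they didn't work for my case, so I wrote a new function that works for most cases and for any type of array:
import math
def chunkIt(seq, num):
    # Note: num is used as the number of items per chunk, not the number of chunks.
    seqLen = len(seq)
    total_chunks = math.ceil(seqLen / num)  # resulting chunk count (not used below)
    items_per_chunk = num
    out = []
    last = 0
    while last < seqLen:
        out.append(seq[last:(last + items_per_chunk)])
        last += items_per_chunk
    return out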
Answer 26 (score: 0)
def evenly(l, n):
    len_ = len(l)
    split_size = len_ // n
    split_size = n if not split_size else split_size
    offsets = [i for i in range(0, len_, split_size)]
    return [l[offset:offset + split_size] for offset in offsets]
Example:
l = [a for a in range(97)]
With n = 10 the split size is 97 // 10 = 9, so the result has ten parts of 9 elements each, followed by a final part holding the remaining 7 elements.
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44],
[45, 46, 47, 48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59, 60, 61, 62],
[63, 64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96]]
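Because the chunk size above is rounded down, 97 elements end up in eleven parts rather than ten. One possible adjustment, shown here only as a sketch, is to round the chunk size up instead. The function name `by_ceiling_size` is made up for this example. It yields at most n parts, although for some inputs it yields fewer than n.

```python
def by_ceiling_size(l, n):
    """Split l into fixed-size chunks, using ceil(len(l) / n) as the size."""
    size = max(1, -(-len(l) // n))  # ceiling division; at least 1 to keep the step valid
    return [l[i:i + size] for i in range(0, len(l), size)]

parts = by_ceiling_size(list(range(97)), 10)
print(len(parts), [len(p) for p in parts])
# 10 [10, 10, 10, 10, 10, 10, 10, 10, 10, 7]

# Not always exactly n parts: 9 elements with n = 6 gives 5 chunks of size 2, 2, 2, 2, 1.
print(len(by_ceiling_size(list(range(9)), 6)))
# 5
```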
Answer 27 (score: 0)
If you don't mind the order being changed, I recommend @job's solution; otherwise, you could use this:
def chunkIt(seq, num):
    steps = int(len(seq) / float(num))
    out = []
    last = 0.0
    while last < len(seq):
        if len(seq) - (last + steps) < steps:
            until = len(seq)
            steps = len(seq) - last
        else:
            until = int(last + steps)
        out.append(seq[int(last): until])
        last += steps
    return out
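Answer 28 (score: 0)
Same as job's answer, but takes into account lists whose size is smaller than the number of chunks.
def chunkify(lst, n):
    return [lst[i::n] for i in range(n if n < len(lst) else len(lst))]
If n (the number of chunks) is 7 and lst (the list to divide) is [1, 2, 3], the chunks are [[1], [2], [3]] instead of [[1], [2], [3], [], [], [], []].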
Answer 29 (score: 0)
from typing import List
def chunk_array(array: List, n: int) -> List[List]:
    chunk_size = len(array) // n
    chunks = []
    i = 0
    while i < len(array):
        # once n chunks exist, add the remainder to the last chunk
        if len(chunks) == n:
            chunks[-1].extend(array[i:])
            break
        else:
            chunks.append(array[i:i + chunk_size])
        i += chunk_size
    return chunks
Here's my version (inspired by Max's).
Answer 30 (score: 0)
You could also use:
split=lambda x,n: x if not x else [x[:n]]+[split([] if not -(len(x)-n) else x[-(len(x)-n):],n)][0]
split([1,2,3,4,5,6,7,8,9],2)
[[1, 2], [3, 4], [5, 6], [7, 8], [9]]
Answer 31 (score: -1)
Rounding the linspace and using it as an index is an easier solution than what amit12690 proposes (MATLAB):
function chunks=chunkit(array,num)
    index = round(linspace(0,size(array,2),num+1));
    chunks = cell(1,num);
    for x = 1:num
        chunks{x} = array(:,index(x)+1:index(x+1));
    end
end
Answer 32 (score: -1)
Another attempt at a simple, readable chunker that works.
def chunk(iterable, count):  # returns a *generator* that divides `iterable` into `count` of contiguous chunks of similar size
    assert count >= 1
    return (iterable[int(_*len(iterable)/count+0.5):int((_+1)*len(iterable)/count+0.5)] for _ in range(count))
print("Chunk count: ", len(list( chunk(range(105),10))))
print("Chunks: ", list( chunk(range(105),10)))
print("Chunks: ", list(map(list,chunk(range(105),10))))
print("Chunk lengths:", list(map(len, chunk(range(105),10))))
print("Testing...")
for iterable_length in range(100):
    for chunk_count in range(1,100):
        chunks = list(chunk(range(iterable_length),chunk_count))
        assert chunk_count == len(chunks)
        assert iterable_length == sum(map(len,chunks))
        assert all(map(lambda _:abs(len(_)-iterable_length/chunk_count)<=1,chunks))
print("Okay")
Output:
Chunk count: 10
Chunks: [range(0, 11), range(11, 21), range(21, 32), range(32, 42), range(42, 53), range(53, 63), range(63, 74), range(74, 84), range(84, 95), range(95, 105)]
Chunks: [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], [42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52], [53, 54, 55, 56, 57, 58, 59, 60, 61, 62], [63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73], [74, 75, 76, 77, 78, 79, 80, 81, 82, 83], [84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94], [95, 96, 97, 98, 99, 100, 101, 102, 103, 104]]
Chunk lengths: [11, 10, 11, 10, 11, 10, 11, 10, 11, 10]
Testing...
Okay
Answer 33 (score: -1)
# lst is the list to split; p is the number of parts to be divided
n = len(lst)
x = n // p
i = 0
j = x
lstt = []
while i < len(lst) or j < len(lst):
    lstt.append(lst[i:j])
    i += x
    j += x
print(lstt)
This is the simplest answer if the list is known to divide into equal parts.
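Since the loop above relies on the list length being an exact multiple of the number of parts, a small wrapper can make that assumption explicit. This is only a sketch, and the function name `equal_parts` is invented for the illustration.

```python
def equal_parts(lst, p):
    """Split lst into p equal parts; reject inputs that do not divide evenly."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    x, leftover = divmod(len(lst), p)
    if leftover:
        raise ValueError("list length %d is not divisible by %d" % (len(lst), p))
    return [lst[i * x:(i + 1) * x] for i in range(p)]

print(equal_parts(list(range(6)), 3))
# [[0, 1], [2, 3], [4, 5]]
```

For lengths that do not divide evenly, one of the divmod-based answers earlier in the thread is the better fit.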