How to split a nested list into chunks of n sublists

Date: 2018-12-06 22:24:48

Tags: python-3.x list nested-lists chunking

I'm trying to split a nested list into chunks of 100 lists. I've looked at several examples on Stack Overflow, but I still can't get it to work.

My main list is named data_to_insert, and it contains other lists. I want to extract chunks of 100 lists from the main nested list.

How can I do this?

Here is my current code, which doesn't work the way I need it to.

def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

n = 100
x = list(divide_chunks(data_to_insert, n))

Example of the nested list:

data_to_insert = [['item1','item2','item3','item4','item5','item6'],
 ['item1','item2','item3','item4','item5','item6'],
 ['item1','item2','item3','item4','item5','item6'],
 ['item1','item2','item3','item4','item5','item6'],
 ['item1','item2','item3','item4','item5','item6'],
 ...
 [thousands of other lists go here]]

The desired output is another list (sliced_data) that contains 100 lists from the nested list (data_to_insert).

sliced_data = [['item1','item2','item3','item4','item5','item6'],
 ['item1','item2','item3','item4','item5','item6'], 
 ...
 [98 more lists go here]]

I need to loop over the nested list data_to_insert until it is empty.

4 answers:

Answer 0: (score: 1)

You can use random to pick 100 random nested lists from the given list.

This prints 3 random nested lists from the original list:

import random

l = [[1,2], [3,4], [1,1], [2,3], [3,5], [0,0]]
print(random.sample(l, 3))


# output,
[[3, 4], [1, 2], [2, 3]]

If you don't want the result printed as a list, replace print(random.sample(l, 3)) with print(*random.sample(l, 3))

# output,
[1, 2] [2, 3] [1, 1]

If you just want the first 100 nested lists, do this:

print(l[:100])

Answer 1: (score: 1)

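If I understood your question correctly, you first need to flatten the list of lists and then split the result into chunks. Here is an example that uses chain.from_iterable from the itertools module, together with the code that creates the chunks: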

from itertools import chain

def chunks(elm, length):
    for k in range(0, len(elm), length):
        yield elm[k: k + length]


my_list = [['item{}'.format(j) for j in range(7)]] * 1000
flattened = list(chain.from_iterable(my_list))

# Store the result under a different name so the chunks() function is not shadowed
chunked = list(chunks(flattened, 100))

print(len(chunked[10]))

Output:

100
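
Note that flattening changes what gets chunked: each chunk above holds 100 individual items, not 100 sublists. If the sublists should stay intact, as the question seems to ask, a minimal sketch that chunks the outer list directly might look like the following. The helper name iter_chunks and the sample data are assumptions made for illustration; they are not part of the original answer.

```python
from itertools import islice


def iter_chunks(iterable, size):
    """Yield successive lists of up to `size` elements from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical sample data: 250 independent sublists of 6 items each.
data_to_insert = [['item{}'.format(j) for j in range(1, 7)] for _ in range(250)]

batches = list(iter_chunks(data_to_insert, 100))

print(len(batches))               # 3
print([len(b) for b in batches])  # [100, 100, 50]
print(batches[0][0])              # the first sublist, still intact
```

Because it relies on islice rather than slicing, this version also accepts generators and other iterables that have no len(). On Python 3.12 and later, itertools.batched does similar batching, though it yields tuples instead of lists.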

Answer 2: (score: 0)

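After some time-consuming research, I came up with a working solution. The solution below loops over the list of lists and extracts 100 lists at a time.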

import gc

# Verifies that the list data_to_insert isn't empty
if len(data_to_insert) > 0:

  # Obtains the length of the data to insert.
  # The length is the number of sublists
  # contained in the main nested list.
  data_length = len(data_to_insert)

  # A loop counter used in the
  # data insert process.
  i = 0

  # The number of sublists to slice
  # from the main nested list in
  # each loop.
  n = 100

  # This loop executes a set of statements
  # as long as the condition below is true
  while i < data_length:

    # Increments the loop counter by the number
    # of sublists about to be sliced off
    if len(data_to_insert) < n:
      i += len(data_to_insert)
    else:
      i += n

    # Slices up to n sublists from the main nested list.
    sliced_data = data_to_insert[:n]

    # Verifies that the list sliced_data isn't empty
    if len(sliced_data) > 0:

      # Removes those n sublists from the main nested list.
      data_to_insert = data_to_insert[n:]

      ##################################
      # do something with the sliced_data
      ##################################

      # Clears the list used to store the
      # sliced_data in the insertion loop.
      sliced_data.clear()
      gc.collect()

  # Clears the list used to store the
  # data elements inserted into the
  # database.
  data_to_insert.clear()
  gc.collect()
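
For comparison, the same "take n sublists until the list is empty" loop can be written more compactly. This is only a sketch: drain_in_batches and process_batch are hypothetical names, and process_batch stands in for whatever is actually done with each slice (for example, a database insert).

```python
def drain_in_batches(data, n):
    """Yield batches of up to n elements, removing them from `data` in place."""
    while data:
        batch = data[:n]  # copy the first n sublists
        del data[:n]      # drop them from the source list
        yield batch


def process_batch(batch):
    # Hypothetical placeholder for the real work done on each batch.
    return len(batch)


# Hypothetical sample data: 250 sublists.
data_to_insert = [['item1', 'item2', 'item3'] for _ in range(250)]

sizes = [process_batch(batch) for batch in drain_in_batches(data_to_insert, 100)]

print(sizes)                # [100, 100, 50]
print(len(data_to_insert))  # 0, the source list has been emptied
```

Deleting from the front of a list shifts the remaining elements each time, so for very large lists it may be cheaper to slice by index without mutating the source.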

Answer 3: (score: 0)

Building on Sufiyan Ghori's suggestion to use random, I developed a second way to achieve my goal.

import random

if len(my_nestled_list) > 0:

  # Obtains the length of the data to insert.
  # The length is the number of sublists
  # contained in the main nested list.
  data_length = len(my_nestled_list)

  # A loop counter used in the
  # data insert process.
  i = 0

  # The number of sublists to sample
  # from the main nested list in
  # each loop.
  n = 100

  # This loop executes a set of statements
  # as long as the condition below is true
  while i < data_length:

    # Increments the loop counter
    if len(my_nestled_list) < n:
      i += len(my_nestled_list)
    else:
      i += n

    # Uses a list comprehension to randomly select up to n lists
    # from the nested list, keeping them in their original order.
    # min() avoids a ValueError when fewer than n sublists exist.
    k = min(n, len(my_nestled_list))
    random_sample_of_100 = [my_nestled_list[idx] for idx in sorted(random.sample(range(len(my_nestled_list)), k))]

    print(random_sample_of_100)
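
One caveat with sampling inside the loop: each call to random.sample is independent, so the same sublist can be picked in more than one iteration and others may never be picked. If every sublist should be processed exactly once, but in random batches, one possible sketch is to shuffle a copy once and then slice it. The name shuffled_chunks, the seed parameter, and the sample data are assumptions made for illustration.

```python
import random


def shuffled_chunks(data, n, seed=None):
    """Yield random batches of up to n elements; each element appears once."""
    rng = random.Random(seed)  # private generator; seed only for repeatability
    shuffled = list(data)      # shallow copy, so the original order is untouched
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), n):
        yield shuffled[start:start + n]


# Hypothetical sample data: 250 distinct sublists.
my_nestled_list = [[k, k + 1] for k in range(250)]

batches = list(shuffled_chunks(my_nestled_list, 100, seed=0))

print([len(b) for b in batches])  # [100, 100, 50]
```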