Question

I have a large list of numbers. I want to split it into x sublists and process them in parallel.

Here is the code I have so far:

from multiprocessing import Pool
import numpy

def processNumList(numList):
    for num in numList:
        outputList.append(num ** 2)

numThreads = 5

bigNumList = list(range(50))

splitNumLists = numpy.array_split(bigNumList, numThreads)

outputList = []

for numList in splitNumLists:
    processNumList(numList)

print(outputList)

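The code above does the following:

Splits the large list of numbers into the specified number of smaller lists
Passes each list to the processNumList function
Prints the resulting list afterwards

Everything works, but it only processes one list at a time. I want all of the lists to be processed simultaneously.

What is the correct code to do this? I tried pool, but could never get it to work.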

Answer 1

You could try something like this:

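import threading

class MyThread(threading.Thread):
    def __init__(self, num_list, output_list):
        super().__init__()  # let Thread set itself up before adding our own state
        self.num_list = num_list
        self.output_list = output_list

    def run(self):
        # run() takes no arguments; it works on the data stored in __init__
        for num in self.num_list:
            self.output_list.append(num ** 2)

# split the list as you already did, then start one thread per sublist
threads = [MyThread(numList, outputList) for numList in splitNumLists]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # wait for every thread before using outputList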

Answer 2

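This is the code I ended up using.

I used threading.Thread() to process the lists asynchronously, then called thread.join() to make sure all threads have finished before continuing.

I added time.sleep for demonstration purposes (to simulate a lengthy task), but obviously you would not want that in production code.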

import numpy
import threading
import time

def process_num_list(numList):
    for num in numList:
        output_list.append(num ** 2)
        time.sleep(1)

num_threads = 5

big_num_list = list(range(30))

split_num_lists = numpy.array_split(big_num_list, num_threads)

output_list = []

threads = []

for num_list in split_num_lists:
    thread = threading.Thread(target=process_num_list, args=[num_list])
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(output_list)
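
One caveat with the thread-based code above: in standard CPython builds the global interpreter lock generally lets only one thread run Python bytecode at a time, so threads help most when the work waits on I/O (or sleeps, as in the demo) rather than on pure computation such as num ** 2. For CPU-bound work, a process-based version with multiprocessing.Pool, which the question mentions trying, may be a better fit. The following is only a minimal sketch; the names square and process_in_parallel are made up for illustration and do not come from the original post.

```python
from multiprocessing import Pool

def square(num):
    """Work done in a worker process for a single number."""
    return num ** 2

def process_in_parallel(numbers, num_processes=5):
    # Pool.map splits `numbers` into chunks itself and returns the
    # results in input order, so no manual splitting or shared
    # output list is needed.
    chunksize = max(1, len(numbers) // num_processes)
    with Pool(processes=num_processes) as pool:
        return pool.map(square, numbers, chunksize=chunksize)

if __name__ == "__main__":
    # The guard matters on platforms where worker processes start by
    # re-importing this module; without it they would try to create
    # pools of their own.
    print(process_in_parallel(list(range(50))))
```

The worker function has to be defined at module level so that it can be pickled and sent to the worker processes. A missing `if __name__ == "__main__":` guard is a common reason for Pool code failing, particularly on Windows.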

As a bonus, here is a working example with five Selenium windows:

from selenium import webdriver
import numpy
import threading
import time

def scrapeSites(siteList):
    print("Preparing to scrape " + str(len(siteList)) + " sites")
    driver = webdriver.Chrome(executable_path = r"..\chromedriver.exe")
    driver.set_window_size(700, 400)
    for site in siteList:
        print("\nNow scraping " + site)
        driver.get(site)
        pageTitles.append(driver.title)
    driver.quit()

numThreads = 5

fullWebsiteList = ["https://en.wikipedia.org/wiki/Special:Random"] * 30

splitWebsiteLists = numpy.array_split(fullWebsiteList, numThreads)

pageTitles = []

threads = []

for websiteList in splitWebsiteLists:
    thread = threading.Thread(target=scrapeSites, args=[websiteList])
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(pageTitles)
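
Because every thread appends to the shared pageTitles list, the order of the titles depends on how the threads happen to be scheduled. If the order matters, one option is to have each thread return its results and let concurrent.futures.ThreadPoolExecutor collect them. The sketch below assumes this approach; fetch_title is a hypothetical stand-in for the driver.get(site) / driver.title calls so that it runs without a browser, and scrape_chunk and scrape_all are likewise illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_title(site):
    """Hypothetical stand-in for loading a page and reading its title."""
    time.sleep(0.01)  # simulate waiting on the network
    return "Title of " + site

def scrape_chunk(site_list):
    """Process one sublist and return its titles instead of using a global."""
    return [fetch_title(site) for site in site_list]

def scrape_all(sites, num_threads=5):
    # Contiguous chunks of (at most) `size` items keep the original order.
    size = max(1, -(-len(sites) // num_threads))  # ceiling division
    chunks = [sites[i:i + size] for i in range(0, len(sites), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns one result per chunk, in the order of `chunks`
        per_chunk = list(pool.map(scrape_chunk, chunks))
    # Flatten the per-chunk lists back into a single list
    return [title for chunk in per_chunk for title in chunk]

print(scrape_all(["site-%d" % i for i in range(7)], num_threads=3))
```

Leaving the `with` block waits for all submitted work to finish, which plays the same role as the explicit thread.join() loop.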

How can I process multiple lists at once?