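I have a large list of numbers. I want to split the big list into x smaller lists and process them in parallel.
Here is the code I have so far: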
from multiprocessing import Pool
import numpy

def processNumList(numList):
    for num in numList:
        outputList.append(num ** 2)

numThreads = 5
bigNumList = list(range(50))
splitNumLists = numpy.array_split(bigNumList, numThreads)
outputList = []
for numList in splitNumLists:
    processNumList(numList)
print(outputList)
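The code above splits the big list into numThreads chunks, squares every number in each chunk, and appends the results to outputList.
It all works, but only one list is processed at a time. I want every list to be processed simultaneously.
What is the correct code to do this? I tried Pool, but could never get it to work.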
Answer 0 (score: 0)
You could try something like this:
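import threading

class MyThread(threading.Thread):
    def __init__(self, arg, arg2):
        super().__init__()
        # init stuff: store the arguments, because run() itself takes none
        self.arg = arg
        self.arg2 = arg2

    def run(self):
        # your logic to process the list, using self.arg and self.arg2
        pass

# split the list as you already did, then start one thread per chunk
for _ in range(numThreads):
    MyThread(arg, arg2).start()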
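To make the outline above concrete, here is a minimal sketch of the subclass approach applied to the squaring task from the question. The names SquareThread, split_into_chunks, and square_all are made up for this illustration, and split_into_chunks is a pure-Python stand-in for numpy.array_split so the snippet has no dependencies. Each thread keeps its own result list, which is collected after join().

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical worker thread: squares one chunk and keeps its own result."""

    def __init__(self, chunk):
        super().__init__()
        self.chunk = chunk
        self.result = []

    def run(self):
        # run() takes no arguments; the input comes from attributes set in __init__.
        self.result = [num ** 2 for num in self.chunk]


def split_into_chunks(items, n):
    """Split items into n contiguous, nearly equal chunks (stand-in for numpy.array_split)."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def square_all(numbers, num_threads=5):
    """Start one thread per chunk, wait for all of them, and merge the results in order."""
    threads = [SquareThread(chunk) for chunk in split_into_chunks(numbers, num_threads)]
    for thread in threads:
        thread.start()
    output = []
    for thread in threads:
        thread.join()  # wait for this thread before reading its result
        output.extend(thread.result)
    return output


print(square_all(list(range(50))))
```

Because the results are read from each thread in the order the threads were created, the output keeps the order of the input list.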
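Answer 1 (score: 0)
Here is the code I ended up using.
I used threading.Thread() to process the lists asynchronously, then called thread.join() to make sure all threads had finished before moving on.
I added time.sleep for demonstration purposes (to simulate a lengthy task), but obviously you would not want that in production code.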
import numpy
import threading
import time

def process_num_list(numList):
    for num in numList:
        output_list.append(num ** 2)
        time.sleep(1)  # simulate a lengthy task

num_threads = 5
big_num_list = list(range(30))
split_num_lists = numpy.array_split(big_num_list, num_threads)
output_list = []
threads = []

# start one thread per chunk
for num_list in split_num_lists:
    thread = threading.Thread(target=process_num_list, args=[num_list])
    threads.append(thread)
    thread.start()

# wait for every thread to finish before printing
for thread in threads:
    thread.join()
print(output_list)
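Since the question mentions Pool, here is a minimal sketch of how the same task could look with multiprocessing.Pool. In standard CPython builds, threads do not run CPU-bound Python code in parallel, so processes may be the better fit when the real work is computation rather than waiting. One likely reason Pool attempts fail with the original code is that every worker process gets its own copy of global variables, so appending to a global list in the worker never reaches the parent. The sketch returns results instead. square_chunk, split_into_chunks, and parallel_squares are hypothetical names, and split_into_chunks is a pure-Python stand-in for numpy.array_split.

```python
from multiprocessing import Pool


def square_chunk(chunk):
    """Square every number in one chunk and return the results (no shared globals)."""
    return [num ** 2 for num in chunk]


def split_into_chunks(items, n):
    """Split items into n contiguous, nearly equal chunks (stand-in for numpy.array_split)."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_squares(numbers, num_workers=5):
    """Process each chunk in a separate worker process and flatten the results."""
    chunks = split_into_chunks(numbers, num_workers)
    with Pool(processes=num_workers) as pool:
        # pool.map returns one result per chunk, in the same order as the input chunks.
        results = pool.map(square_chunk, chunks)
    return [square for part in results for square in part]


if __name__ == "__main__":
    # The guard matters: on platforms that spawn workers, the module is re-imported in each child.
    print(parallel_squares(list(range(30))))
```

The worker function has to live at module level so that it can be pickled and sent to the worker processes.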
As a bonus, here is a working example that drives five Selenium windows:
from selenium import webdriver
import numpy
import threading
import time

def scrapeSites(siteList):
    print("Preparing to scrape " + str(len(siteList)) + " sites")
    # executable_path is the Selenium 3 style; Selenium 4 passes the driver path through a Service object
    driver = webdriver.Chrome(executable_path = r"..\chromedriver.exe")
    driver.set_window_size(700, 400)
    for site in siteList:
        print("\nNow scraping " + site)
        driver.get(site)
        pageTitles.append(driver.title)
    driver.quit()  # close this thread's browser window once its chunk is done

numThreads = 5
fullWebsiteList = ["https://en.wikipedia.org/wiki/Special:Random"] * 30
splitWebsiteLists = numpy.array_split(fullWebsiteList, numThreads)
pageTitles = []
threads = []

# one browser window (and one thread) per chunk of sites
for websiteList in splitWebsiteLists:
    thread = threading.Thread(target=scrapeSites, args=[websiteList])
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
print(pageTitles)
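Both examples above append to a shared list, so the order of the results depends on how the threads happen to interleave. If the order matters, one option is concurrent.futures.ThreadPoolExecutor, whose map() returns results in input order. The sketch below is only an illustration of that pattern: fetch_titles is a hypothetical stand-in that fakes the page titles so the snippet runs without a browser, and in real use its body would do the Selenium work shown above.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_titles(site_list):
    """Hypothetical stand-in for the Selenium work: returns a fake title per site.

    A real version would create a driver, call driver.get(site) for each site,
    collect driver.title, and quit the driver at the end.
    """
    return ["title of " + site for site in site_list]


def scrape_in_threads(chunks):
    """Run fetch_titles on each chunk in its own thread and flatten the results."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as executor:
        # executor.map yields one list of titles per chunk, in input order.
        per_chunk = list(executor.map(fetch_titles, chunks))
    return [title for titles in per_chunk for title in titles]


chunks = [["site-a", "site-b"], ["site-c"], ["site-d", "site-e"]]
print(scrape_in_threads(chunks))
```

Leaving the with block waits for all submitted work to finish, so no explicit join() loop is needed.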