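I want to use multiprocessing to reduce the time it takes to complete a for loop, but I'm not sure how to go about it explicitly, because I haven't seen a clear, basic usage pattern for the module that I can apply to this code.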
Edit: new code, but it is still broken...
Answer 0: (score: 0)
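from math import ceil
from threading import Thread, Lock

# URL, fileA, fileRead, uReq and soup are assumed to be defined as in the question's code.
THREADS = 10
write_lock = Lock()  # serializes appends to fileA across threads

allLines = fileRead.readlines()
allLines = [x.strip() for x in allLines]

def f(indexes, allLines):
    # Each thread runs this loop over its own slice of indexes
    for i in indexes:
        currentWord = allLines[i]
        currentURL = URL + currentWord
        uClient = uReq(currentURL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML, 'html.parser')
        pageHeader = str(pageSoup.h1)
        if 'Sorry!' in pageHeader:
            with write_lock, open(fileA, 'a') as fileAppend:
                fileAppend.write(currentWord + '\n')
            print(currentWord, 'available')
        else:
            print(currentWord, 'taken')

# Split the lines into THREADS contiguous chunks, one per thread
chunk = ceil(len(allLines) / THREADS)
threads = []
for i in range(THREADS):
    indexes = range(i * chunk, min((i + 1) * chunk, len(allLines)))
    t = Thread(target=f, args=(indexes, allLines))
    t.start()
    threads.append(t)

# Wait for every thread to finish before continuing
for t in threads:
    t.join()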
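When the work is split across threads by hand, each thread needs its own non-overlapping range of line indexes, and together the ranges must cover every line exactly once. A minimal sketch of one way to compute them follows; the helper name `split_indexes` is hypothetical and not part of the original answer.

```python
from math import ceil

def split_indexes(n_items, n_threads):
    """Return n_threads contiguous ranges that together cover 0..n_items-1 once."""
    # Size of each chunk, rounded up so no item is left over
    chunk = ceil(n_items / n_threads) if n_items else 0
    # Clamp the end of each range so trailing ranges may be short or empty
    return [range(i * chunk, min((i + 1) * chunk, n_items))
            for i in range(n_threads)]

parts = split_indexes(25, 10)
covered = [i for r in parts for i in r]
```

With more threads than full chunks, the last ranges simply come out empty, so the corresponding threads have nothing to do.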
Answer 1: (score: 0)
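Without seeing the actual input and output, it is hard to know exactly where the problem might be.
You can try the multiprocessing.dummy module, which is just a wrapper around the threading module.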
import multiprocessing.dummy

# URL, fileR, fileA, uReq and soup are assumed to be defined as in the question's code.

def parse_url(word):
    currentURL = URL + word
    uClient = uReq(currentURL)
    pageHTML = uClient.read()
    uClient.close()
    pageSoup = soup(pageHTML, 'html.parser')
    pageHeader = str(pageSoup.h1)
    if 'Sorry!' in pageHeader:
        print(currentURL, 'is available.')
        return word
    else:
        print(currentURL, 'is taken.')
        return None

with open(fileR, 'r') as fileRead:
    #This is just for printing two newlines? Could replace with a single print('\n')
    print('')
    print('')
    print(fileRead.name, fileRead.mode)
    with open(fileA, 'w') as fileWrite:
        fileWrite.write('')
        print('')
        print('')
        print(fileWrite.name, 'emptied.')
    allLines = fileRead.readlines()
    allLines = [x.strip() for x in allLines]

#Make a pool of 10 worker threads
with multiprocessing.dummy.Pool(10) as pool:
    result = pool.map_async(parse_url, allLines)
    #wait for all the URLs to be checked
    word_list = result.get()
    free_words = [x for x in word_list if x is not None]

with open(fileA, 'w') as fileAppend:
    fileAppend.write('\n'.join(free_words))
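To try the pool pattern without touching the network, here is a small self-contained sketch. The function `check_word` and its even-length rule are hypothetical stand-ins for the real URL lookup, chosen only so the example runs anywhere.

```python
import multiprocessing.dummy

def check_word(word):
    # Hypothetical stand-in for the network lookup:
    # words of even length count as "available"
    return word if len(word) % 2 == 0 else None

words = ['ab', 'abc', 'abcd', 'x']

# Pool(4) creates four worker threads; map returns results in input order
with multiprocessing.dummy.Pool(4) as pool:
    results = pool.map(check_word, words)

# Drop the None entries, keeping only the "available" words
free_words = [w for w in results if w is not None]
print(free_words)
```

Because each worker returns its result instead of writing to a file, the output file only has to be written once, from the main thread, after the pool has finished.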