I'm not sure how to parallelize this for loop with the multiprocessing module

Time: 2019-06-24 09:55:17

Tags: python python-multiprocessing

I want to reduce the time it takes to complete a for loop by using multiprocessing, but I'm not sure how to do that in practice, because I haven't seen a clear, basic usage pattern for the module that I could apply to this code.


Edit: new code, but it's still broken...


2 Answers:

Answer 0 (score: 0)

  • Put the code into a function
  • Split the indexes into chunks
  • Start the threads
from threading import Thread
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

THREADS = 10

# URL, fileA and fileRead are assumed to be defined in the asker's original code
allLines = fileRead.readlines()
allLines = [x.strip() for x in allLines]

def f(indexes, allLines):
    # Each thread checks the words at its own slice of indexes
    for i in indexes:
        currentWord = allLines[i]
        currentURL = URL + currentWord
        uClient = uReq(currentURL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML, 'html.parser')
        pageHeader = str(pageSoup.h1)
        if 'Sorry!' in pageHeader:
            with open(fileA, 'a') as fileAppend:
                fileAppend.write(currentWord + '\n')
            print(currentWord, 'available')
        else:
            print(currentWord, 'taken')

# Split the indexes into THREADS contiguous chunks (ceiling division,
# so the last chunk absorbs any remainder) and start one thread per chunk
chunk = -(-len(allLines) // THREADS)
for i in range(THREADS):
    indexes = range(i * chunk, min((i + 1) * chunk, len(allLines)))
    Thread(target=f, args=(indexes, allLines)).start()
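
One caveat with the snippet above: it never keeps references to the Thread objects, so there is no way to wait for the workers before the rest of the program continues. A minimal sketch of the same startup loop that collects and joins the threads (same f, allLines, THREADS and chunk as above):

threads = []
chunk = -(-len(allLines) // THREADS)
for i in range(THREADS):
    indexes = range(i * chunk, min((i + 1) * chunk, len(allLines)))
    t = Thread(target=f, args=(indexes, allLines))
    t.start()
    threads.append(t)

# Block until every worker thread has finished
for t in threads:
    t.join()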

Answer 1 (score: 0)

Without seeing the actual input and output, it's hard to know exactly where things might be going wrong.

You could try it with the multiprocessing.dummy module, which is just a wrapper around the threading module.

import multiprocessing.dummy
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# URL, fileR and fileA are assumed to be defined elsewhere in the asker's code
def parse_url(word):
    currentURL = URL + word
    uClient = uReq(currentURL)
    pageHTML = uClient.read()
    uClient.close()
    pageSoup = soup(pageHTML, 'html.parser')
    pageHeader = str(pageSoup.h1)
    if 'Sorry!' in pageHeader:
        print(currentURL, 'is available.')
        return word
    else:
        print(currentURL, 'is taken.')
        return None

with open(fileR,'r') as fileRead:
    #This is just for printing two newlines? Could replace with a single print('\n')
    print('')
    print('')
    print(fileRead.name,fileRead.mode)
    with open(fileA,'w') as fileWrite:
        fileWrite.write('')
        print('')
        print('')
        print(fileWrite.name,'emptied.')
    allLines = fileRead.readlines()
    allLines = [x.strip() for x in allLines]

#Make a pool of 10 worker threads
with multiprocessing.dummy.Pool(10) as pool:
    result = pool.map_async(parse_url, allLines)
    #wait for all the URLs to be checked
    word_list = result.get()
    free_words = [x for x in word_list if x is not None]

with open(fileA, 'w') as fileWrite:
    fileWrite.write('\n'.join(free_words))
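
If you want real processes instead of threads (the work here is network-bound, so threads are usually enough, but processes sidestep the GIL for any heavier parsing), the same Pool API is available from multiprocessing itself. This is only a sketch, assuming parse_url, allLines and fileA are defined at module level as above; on platforms that spawn fresh interpreters, the pool must be created under an if __name__ == '__main__': guard:

import multiprocessing

if __name__ == '__main__':
    # Same API as multiprocessing.dummy.Pool, but with 10 worker processes
    with multiprocessing.Pool(10) as pool:
        word_list = pool.map(parse_url, allLines)
    free_words = [x for x in word_list if x is not None]

    with open(fileA, 'w') as fileWrite:
        fileWrite.write('\n'.join(free_words))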