I'm not sure how to parallelize this for loop with the multiprocessing module

Time: 2019-06-24 09:55:17

Tags: python python-multiprocessing

I want to reduce the time it takes to complete a for loop by using multiprocessing, but I'm not sure how to do that in practice, because I haven't seen a clear, basic usage pattern for the module that I could apply to this code.


Edit: new code, but it's still broken...


2 Answers:

Answer 0 (score: 0)

  • Put the code into a function
  • Split the indexes into chunks
  • Start the threads
from threading import Thread
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

THREADS = 10

# URL, fileA and fileRead are assumed to be defined in the asker's original code
allLines = fileRead.readlines()
allLines = [x.strip() for x in allLines]

def f(indexes, allLines):
    # Each thread checks the words at its own slice of indexes
    for i in indexes:
        currentWord = allLines[i]
        currentURL = URL + currentWord
        uClient = uReq(currentURL)
        pageHTML = uClient.read()
        uClient.close()
        pageSoup = soup(pageHTML, 'html.parser')
        pageHeader = str(pageSoup.h1)
        if 'Sorry!' in pageHeader:
            with open(fileA, 'a') as fileAppend:
                fileAppend.write(currentWord + '\n')
            print(currentWord, 'available')
        else:
            print(currentWord, 'taken')

# Split the indexes into THREADS contiguous chunks (ceiling division,
# so the last chunk absorbs any remainder) and start one thread per chunk
chunk = -(-len(allLines) // THREADS)
for i in range(THREADS):
    indexes = range(i * chunk, min((i + 1) * chunk, len(allLines)))
    Thread(target=f, args=(indexes, allLines)).start()
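
One caveat with the snippet above: it never keeps references to the Thread objects, so there is no way to wait for the workers before the rest of the program continues. A minimal sketch of the same startup loop that collects and joins the threads (same f, allLines, THREADS and chunk as above):

threads = []
chunk = -(-len(allLines) // THREADS)
for i in range(THREADS):
    indexes = range(i * chunk, min((i + 1) * chunk, len(allLines)))
    t = Thread(target=f, args=(indexes, allLines))
    t.start()
    threads.append(t)

# Block until every worker thread has finished
for t in threads:
    t.join()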

Answer 1 (score: 0)

Without seeing the actual input and output, it's hard to know exactly where things might be going wrong.

You could try it with the multiprocessing.dummy module, which is just a wrapper around the threading module.

import multiprocessing.dummy
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# URL, fileR and fileA are assumed to be defined elsewhere in the asker's code
def parse_url(word):
    currentURL = URL + word
    uClient = uReq(currentURL)
    pageHTML = uClient.read()
    uClient.close()
    pageSoup = soup(pageHTML, 'html.parser')
    pageHeader = str(pageSoup.h1)
    if 'Sorry!' in pageHeader:
        print(currentURL, 'is available.')
        return word
    else:
        print(currentURL, 'is taken.')
        return None

with open(fileR,'r') as fileRead:
    #This is just for printing two newlines? Could replace with a single print('\n')
    print('')
    print('')
    print(fileRead.name,fileRead.mode)
    with open(fileA,'w') as fileWrite:
        fileWrite.write('')
        print('')
        print('')
        print(fileWrite.name,'emptied.')
    allLines = fileRead.readlines()
    allLines = [x.strip() for x in allLines]

#Make a pool of 10 worker threads
with multiprocessing.dummy.Pool(10) as pool:
    result = pool.map_async(parse_url, allLines)
    #wait for all the URLs to be checked
    word_list = result.get()
    free_words = [x for x in word_list if x is not None]

with open(fileA, 'w') as fileWrite:
    fileWrite.write('\n'.join(free_words))
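
If you want real processes instead of threads (the work here is network-bound, so threads are usually enough, but processes sidestep the GIL for any heavier parsing), the same Pool API is available from multiprocessing itself. This is only a sketch, assuming parse_url, allLines and fileA are defined at module level as above; on platforms that spawn fresh interpreters, the pool must be created under an if __name__ == '__main__': guard:

import multiprocessing

if __name__ == '__main__':
    # Same API as multiprocessing.dummy.Pool, but with 10 worker processes
    with multiprocessing.Pool(10) as pool:
        word_list = pool.map(parse_url, allLines)
    free_words = [x for x in word_list if x is not None]

    with open(fileA, 'w') as fileWrite:
        fileWrite.write('\n'.join(free_words))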