Question

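The program checks whether a URL returns a 404 and, if so, writes the username to a file. I tried to add multiprocessing so that the program runs faster, because I sometimes feed it a 1000-line text file and that takes a long time. However, the first time I run the program (when the output text file is empty), it writes nothing to the output text file. It only starts writing to the output file on the 2nd, 3rd, 4th ... run.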

#program checks twitch accounts in a file.
#writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool
x = "0"

accounts = open('accounts.txt', 'r')
valid_accounts = open('valid accounts.txt', 'a')

base_url = "https://www.twitch.tv/"

def check(x):
    for line in accounts:
        url = base_url + line
        twitch_r = requests.get(url)
        if twitch_r.status_code == 404:
            valid_accounts.write(line + "\n")



def Main():
    p = Pool(processes=25)
    p.imap(check, x)
    accounts.close()
    valid_accounts.close()



if __name__ == "__main__":
    Main()

Answer 1

Once the work has been submitted to the pool, you should call p.close() and then p.join().
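A minimal sketch of that ordering, not the asker's actual program. To run offline it assumes a hypothetical stand-in, `fake_status`, in place of `requests.get`; the account names and the `TAKEN` set are made up for illustration. `close()` tells the pool that no more tasks will be submitted, and `join()` then waits for the workers to finish.

```python
from multiprocessing import Pool

# Hypothetical stand-in for the HTTP request so the sketch runs offline:
# names in TAKEN "exist" (status 200), every other name is "free" (status 404).
TAKEN = {"alice", "bob"}


def fake_status(name):
    return 200 if name in TAKEN else 404


def check(name):
    """Return the name if its (simulated) page is a 404, otherwise None."""
    return name if fake_status(name) == 404 else None


def find_free(names, processes=4):
    pool = Pool(processes=processes)
    results = pool.imap(check, names)  # returns immediately, work runs in the background
    pool.close()  # no more tasks will be submitted
    pool.join()   # block until the workers have finished every task
    # Every result is ready by now, so iterating no longer has to wait.
    return [r for r in results if r is not None]


if __name__ == "__main__":
    print(find_free(["alice", "carol", "bob", "dave"]))
```

Without the `join()` call, the parent could reach the end of the script while tasks are still pending, which matches the symptom of nothing being written.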

Answer 2

You are not passing the accounts to the pool's map call:

p.imap(check, accounts)

Answer 3

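Your main problem is that you are using imap instead of map. imap is non-blocking, which means your main process exits before the worker processes have run. I am a bit surprised that it works some of the time, because I would have expected it never to work.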
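The difference can be observed by timing the two calls. The sketch below is only an illustration: `slow_identity` is a hypothetical stand-in for a slow network check, and the sleep time and pool size are arbitrary assumptions.

```python
import time
from multiprocessing import Pool


def slow_identity(x):
    """Hypothetical stand-in for a slow network check."""
    time.sleep(0.2)
    return x


def compare(items, processes=2):
    """Return (seconds spent in the map call, seconds spent in the imap call)."""
    with Pool(processes=processes) as pool:
        start = time.perf_counter()
        pool.map(slow_identity, items)  # blocks until every item is processed
        map_seconds = time.perf_counter() - start

        start = time.perf_counter()
        lazy = pool.imap(slow_identity, items)  # hands back a lazy iterator at once
        imap_seconds = time.perf_counter() - start

        list(lazy)  # consuming the iterator is what actually waits for the results
    return map_seconds, imap_seconds


if __name__ == "__main__":
    map_seconds, imap_seconds = compare(range(4))
    print(f"map call:  {map_seconds:.2f}s")
    print(f"imap call: {imap_seconds:.2f}s")
```

The map call should take roughly as long as the work itself, while the imap call should return almost instantly. A program that never consumes the iterator and never joins the pool has nothing that makes it wait.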

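That said, there are a few other problems with your program:

The check function, which runs in different processes, shares a single file handle and iterates over its lines. This works only by accident (for example, it does not work on Windows) and is bad practice, to put it mildly. You should read the file first and then distribute the lines to the processes.
The same applies to writing to the file. Although appending to a file from several processes is also safe, it is better design to do the writing in the parent process.
map and imap are meant to run a function over a list of arguments and then return the results (the mapped values).
Omit the processes argument to Pool so that Python picks the number of worker processes from the number of cores in your machine.

Based on these points, this is the code I would suggest: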

# program checks twitch accounts in a file.
# writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool

base_url = "https://www.twitch.tv/"

def check(line):
    # Return the account name if its page gives a 404, otherwise None.
    twitch_r = requests.get(base_url + line)
    if twitch_r.status_code == 404:
        return line

def Main():
    # Read every account name in the parent, dropping newlines and blank lines.
    with open('accounts.txt', 'r') as accounts:
        lines = [line.strip() for line in accounts if line.strip()]

    # map blocks until all workers have finished.
    with Pool() as p:
        results = p.map(check, lines)

    results = [r for r in results if r is not None]
    with open('valid accounts.txt', 'a') as valid_accounts:
        for result in results:
            valid_accounts.write(result + "\n")

if __name__ == "__main__":
    Main()

The only thing to note is that you need to remove the None values from results, because check(line) returns None for every URL that is not a 404.

Update:

"After using John's solution, the program works as expected"

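I doubt that. Since you are on Windows, each process has its own file handle pointing at accounts.txt and loops over all of the lines. So you end up checking every URL 20 times, and multiprocessing is not helping you.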
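The duplication can be illustrated without starting any processes. The following is a single-process simulation under stated assumptions, not a reproduction of the Windows behaviour: each hypothetical worker opens the file itself and reads all of it, as a freshly started process would after re-running the module-level open() call. The file contents and the worker count of 20 are made up for illustration.

```python
import os
import tempfile
from collections import Counter


def simulate(path, n_workers):
    """Count how often each line is read when every worker reads the whole file."""
    reads = Counter()
    for _ in range(n_workers):
        # Each simulated worker opens its own handle and starts at the top.
        with open(path) as handle:
            for line in handle:
                reads[line.strip()] += 1
    return reads


def demo(n_workers=20):
    """Write a small hypothetical accounts file and simulate n_workers readers."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "accounts.txt")
        with open(path, "w") as f:
            f.write("alice\nbob\ncarol\n")
        return simulate(path, n_workers)


if __name__ == "__main__":
    print(demo())
```

Every name is counted once per simulated worker, so the total work grows with the number of workers instead of being divided among them.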

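"I used imap because I read that imap does not return a list (?)"

No. In this case, the only difference between map and imap is that map waits for all processes to finish (so you do not need to call join).

For a more thorough discussion of map vs. imap, see here.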

Program only writes to the file on runs after the first one
