Question

我想从具有50000个网址的txt文件中获取whois数据。它正在工作，但至少需要20分钟。我该怎么做才能改善此效果

import whois
from concurrent.futures import ThreadPoolExecutor
import threading
import time

pool = ThreadPoolExecutor(max_workers=2500)

def query(domain):
    while True:
        try:
            w = whois.whois(domain)
            fwrite = open("whoISSuccess.txt", "a")
            fwrite.write('\n{0} : {1}'.format(w.domain, w.expiration_date))
            fwrite.close()
        except:
            time.sleep(3)
            continue
        else:
            break


with open('urls.txt') as f:
    for line in f:
        lines = line.rstrip("\n\r")
        pool.submit(query, lines)


pool.shutdown(wait=True)

Answer 1

您可以做两件事来提高速度：

使用multiprocessing而不是threading，因为python线程实际上并不是并行运行的，而进程将由操作系统管理并真正并行运行。
第二，让每个进程写入其自己的文件，例如<url>.txt，因为所有进程都写入同一个文件会在文件写入时引起锁争用，这会显着降低程序速度。在完成所有过程之后，如果这是关键要求，则可以将所有文件聚合到一个文件中。或者，您可以只将whois结果保存在内存中，然后最后将其写到文件中。

如何在Python中加快多个http请求

1 个答案: