How do I convert a single-threaded Python script into a multi-threaded one?

Asked: 2016-12-23 14:49:45

Tags: python multithreading

I want to turn my single-threaded script into a multi-threaded one to improve performance by running tasks in parallel. The bottleneck is the latency of the whois requests to the registrar, so I would like to issue more than one request at a time.

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

for d in find_document:
    try:
        domaine = d['domain']
        print(domaine)
        w = whois.whois(domaine)
        date = w.expiration_date
        print(date)
        collection.update({"domain": domaine}, {"$set": {"expire": date}})
    except whois.parser.PywhoisError as err:
        print("AVAILABLE")
        collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})

What is the best way to do this? A Pool with map? Another approach?

Thanks in advance for your answers.

1 answer:

Answer 0 (score: 0)

Since you are waiting on the network, you can see a real performance gain from threading, without the hassle of multiprocessing, because threads can wait on several requests at once. Any time you run things in parallel, though, you may run into problems with printing to stdout or writing to files; these are easily solved with thread locks.
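The claim above is easy to demonstrate with a minimal sketch that uses `time.sleep` as a stand-in for the whois latency (no real network calls, just a hypothetical `fake_request` function), showing that threads overlap the waiting:

```python
import threading
import time

def fake_request(results, i):
    # Stand-in for a network call: mostly waiting, as whois lookups are.
    time.sleep(0.2)
    results[i] = i

def run_sequential(n):
    # Perform n "requests" one after another.
    results = {}
    start = time.perf_counter()
    for i in range(n):
        fake_request(results, i)
    return time.perf_counter() - start

def run_threaded(n):
    # Perform n "requests" concurrently; the sleeps overlap,
    # so total time is roughly that of one request.
    results = {}
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_request, args=(results, i))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential_time = run_sequential(4)  # ~0.8 s
threaded_time = run_threaded(4)      # ~0.2 s
```

The GIL is not a problem here because the threads spend their time blocked on I/O (here, sleeping), not executing Python bytecode.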

In your case, I would just create a separate thread for each d in find_document.

Each thread takes a few arguments, including:

  • target=foo  # the function the thread will call when started
  • args=()  # the positional arguments foo will be called with
  • kwargs={}  # you get the picture

I also reordered your try-except to limit the number of lines inside the try block (good practice). To do that I added an else block, which is a very good thing to know about (for and while loops have them too). This also let me group the print statements together so they can be wrapped in a lock, preventing separate threads from printing at the same time and producing interleaved output. Finally, I don't know what your collection object is or whether its update method is thread safe, so I wrapped it in a lock as well.

import threading

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):

    domaine = d['domain']
    try:
        w = whois.whois(domaine) #try to keep only what's necessary in try/except block
    except whois.parser.PywhoisError as err:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine) #move print statements together so lock doesn't block for long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

updatelock = threading.Lock() #I'm not sure this function is thread safe, so we'll take the safe way out and lock it off
printlock = threading.Lock() #make sure only one thread prints at a time

threads = []
for d in find_document: #Create a list of threads and start them all
    t = threading.Thread(target=foo, args=(d,printlock,updatelock,))
    threads.append(t)
    t.start() #start each thread as we create it

for t in threads: #wait for all threads to complete
    t.join()

Based on your comment, you have too many jobs to run them all at once, so we need something more like a pool of worker threads than my earlier example. The way to do that is to start a fixed number of threads, each looping over a given function and consuming new arguments until there are none left. To reuse the code I already wrote, I added this as a new function that also calls foo, but you could write it all as a single function.

import threading

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):

    domaine = d['domain']
    try:
        w = whois.whois(domaine) #try to keep only what's necessary in try/except block
    except whois.parser.PywhoisError as err:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine) #move print statements together so lock doesn't block for long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

def consumer(producer):
    while True: 
        try:
            with iterlock: #no idea if find_document.iter is thread safe... assume not
                d = next(producer) #unrolling a for loop into a while loop
        except StopIteration:
            return #we're done
        else:
            foo(d, printlock, updatelock) #call our function from before

iterlock = threading.Lock() #lock to get next element from iterator
updatelock = threading.Lock() #I'm not sure this function is thread safe, so we'll take the safe way out and lock it off
printlock = threading.Lock() #make sure only one thread prints at a time

producer = iter(find_document) #create an iterator from find_document (expanded syntax of for _ in _ with function calls)

threads = []
for _ in range(16): #Create a list of 16 threads and start them all
    t = threading.Thread(target=consumer, args=(producer,))
    threads.append(t)
    t.start() #start each thread as we create it

for t in threads: #wait for all threads to complete
    t.join()
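To answer the "Pool with map?" part of your question: yes, this worker-pool pattern is also available off the shelf via concurrent.futures.ThreadPoolExecutor (or multiprocessing.dummy.Pool), which handles the iterator locking for you. A minimal sketch, with a hypothetical check_domain standing in for the whois-and-update work that foo does:

```python
from concurrent.futures import ThreadPoolExecutor

def check_domain(d):
    # Hypothetical stand-in for foo(): in the real script this would
    # call whois.whois() and update the collection.
    return d['domain'].upper()

docs = [{'domain': 'example.com'}, {'domain': 'example.org'}]

# map() distributes the documents across 16 worker threads and
# returns results in the same order as the input.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(check_domain, docs))
```

Note that prints and non-thread-safe collection updates inside the mapped function would still need the locks shown above; the executor only replaces the hand-rolled consumer loop.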