How do I convert a single-threaded Python script into a multi-threaded one?

Asked: 2016-12-23 14:49:45

Tags: python multithreading

I want to turn my single-threaded script into a multi-threaded one to improve performance by running tasks in parallel. The bottleneck is the latency of the whois requests to the registrar, so I would like to issue more than one request at a time.

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

for d in find_document:
    try:
        domaine = d['domain']
        print(domaine)
        w = whois.whois(domaine)
        date = w.expiration_date
        print(date)
        collection.update({"domain": domaine}, {"$set": {"expire": date}})
    except whois.parser.PywhoisError as err:
        print("AVAILABLE")
        collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})

What is the best way to do this? A Pool with map? Another approach?

Thanks in advance for your answers.

1 answer:

Answer 0 (score: 0)

Since you are waiting on the network, you can see a real performance gain from threading, without the hassle of multiprocessing, because threads can wait on several requests at once. Any time you run things in parallel, though, you may run into problems with printing to stdout or writing to files; these are easily solved with thread locks.
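The claim above is easy to demonstrate with a minimal sketch that uses `time.sleep` as a stand-in for the whois latency (no real network calls, just a hypothetical `fake_request` function), showing that threads overlap the waiting:

```python
import threading
import time

def fake_request(results, i):
    # Stand-in for a network call: mostly waiting, as whois lookups are.
    time.sleep(0.2)
    results[i] = i

def run_sequential(n):
    # Perform n "requests" one after another.
    results = {}
    start = time.perf_counter()
    for i in range(n):
        fake_request(results, i)
    return time.perf_counter() - start

def run_threaded(n):
    # Perform n "requests" concurrently; the sleeps overlap,
    # so total time is roughly that of one request.
    results = {}
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_request, args=(results, i))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

sequential_time = run_sequential(4)  # ~0.8 s
threaded_time = run_threaded(4)      # ~0.2 s
```

The GIL is not a problem here because the threads spend their time blocked on I/O (here, sleeping), not executing Python bytecode.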

In your case, I would just create a separate thread for each d in find_document.

Each thread takes a few arguments, including:

  • target=foo  # the function the thread will call when started
  • args=()  # the positional arguments foo will be called with
  • kwargs={}  # you get the picture

I also reordered your try-except to limit the number of lines inside the try block (good practice). To do that I added an else block, which is a very good thing to know about (for and while loops have them too). This also let me group the print statements together so they can be wrapped in a lock, preventing separate threads from printing at the same time and producing interleaved output. Finally, I don't know what your collection object is or whether its update method is thread safe, so I wrapped it in a lock as well.

import threading

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):

    domaine = d['domain']
    try:
        w = whois.whois(domaine) #try to keep only what's necessary in try/except block
    except whois.parser.PywhoisError as err:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine) #move print statements together so lock doesn't block for long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

updatelock = threading.Lock() #I'm not sure this function is thread safe, so we'll take the safe way out and lock it off
printlock = threading.Lock() #make sure only one thread prints at a time

threads = []
for d in find_document: #Create a list of threads and start them all
    t = threading.Thread(target=foo, args=(d,printlock,updatelock,))
    threads.append(t)
    t.start() #start each thread as we create it

for t in threads: #wait for all threads to complete
    t.join()

Based on your comment, you have too many jobs to run them all at once, so we need something more like a pool of worker threads than my earlier example. The way to do that is to start a fixed number of threads, each looping over a given function and consuming new arguments until there are none left. To reuse the code I already wrote, I added this as a new function that also calls foo, but you could write it all as a single function.

import threading

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):

    domaine = d['domain']
    try:
        w = whois.whois(domaine) #try to keep only what's necessary in try/except block
    except whois.parser.PywhoisError as err:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine) #move print statements together so lock doesn't block for long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

def consumer(producer):
    while True: 
        try:
            with iterlock: #no idea if find_document.iter is thread safe... assume not
                d = next(producer) #unrolling a for loop into a while loop
        except StopIteration:
            return #we're done
        else:
            foo(d, printlock, updatelock) #call our function from before

iterlock = threading.Lock() #lock to get next element from iterator
updatelock = threading.Lock() #I'm not sure this function is thread safe, so we'll take the safe way out and lock it off
printlock = threading.Lock() #make sure only one thread prints at a time

producer = iter(find_document) #create an iterator from find_document (expanded syntax of for _ in _ with function calls)

threads = []
for _ in range(16): #Create a list of 16 threads and start them all
    t = threading.Thread(target=consumer, args=(producer,))
    threads.append(t)
    t.start() #start each thread as we create it

for t in threads: #wait for all threads to complete
    t.join()
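To answer the "Pool with map?" part of your question: yes, this worker-pool pattern is also available off the shelf via concurrent.futures.ThreadPoolExecutor (or multiprocessing.dummy.Pool), which handles the iterator locking for you. A minimal sketch, with a hypothetical check_domain standing in for the whois-and-update work that foo does:

```python
from concurrent.futures import ThreadPoolExecutor

def check_domain(d):
    # Hypothetical stand-in for foo(): in the real script this would
    # call whois.whois() and update the collection.
    return d['domain'].upper()

docs = [{'domain': 'example.com'}, {'domain': 'example.org'}]

# map() distributes the documents across 16 worker threads and
# returns results in the same order as the input.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(check_domain, docs))
```

Note that prints and non-thread-safe collection updates inside the mapped function would still need the locks shown above; the executor only replaces the hand-rolled consumer loop.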