I want to turn my single-threaded script into a multi-threaded one so that parallel tasks improve performance. The bottleneck is the latency of requests to the registrar, and I would like to issue more than one request at a time.
find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})
for d in find_document:
    try:
        domaine = d['domain']
        print(domaine)
        w = whois.whois(domaine)
        date = w.expiration_date
        print(date)
        collection.update({"domain": domaine}, {"$set": {"expire": date}})
    except whois.parser.PywhoisError:
        print("AVAILABLE")
        collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
What is the best approach? A pool with map? Something else?
Thanks in advance for your answers.
Answer 0 (score: 0)
If your work is internet-bound, you can see real performance gains from threading, without the hassle of multiprocessing, because threads can wait on several requests at the same time. Whenever you run things in parallel, though, you can hit problems with printing to stdout or writing to files; these are easily solved with thread locks.
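As a minimal sketch of why overlapping the waits helps (using time.sleep as a stand-in for a slow whois call, and a lock around the shared output):

```python
import threading
import time

def simulated_request(delay, results, lock):
    # time.sleep stands in for a slow network call (e.g. a whois lookup)
    time.sleep(delay)
    with lock:  # hold the lock so concurrent threads don't interleave output
        results.append(delay)

lock = threading.Lock()
results = []
start = time.time()
threads = [threading.Thread(target=simulated_request, args=(0.2, results, lock))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# five 0.2 s "requests" overlap, so the total is far less than the 1.0 s
# a serial loop would take
print(len(results), elapsed)
```

Run serially, the five waits would take about a second; threaded, they finish in roughly the time of one.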
In your case, I would simply spawn one thread for each d in find_document. Each thread takes a few arguments: the document itself and the two locks. I also rearranged your try-except to limit the number of lines inside the try block (good practice). To do that I added an else block, which is a very useful construct to know about (for and while loops have one too). This also let me group the print statements together so they sit under a single lock, preventing separate threads from printing at the same time and producing interleaved output. Finally, I don't know what your collection object is or whether its update method is thread-safe, so I wrapped it in a lock as well.
import threading

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):
    domaine = d['domain']
    try:
        w = whois.whois(domaine)  # keep only what's necessary in the try block
    except whois.parser.PywhoisError:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine)  # print statements grouped so the lock isn't held long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

updatelock = threading.Lock()  # not sure collection.update is thread safe, so take the safe way out and lock it
printlock = threading.Lock()   # make sure only one thread prints at a time

threads = []
for d in find_document:  # create a thread for each document and start them all
    t = threading.Thread(target=foo, args=(d, printlock, updatelock))
    threads.append(t)
    t.start()  # start each thread as we create it
for t in threads:  # wait for all threads to complete
    t.join()
Based on your comments, you have too many jobs to run them all at once, so we need something more like a multiprocessing pool than my earlier example. The way to do this is to set up a fixed number of threads, each looping over a given function and consuming new arguments until there are none left. To keep the code I had already written, I just added this as a new function that calls foo, but you could write it all as a single function.
import threading

find_document = collection.find({"dns": "ERROR"}, {'domain': 1, '_id': 0})

def foo(d, printlock, updatelock):
    domaine = d['domain']
    try:
        w = whois.whois(domaine)  # keep only what's necessary in the try block
    except whois.parser.PywhoisError:
        with printlock:
            print(domaine)
            print("AVAILABLE")
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": "AVAILABLE"}})
    else:
        date = w.expiration_date
        with printlock:
            print(domaine)  # print statements grouped so the lock isn't held long
            print(date)
        with updatelock:
            collection.update({"domain": domaine}, {"$set": {"expire": date}})

def consumer(producer):
    while True:
        try:
            with iterlock:  # no idea if find_document's iterator is thread safe... assume not
                d = next(producer)  # unrolling the for loop into a while loop
        except StopIteration:
            return  # we're done
        else:
            foo(d, printlock, updatelock)  # call our function from before

iterlock = threading.Lock()    # lock to get the next element from the iterator
updatelock = threading.Lock()  # not sure collection.update is thread safe, so take the safe way out and lock it
printlock = threading.Lock()   # make sure only one thread prints at a time

producer = iter(find_document)  # create an iterator over find_document (what a for loop does under the hood)

threads = []
for _ in range(16):  # create a list of 16 worker threads and start them all
    t = threading.Thread(target=consumer, args=(producer,))
    threads.append(t)
    t.start()  # start each thread as we create it
for t in threads:  # wait for all threads to complete
    t.join()
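To answer the "pool with map" part of the question: the standard library's concurrent.futures.ThreadPoolExecutor gives you the same worker-pool pattern with far less boilerplate, and its internal work queue removes the need for the iterator lock. A minimal sketch, where lookup is a placeholder standing in for the whois call:

```python
from concurrent.futures import ThreadPoolExecutor

def lookup(domaine):
    # placeholder for whois.whois(domaine); returns the value to store
    return "expire-date-for-" + domaine

domains = ["example.com", "example.org", "example.net"]
with ThreadPoolExecutor(max_workers=16) as pool:
    # pool.map hands each domain to a worker thread and preserves input order
    results = list(pool.map(lookup, domains))
print(results)
```

Each worker would then do its own database update, or you could collect the results and write them back in the main thread, which sidesteps the question of whether collection.update is thread-safe.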