I am making over 100K calls to an API using two functions: with the first function I reach out to the API and get the sysinfo (a dict) for each host, and then with the second function I go through sysinfo and pull out the IP addresses. I am looking for a way to speed this up, but I have never used multiprocessing/threading before (currently it takes about 3 hours).
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

#pool = ThreadPool(4)
p = Pool(5)

#obviously I removed a lot of the code that generates some of these
#variables, but this is the part that slooooows everything down.

def get_sys_info(self, host_id, appliance):
    sysinfo = self.hx_request(
        "https://{}:3000//hx/api/v3/hosts/{}/sysinfo".format(appliance, host_id))
    return sysinfo

def get_ips_from_sysinfo(self, sysinfo):
    sysinfo = sysinfo["data"]
    network_array = sysinfo.get("networkArray", {})
    network_info = network_array.get("networkInfo", [])
    ips = []
    for ni in network_info:
        ip_array = ni.get("ipArray", {})
        ip_info = ip_array.get("ipInfo", [])
        for i in ip_info:
            ips.append(i)
    return ips
if __name__ == "__main__":
    for i in ids:
        sysinfo = rr.get_sys_info(i, appliance)
        hostname = sysinfo.get("data", {}).get("hostname")
        try:
            ips = p.map(rr.get_ips_from_sysinfo(sysinfo))
        except Exception as e:
            rr.logger.error("Exception on {} -- {}".format(hostname, e))
            continue

        #Tried calling it here
        ips = p.map(rr.get_ips_from_sysinfo(sysinfo))
I have to get through over 100,000 API calls, and that is really the part that slows everything down.
I think I have tried everything and gotten every possible "not iterable" and "missing arguments" error.
I would really appreciate any kind of help. Thanks!
Answer 0 (score: 2)
You can use threads and a queue to communicate: first start get_ips_from_sysinfo as a single thread that watches for and processes all the completed sysinfo dicts, which it will store in output_list; then fire off all the get_sys_info threads, being careful not to exhaust resources with 100k threads.
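The answer's original code block did not survive translation. A minimal sketch of the threads-and-queue approach it describes, with the API call stubbed out (`get_sys_info`, `output_list`, and the host IDs here are illustrative stand-ins for the question's real names), might look like:

```python
import threading
import queue

# Stub standing in for the real HTTPS call per host (assumption)
def get_sys_info(host_id):
    return {"data": {"hostname": "host-%d" % host_id, "ip": "10.0.0.%d" % host_id}}

output_list = []
sysinfo_queue = queue.Queue()
SENTINEL = object()  # signals the consumer to stop

def producer(host_ids):
    # Each producer thread fetches sysinfo for a chunk of hosts
    for host_id in host_ids:
        sysinfo_queue.put(get_sys_info(host_id))

def consumer():
    # Single consumer thread: drains completed sysinfo dicts as they arrive
    while True:
        item = sysinfo_queue.get()
        if item is SENTINEL:
            break
        output_list.append(item["data"]["ip"])

consumer_t = threading.Thread(target=consumer)
consumer_t.start()

# Cap the number of worker threads instead of spawning one per host
NUM_WORKERS = 8
all_ids = list(range(1, 101))
chunks = [all_ids[i::NUM_WORKERS] for i in range(NUM_WORKERS)]
workers = [threading.Thread(target=producer, args=(c,)) for c in chunks]
for t in workers:
    t.start()
for t in workers:
    t.join()
sysinfo_queue.put(SENTINEL)
consumer_t.join()

print(len(output_list))  # 100
```

Chunking the IDs across a fixed number of worker threads is what keeps you from "exhausting" the machine with 100k threads.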
Answer 1 (score: 1)
As @wwii commented, concurrent.futures offers some conveniences that may help you, especially since this looks like a batch job.
Your performance hit most likely comes from the network calls, so multithreading will probably suit your use case better (here is a comparison with multiprocessing). If not, you can switch the pool from threads to processes while keeping the same API.
from concurrent.futures import ThreadPoolExecutor, as_completed
# You can import ProcessPoolExecutor instead and use the same APIs

def thread_worker(instance, host_id, appliance):
    """Wrapper for your class's `get_sys_info` method"""
    sysinfo = instance.get_sys_info(host_id, appliance)
    return sysinfo, instance

# instantiate the class that contains the methods in your example code
# I will call it `RR`
instances = (RR(*your_args, **your_kwds) for your_args, your_kwds
             in zip(iterable_of_args, iterable_of_kwds))
all_host_ids = another_iterable
all_appliances = still_another_iterable

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=50) as executor:  # assuming 10 threads per core; your example uses 5 processes
        pool = {executor.submit(thread_worker, instance, _id, _app): (_id, _app)
                for instance, _id, _app in zip(instances, all_host_ids, all_appliances)}

        # handle the `sysinfo` dicts as they arrive
        for future in as_completed(pool):
            try:
                _sysinfo, _instance = future.result()
            except Exception as exc:  # just one way of handling exceptions
                # do something
                print(f"{pool[future]} raised {exc}")
            else:
                # enqueue results for parallel processing in a separate stage, or
                # process the results serially
                ips = _instance.get_ips_from_sysinfo(_sysinfo)
                # do something with `ips`
You can simplify this example by refactoring the methods into functions, if indeed they don't use state as it appears in your code.
If extracting the sysinfo data is expensive, you can enqueue the results and feed them to a ProcessPoolExecutor that calls get_ips_from_sysinfo on the queued dicts.
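A minimal sketch of that two-stage pipeline, with the network call stubbed out (`fetch_sysinfo` is a hypothetical stand-in for the question's `get_sys_info`, and both stages use threads here so the snippet stays self-contained; swap ProcessPoolExecutor into the second stage if the parsing is genuinely CPU-bound):

```python
from concurrent.futures import ThreadPoolExecutor

def get_ips_from_sysinfo(sysinfo):
    # Function version of the question's method (no state needed)
    ips = []
    for ni in sysinfo["data"].get("networkArray", {}).get("networkInfo", []):
        ips.extend(ni.get("ipArray", {}).get("ipInfo", []))
    return ips

def fetch_sysinfo(host_id):
    # Stub for the real HTTPS call (assumption: one dict per host)
    return {"data": {"networkArray": {"networkInfo": [
        {"ipArray": {"ipInfo": ["10.0.0.%d" % host_id]}}]}}}

# Stage 1: I/O-bound fetches in a thread pool
with ThreadPoolExecutor(max_workers=10) as net_pool:
    sysinfos = list(net_pool.map(fetch_sysinfo, range(1, 6)))

# Stage 2: parse the collected dicts; replace ThreadPoolExecutor with
# ProcessPoolExecutor here if get_ips_from_sysinfo dominates CPU time
with ThreadPoolExecutor(max_workers=4) as parse_pool:
    all_ips = list(parse_pool.map(get_ips_from_sysinfo, sysinfos))

print(all_ips)  # [['10.0.0.1'], ['10.0.0.2'], ['10.0.0.3'], ['10.0.0.4'], ['10.0.0.5']]
```

`Executor.map` preserves input order, which is why the results line up with the host IDs even though the calls complete concurrently.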
Answer 2 (score: 1)
For whatever reason I was a little leery of calling an instance method in multiple threads - but it seems to work. I made this toy example using concurrent.futures - hopefully it mimics your actual situation well enough. This submits 4000 instance-method calls to a thread pool of (at most) 500 workers. Playing around with the max_workers value, I found that execution-time improvements were linear up to about 1000 workers, then the improvement ratio started to tail off.
import concurrent.futures, time, random

a = [.001*n for n in range(1, 4001)]

class F:
    def __init__(self, name):
        self.name = f'{name}:{self.__class__.__name__}'
    def apicall(self, n):
        wait = random.choice(a)
        time.sleep(wait)
        return (n, wait, self.name)

f = F('foo')

if __name__ == '__main__':
    nworkers = 500
    with concurrent.futures.ThreadPoolExecutor(nworkers) as executor:
        # t = time.time()
        futures = [executor.submit(f.apicall, n) for n in range(4000)]
        results = [future.result() for future in concurrent.futures.as_completed(futures)]
        # t = time.time() - t
        # q = sum(r[1] for r in results)
        # print(f'# workers:{nworkers} - ratio:{q/t}')
I didn't account for exceptions that might be raised during the method call, but the example in the docs is pretty clear about how to handle that.
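For completeness, a short sketch of that pattern: `future.result()` re-raises any exception from the worker, so you wrap it in try/except inside the `as_completed` loop (the failing `apicall` stub here is made up purely to demonstrate):

```python
import concurrent.futures

def apicall(n):
    # Stand-in call that fails for one input, to show error handling
    if n == 3:
        raise ValueError("bad host %d" % n)
    return n * 2

results, errors = [], []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(apicall, n) for n in range(6)]
    for future in concurrent.futures.as_completed(futures):
        try:
            results.append(future.result())  # re-raises the worker's exception
        except Exception as exc:
            errors.append(str(exc))

print(sorted(results))  # [0, 2, 4, 8, 10]
print(errors)           # ['bad host 3']
```

One failed call is logged and skipped instead of killing the whole batch, which is exactly what you want across 100k API calls.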
Answer 3 (score: 0)
So... after a few days of research (thank you all so much!!!) and some outside reading (Fluent Python Ch 17 and Effective Python, 59 Specific Ways..), I got it working.
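The code block of this answer was lost in translation. Based on the description (concurrent.futures, submitting the question's `get_sys_info` calls to a thread pool), a sketch of such a solution might look like the following; the `RR` class here is a stub with the real HTTPS call replaced by canned data, and `appliance`/`ids` are placeholder values for the names from the question:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stub of the asker's class; the real get_sys_info performs one HTTPS
# request per host (assumption based on the question's code)
class RR:
    def get_sys_info(self, host_id, appliance):
        return {"data": {"hostname": "host-%d" % host_id,
                         "networkArray": {"networkInfo": [
                             {"ipArray": {"ipInfo": ["10.0.0.%d" % host_id]}}]}}}

    def get_ips_from_sysinfo(self, sysinfo):
        ips = []
        for ni in sysinfo["data"].get("networkArray", {}).get("networkInfo", []):
            ips.extend(ni.get("ipArray", {}).get("ipInfo", []))
        return ips

rr = RR()
appliance = "appliance-01"  # placeholder
ids = range(1, 11)          # placeholder for the 100k host IDs

host_ips = {}
with ThreadPoolExecutor(max_workers=50) as executor:
    # Submit all the slow network calls to the pool...
    futures = {executor.submit(rr.get_sys_info, i, appliance): i for i in ids}
    # ...then do the cheap parsing as each result arrives
    for future in as_completed(futures):
        sysinfo = future.result()
        hostname = sysinfo.get("data", {}).get("hostname")
        host_ips[hostname] = rr.get_ips_from_sysinfo(sysinfo)

print(len(host_ips))  # 10
```

Only the network fetch goes through the pool; the per-host dict parsing is fast enough to run serially in the `as_completed` loop.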
*Edited so it works right out of the box; hope it helps someone else