I have a function that reads URLs from a file of websites and requests each landing page. I then process the landing page to pull out information, but I run many checks on it, and doing all of this linearly takes a long time. Below is an example of what I currently have.
import requests

def main():
    err_file = open("error.txt", "w+")
    res_file = open("results.txt", "w+")
    with open("a_file.txt", "r") as sites:
        for url in sites:
            try:
                # Fetch the landing page for this URL.
                info = requests.get(url.strip())
            except Exception as exc:
                err_file.write(str(exc))
            else:
                basic = get_basic(info)
                res_file.write(check_more(info, basic))  # How can I make this faster?
def check_more(info, basic):
    '''
    Many regexes are run and their results are written to a string variable.
    '''
    final_result = basic + result_of_searching
    return final_result
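To give a sense of what check_more() actually does, it is essentially a long series of searches like the ones below. The patterns here are made up for illustration; the real ones are more involved:

import re

def check_more(info, basic):
    # Illustrative only: the real function runs many more patterns than this.
    result_of_searching = ""
    for pattern in (r"<title>(.*?)</title>", r'href="(https?://[^"]+)"'):
        for match in re.findall(pattern, info.text):
            result_of_searching += match + "\n"
    return basic + result_of_searching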
check_more() is the part that takes a long time, and its runtime grows with the size of the page. How can I run check_more() in the background, so that I keep working through more URLs while each check_more() result is written to the file as it finishes?
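To make the question concrete, this is roughly the structure I am aiming for: the requests stay serial in the main loop and only check_more() is handed off to workers. The choice of ProcessPoolExecutor and the as_completed bookkeeping are just my guess at how it might be wired up, not something I have working:

import concurrent.futures
import requests

def main():
    futures = []
    with open("error.txt", "w+") as err_file, \
         open("results.txt", "w+") as res_file, \
         open("a_file.txt", "r") as sites, \
         concurrent.futures.ProcessPoolExecutor() as pool:
        for url in sites:
            try:
                info = requests.get(url.strip())
            except Exception as exc:
                err_file.write(str(exc))
            else:
                basic = get_basic(info)
                # Hand the slow regex work to a worker and keep requesting URLs.
                # (Passing the whole Response assumes it pickles cleanly;
                # passing info.text instead may be safer.)
                futures.append(pool.submit(check_more, info, basic))
        # Collect and write results as the background checks finish.
        for fut in concurrent.futures.as_completed(futures):
            res_file.write(fut.result())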
I have tried multiprocessing queues and concurrent.futures, but I have had no luck getting either to work with my current structure.
I am running this on a VM, and sockets are not thread-safe there, which is why I do the requests one at a time. get_basic() makes a socket call to get the URL's IP.
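For completeness, get_basic() is essentially just that DNS lookup plus a little formatting, something along these lines (simplified; the real function pulls out a few more fields):

import socket
from urllib.parse import urlparse

def get_basic(info):
    # Resolve the host of the fetched URL to an IP (this is the socket call).
    host = urlparse(info.url).hostname
    ip = socket.gethostbyname(host)
    return "{} {}\n".format(info.url, ip)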