I have a list of relative URLs, PostLink, and a base URL, baseurl. I made a request to each URL in a loop. It works fine, but takes about six minutes:
import requests

baseurl = 'http://www.aaronsw.com/weblog/'
# Sequential: each request waits for the previous one to complete
bowls = [requests.get(baseurl + i) for i in PostLink]
Now, since this job is I/O-bound, I would like to speed up the crawl with multithreading. I tried:
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(6)
res = []
for i in PostLink:
    future = pool.submit(requests.get, baseurl + i)
    res.append(future.result())
I think I'm doing something wrong. Any help is appreciated.
Answer 0 (score: 1)
Here is some code that multiprocesses a list of items, executing your_function in parallel for each item in the list:
from multiprocessing import Pool, cpu_count

def multi_processor(function_name):
    # Test: put 6 strings in the list so your_function runs six times
    # in parallel (assuming your CPU has that many cores)
    file_list = ["test1", "test2", "test3", "test4", "test5", "test6"]

    # Use the number of system processors minus one
    pool = Pool(processes=cpu_count() - 1)

    # For every file in the file list, start a new process
    results = {}
    for each_file in file_list:
        results[each_file] = pool.apply_async(function_name, args=("arg1", "arg2"))

    # Wait for all processes to finish before proceeding
    pool.close()
    pool.join()

    # Results (and any errors) are returned, keyed by input item
    return {each_file: result.get() for each_file, result in results.items()}

def your_function(arg1, arg2):
    try:
        print("put your stuff in this function")
        your_results = ""
        return your_results
    except Exception as e:
        return str(e)

if __name__ == "__main__":
    some_results = multi_processor(your_function)
    print(some_results)
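Adapted to the original question, the same apply_async pattern might look roughly like the sketch below. This is an illustration, not part of the answer as posted: fetch is a hypothetical worker, the PostLink values are placeholders, and only baseurl comes from the question.

from multiprocessing import Pool, cpu_count

import requests

baseurl = 'http://www.aaronsw.com/weblog/'
PostLink = ['archive', 'about']  # hypothetical placeholder paths

def fetch(link):
    # Worker: build the full URL and return the page body, so only the
    # text (not the whole Response object) crosses the process boundary
    return requests.get(baseurl + link).text

if __name__ == "__main__":
    pool = Pool(processes=cpu_count() - 1)
    results = {link: pool.apply_async(fetch, args=(link,)) for link in PostLink}
    pool.close()
    pool.join()
    bowls = {link: r.get() for link, r in results.items()}
    print({link: len(body) for link, body in bowls.items()})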
Answer 1 (score: 0)
Using multiprocessing, it takes about 54 seconds:
from multiprocessing import Pool

# Six worker processes fetch the URLs in parallel
with Pool(6) as p:
    bowls = p.map(requests.get, [baseurl + i for i in PostLink])
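Since the job is I/O-bound, the ThreadPoolExecutor approach from the question should also work; the likely problem there is that calling future.result() inside the submit loop blocks on each request before the next one is submitted, which serializes the whole crawl. Below is a minimal sketch of the usual pattern (assuming PostLink and baseurl as in the question; the PostLink values here are placeholders): submit everything first, then collect results.

from concurrent.futures import ThreadPoolExecutor

import requests

baseurl = 'http://www.aaronsw.com/weblog/'
PostLink = ['archive', 'about']  # hypothetical placeholder paths

with ThreadPoolExecutor(max_workers=6) as pool:
    # Submit all requests first so the six threads run concurrently...
    futures = [pool.submit(requests.get, baseurl + i) for i in PostLink]
    # ...then block on results only after everything is in flight
    bowls = [f.result() for f in futures]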