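I have a function that works in a single thread. This is a simplified example. Basically, I want to check a list of URLs for dead links and store the result in each item of a list of dictionaries.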
import requests
import sys
import logging

def main():
    urls_to_check = [{'url': 'http://example.com'},
                     {'url': 'http://example.com'},
                     {'url': 'http://example.com'}]
    print(check_for_404(urls_to_check))

def check_for_404(urls_to_check):
    for item in urls_to_check:
        r = requests.get(item['url'])
        item.update({'responseCode': r.status_code})
    return urls_to_check

if __name__ == '__main__':
    try:
        main()
    except:
        logging.error("Unexpected error:" + str(sys.exc_info()))
Output:
[{'url': 'http://example.com', 'responseCode': 200}, {'url': 'http://example.com', 'responseCode': 200}, {'url': 'http://example.com', 'responseCode': 200}]
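I'm happy with this.
Now, if I implement multiprocessing, my understanding is that it splits the iteration across multiple cores and runs part of the iteration through the function on each one...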
import requests
import sys
import logging
from multiprocessing import Pool

def main():
    urls_to_check = [{'url': 'http://example.com'},
                     {'url': 'http://example.com'},
                     {'url': 'http://example.com'}]
    p = Pool(5)
    print(p.map(check_for_404, urls_to_check))

def check_for_404(urls_to_check):
    for item in urls_to_check:
        r = requests.get(item['url'])
        item.update({'responseCode': r.status_code})
    return urls_to_check

if __name__ == '__main__':
    try:
        main()
    except:
        logging.error("Unexpected error:" + str(sys.exc_info()))
I get the error TypeError('string indices must be integers, not str',), <traceback object at 0x10ad6c128>)
How can I implement multiprocessing so that I can process a long list of URLs faster?
This is the tutorial I'm looking at: https://docs.python.org/2/library/multiprocessing.html
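Answer 0 (score: 2)
You need to change your "check for 404" function to accept a single URL item instead of a list. The map function passes in the list elements one at a time (distributing them across the subprocesses in the pool), then reassembles the return values into a list at the end: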
def check_for_404(item):
    r = requests.get(item['url'])
    item.update({'responseCode': r.status_code})
    return item