Downloading 1000 PDF links listed in a CSV file using the Python requests module.
Answer 0 (score: 3)
I suggest using Requests. You could then do something like this:
import os
import csv
import requests

write_path = 'folder_name'  # ASSUMING THAT FOLDER EXISTS!

with open('x.csv', 'r', newline='') as csvfile:
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        if not link:
            continue  # Skip blank rows in the CSV
        print('-' * 72)
        url = link[0]
        pdf_file = url.split('/')[-1]
        try:
            # Try to request PDF from URL
            print('TRYING {}...'.format(url))
            a = requests.get(url, stream=True, timeout=30)
            a.raise_for_status()  # Treat HTTP 4xx/5xx responses as errors
            # Open the output file only once the request has succeeded,
            # so a failed request does not leave an empty file behind
            with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
                for block in a.iter_content(512):
                    pdf.write(block)
            print('OK.')
        except requests.exceptions.RequestException as e:  # This will catch ONLY Requests exceptions
            print('REQUESTS ERROR:')
            print(e)  # This should tell you more details about the error
The contents of the test x.csv are:
https://www.pabanker.com/media/3228/qtr1pabanker_final-web.pdf
http://www.pdf995.com/samples/pdf.pdf
https://tcd.blackboard.com/webapps/dur-browserCheck-BBLEARN/samples/sample.pdf
http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf
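For URLs like the ones above, taking the text after the last `/` gives a usable file name. With 1000 arbitrary links, some may carry a query string or end in a slash, so a slightly more defensive helper might be worth considering. The following is only a sketch: the name `filename_from_url` and the `download.pdf` fallback are hypothetical, not part of the original answer.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download.pdf'):
    """Return a local file name derived from the path component of a URL.

    `default` is a hypothetical fallback used when the path has no last segment.
    """
    path = urlsplit(url).path  # ignores any ?query or #fragment
    name = posixpath.basename(unquote(path))  # decode %20 etc., keep last segment
    return name or default


print(filename_from_url('http://www.pdf995.com/samples/pdf.pdf'))
print(filename_from_url('http://example.com/report.pdf?version=2'))
```

Note that two different URLs can still map to the same file name, in which case the later download would overwrite the earlier one.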
Sample output:
$ python test.py
------------------------------------------------------------------------
TRYING https://www.pabanker.com/media/3228/qtr1pabanker_final-web.pdf...
REQUESTS ERROR:
("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))
------------------------------------------------------------------------
TRYING http://www.pdf995.com/samples/pdf.pdf...
OK.
------------------------------------------------------------------------
TRYING https://tcd.blackboard.com/webapps/dur-browserCheck-BBLEARN/samples/sample.pdf...
OK.
------------------------------------------------------------------------
TRYING http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf...
OK.
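The first link in the output above failed with a connection reset, which is often transient. One possible way to handle that is to retry a failed download a few times with a growing pause. The helper below is a minimal sketch under assumptions: `retry`, `attempts`, and `delay` are hypothetical names, and it relies on `requests.exceptions.RequestException` being a subclass of `IOError` (that is, `OSError`), so catching `OSError` also covers Requests errors.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(OSError,)):
    """Call func() until it succeeds or the attempts run out.

    Sleeps delay, 2*delay, 4*delay, ... seconds between attempts and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * 2 ** (attempt - 1))
```

To use it, the request and file-writing step would have to live in a function that raises on failure (for example a hypothetical `download(url)`), called as `retry(lambda: download(url))` inside the existing `try` block.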