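How to replace the name of a downloaded PDF file with the label that precedes its link
I want to save it as elkinson.pdf instead of Elkinson%20Jeffrey.pdf.
The CSV file looks like this: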
elkinson https://www.adndrc.org/diymodule/doc_panellist/Elkinson%20Jeffrey.pdf
papers_report http://www.parliament.bm/uploadedFiles/Content/House_Business/Presentation_of_Papers_and_of_Reports/PCA%20Report%209262014.pdf
Code:
import os
import csv
import requests

write_path = 'C:\\Users\\hgdht\\Desktop\\Downloader_Automation'  # ASSUMING THAT FOLDER EXISTS!
with open('Links.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        if not link:
            continue
        print('-'*72)
        pdf_file = link[0].split('/')[-1]
        with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
            try:
                # Try to request PDF from URL
                print('TRYING {}...'.format(link[0]))
                a = requests.get(link[0], stream=True)
                for block in a.iter_content(512):
                    if not block:
                        break
                    pdf.write(block)
                print('OK.')
            except requests.exceptions.RequestException as e:  # This will catch ONLY Requests exceptions
                print('REQUESTS ERROR:')
                print(e)  # This should tell you more details about the error
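Answer 0 (score: 0)
In your code, the variable pdf_file holds the file name taken from the URL (for example PCA%20Report%209262014.pdf), so you can use a Python regex to replace the percent-encoded sequence with a space. Note that %[\d]+ matches every digit that follows the percent sign, so the 9262014 in this name is removed together with the %20.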
pdf_file =re.sub(r'%[\d]+',' ',pdf_file).lower()
Example:
import re
pdf_file = "Presentation_of_Papers_and_of_Reports/PCA%20Report%209262014.pdf"
pdf_file =re.sub(r'%[\d]+',' ',pdf_file).lower()
Output: 'presentation_of_papers_and_of_reports/pca report .pdf'
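If the goal is to save each file under the label in the CSV (elkinson.pdf) rather than a cleaned-up version of the URL's file name, one option is to read the name from the first column and the URL from the second. The following is only a sketch under assumptions: target_filename is a hypothetical helper, and each row is assumed to parse as [name, url]. For rows that contain only a URL, it falls back to the last path segment decoded with urllib.parse.unquote, which turns %20 into a space without removing the digits after it.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def target_filename(row):
    """Return the file name to save a CSV row under (hypothetical helper).

    Assumes row is [name, url] as in the question's CSV. A row that holds
    only a URL falls back to the decoded last segment of the URL path.
    """
    if len(row) >= 2 and row[0].strip():
        name = row[0].strip()
        # Append the extension only if the label does not already end with it.
        return name if name.lower().endswith('.pdf') else name + '.pdf'
    url = row[-1].strip()
    # unquote decodes percent-escapes such as %20 and leaves other digits alone.
    return unquote(posixpath.basename(urlsplit(url).path))


rows = [
    ['elkinson',
     'https://www.adndrc.org/diymodule/doc_panellist/Elkinson%20Jeffrey.pdf'],
    ['http://www.parliament.bm/uploadedFiles/Content/House_Business/'
     'Presentation_of_Papers_and_of_Reports/PCA%20Report%209262014.pdf'],
]
for row in rows:
    print(target_filename(row))
```

In the question's loop this would mean setting pdf_file = target_filename(link) and requesting link[1] instead of link[0]. csv.reader splits on commas by default, so if the file really is space-separated as displayed, passing delimiter=' ' to csv.reader would be needed for the two columns to come out as separate fields.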