I have a list of URLs in a CSV file, with content like this:
manual_name1 12344.pdf #pdf link
manual_name2 12334.pdf #pdf link
When I run my download code, the downloaded files are saved as manual_name1.pdf, manual_name2.pdf, and so on.
I want a log file that records the name of each PDF file together with the location it was downloaded from, like this:
manual_name.pdf 12344.pdf #pdflink
manual_name2.pdf 12334.pdf #pdflink
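Here is a minimal sketch of how one log row in the format above could be built. It assumes each CSV row holds the manual name first and the URL second, and that the original file name (such as `12344.pdf`) is the last segment of the URL path. `make_log_row` is a hypothetical helper, and `example.com` is only a placeholder URL.

```python
import posixpath
from urllib.parse import urlparse


def make_log_row(manual_name, url):
    """Hypothetical helper: build [saved file name, original file name, source URL].

    Assumption: the original file name is the last segment of the URL path
    (query strings such as ?dl=1 are ignored by urlparse(...).path).
    """
    saved_name = '{}.pdf'.format(manual_name)
    original_name = posixpath.basename(urlparse(url).path)
    return [saved_name, original_name, url]


if __name__ == '__main__':
    # Placeholder URL for illustration only
    print(make_log_row('manual_name1', 'http://example.com/docs/12344.pdf'))
```

If you want the space-separated layout shown above rather than commas, pass `delimiter=' '` to `csv.writer` when writing these rows.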
Answer 0 (score: 1)
If you only need a format similar to the one listed above, just write a row to a CSV log file immediately after each PDF finishes downloading.
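The "write one row right after each download" step can be isolated into a small function. This is a sketch with a hypothetical helper name, `append_log_row`. It opens the log in append mode, so entries from earlier runs are kept, and in text mode with `newline=''`, which is how the Python 3 `csv` module expects files to be opened.

```python
import csv


def append_log_row(log_path, row):
    """Hypothetical helper: append a single row to a CSV log file.

    Append mode ('a') keeps earlier entries; newline='' stops the csv
    module from writing blank lines between rows on Windows.
    """
    with open(log_path, 'a', newline='') as log_file:
        csv.writer(log_file).writerow(row)
```

Because the file is opened and closed for every row, each entry reaches disk as soon as its download finishes. If the script crashes halfway through the list, the rows already logged survive. The trade-off is one extra open/close per file, which is negligible next to a network download.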
import csv
import os
import time

import requests

write_path = '/Users/macossierra/Desktop/pdf'
os.makedirs(write_path, exist_ok=True)  # create the output folder if it is missing

# In Python 3 the csv module needs text mode with newline='' (not 'wb')
with open('this.csv', 'r', newline='') as csvfile, \
        open('log.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        if not link:
            continue  # skip blank rows
        print('-' * 72)
        pdf_file = '{}_{}.pdf'.format(link[0], int(time.time()))
        path = os.path.join(write_path, pdf_file)
        try:
            # Try to request the PDF from the URL
            print('Trying to connect with link >>>>> {} ... '.format(link[1]))
            a = requests.get(link[1], stream=True, timeout=30)
            a.raise_for_status()  # turn HTTP 4xx/5xx responses into a RequestException
            # Open the output file only after the request succeeded,
            # so failed downloads do not leave empty PDFs behind
            with open(path, 'wb') as pdf:
                for block in a.iter_content(512):
                    if block:  # skip keep-alive chunks instead of stopping early
                        pdf.write(block)
            print('File downloaded successfully.')
            # Write saved name, source URL, local path and timestamp to the CSV log
            writer.writerow([pdf_file, link[1], path, str(time.time())])
        except requests.exceptions.RequestException as e:  # catches ONLY requests exceptions
            print('REQUESTS ERROR:')
            # This should tell you more details about the error
            print(e)
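Once the log exists, you can use it for a quick post-run sanity check. This sketch (`find_bad_downloads` is a hypothetical name) assumes the log layout written by the answer's code, `[pdf_file, url, path, timestamp]`, so the local path sits in column index 2. It reports rows whose file is missing or empty, which can happen if a download was interrupted or the file was later moved.

```python
import csv
import os


def find_bad_downloads(log_path):
    """Hypothetical helper: return log rows whose saved file is missing or empty.

    Assumes the log layout [pdf_file, url, path, timestamp]; column 2 is the path.
    """
    bad = []
    with open(log_path, newline='') as log_file:
        for row in csv.reader(log_file):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            path = row[2]
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                bad.append(row)
    return bad
```

For example, `for row in find_bad_downloads('log.csv'): print(row[1])` would print the URLs worth retrying.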