How to create a log `csv` file containing the file names that correspond to the links

Time: 2017-05-29 12:39:30

Tags: python python-2.7 csv python-requests logfile

I have a list of urls in a csv file, with contents like this:

manual_name1 12344.pdf #pdf link
manual_name2 12334.pdf #pdf link

When I run the code mentioned below, the downloaded files are saved as manual_name1.pdf, manual_name2.pdf, and so on.

I want a csv log file that contains the names of the pdf files alongside the locations they were downloaded from, like this example:

manual_name.pdf   12344.pdf #pdflink
manual_name2.pdf  12334.pdf #pdflink

Here is the code:

1 answer:

Answer 0: (score: 1)

If all you need is a format similar to the one listed above, just write a row to the CSV right after each PDF file is downloaded.

import os
import csv
import requests
import time

write_path = '/Users/macossierra/Desktop/pdf'  # ASSUMING THAT FOLDER EXISTS!

with open('this.csv', 'r') as csvfile:
    # 'wb' suits the csv module on Python 2.7; on Python 3 use open('log.csv', 'w', newline='')
    with open('log.csv', 'wb') as csv_out:
        writer = csv.writer(csv_out)
        spamreader = csv.reader(csvfile)
        for link in spamreader:
            if not link:
                continue
            print('-'*72)
            # The timestamp keeps repeated downloads from overwriting each other
            pdf_file = '{}_{}.pdf'.format(link[0], int(time.time()))
            path = os.path.join(write_path, pdf_file)
            try:
                # Request the PDF first, so a failed request does not leave an empty file behind
                print('Trying to connect with link >>>>> {} ... '.format(link[1]))
                a = requests.get(link[1], stream=True)
                a.raise_for_status()  # treat HTTP 4xx/5xx responses as errors
                with open(path, 'wb') as pdf:
                    for block in a.iter_content(512):
                        if not block:
                            break
                        pdf.write(block)
                print('File downloaded successfully.')
                # Log the file name, source URL, saved path and download time
                writer.writerow([pdf_file, link[1], path, str(time.time())])
            except requests.exceptions.RequestException as e:  # This will catch ONLY Requests exceptions
                print('REQUESTS ERROR:')
                # The exception message gives more detail about what went wrong
                print(e)
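
On Python 3 the logging step can be pulled out into a small helper, which also makes it easy to check without touching the network. What follows is only a minimal sketch under stated assumptions: the function names `append_log_row` and `read_log` and the header column names are hypothetical and do not come from the answer above.

```python
import csv
import os
import time


def append_log_row(log_path, pdf_name, url, saved_path):
    """Append one download record to the CSV log (hypothetical helper)."""
    is_new = not os.path.exists(log_path)
    # newline='' is what the csv module expects for text-mode files on Python 3
    with open(log_path, 'a', newline='') as log_file:
        writer = csv.writer(log_file)
        if is_new:
            # Assumed column names; pick whatever suits your log
            writer.writerow(['file_name', 'url', 'saved_path', 'downloaded_at'])
        writer.writerow([pdf_name, url, saved_path, int(time.time())])


def read_log(log_path):
    """Return the logged rows as a list of dicts keyed by the header row."""
    with open(log_path, newline='') as log_file:
        return list(csv.DictReader(log_file))
```

In the loop above, a call such as `append_log_row('log.csv', pdf_file, link[1], path)` would take the place of `writer.writerow(...)`. Opening the log in append mode for every row is slower than keeping one writer open, but each record reaches disk even if a later download crashes the script.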