Python: downloading multiple files in a for loop

Time: 2020-05-26 09:49:14

Tags: python download python-requests

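I have a list of URLs that point directly to SEC filings (e.g. https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm).

My goal is to write a for loop that opens each URL, requests the document and saves it to a folder. However, I need to be able to identify the documents later. That is why I want to use the "18651/000119312509042636" part of "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm" as the filename.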

directory = r"\Desktop\10ks"
for url in url_list:
    response = requests.get(url).content
    path = (directory + str(url)[40:-5] +".txt")
    with open(path, "w") as f:
        f.write(response)
    f.close()

But every time I get the following error message: FileNotFoundError: [Errno 2] No such file or directory:

I really hope you can help me! Thanks.

2 answers:

Answer 0: (score: 0)

This works:

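for url in url_list:
    # .content is bytes, so decode it to a string before writing in text mode
    response = requests.get(url).content.decode('utf-8')
    # Replace the forward slashes taken from the URL with backslashes
    path = (directory + str(url)[40:-5] + ".txt").replace('/', '\\')
    # The with block closes the file, so no explicit f.close() is needed
    with open(path, "w+") as f:
        f.write(response)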

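The path you built looks like \\Desktop\\10ks18651/000119312509042636/d10.txt. I assume you are on Windows, given those backslashes. Either way, you just need to replace the forward slashes that come from the URL with backslashes.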
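To see what the slice from the question actually yields, here is a small sketch that needs no network access. The helper name `filename_from_url` is hypothetical, and `urllib.parse` is swapped in for the hard-coded `[40:-5]` indices, on the assumption that every URL ends in `<number>/<accession>/<document>` like the example.

```python
import os
from urllib.parse import urlparse

url = "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"

# The slice from the question keeps the slashes and cuts off "k.htm".
print(url[40:-5])  # 18651/000119312509042636/d10


def filename_from_url(url):
    """Hypothetical helper: flat filename from the last three URL path segments."""
    segments = urlparse(url).path.strip("/").split("/")
    number, accession, document = segments[-3:]
    stem = os.path.splitext(document)[0]  # "d10k.htm" -> "d10k"
    return "_".join([number, accession, stem]) + ".txt"


print(filename_from_url(url))  # 18651_000119312509042636_d10k.txt
```

Parsing the URL avoids depending on the exact length of the host and path prefix, and the resulting name contains no path separators at all.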

One more thing: write expects a string, so you need to decode the response, which comes back as bytes.
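The bytes versus string point can be checked without downloading anything. In this sketch the `content` variable is a made-up stand-in for `requests.get(url).content`, and a temporary folder is used instead of the Desktop path.

```python
import os
import tempfile

# Stand-in for requests.get(url).content, which is a bytes object.
content = b"<html>10-K</html>"

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode ("w") rejects bytes with a TypeError.
try:
    with open(path, "w") as f:
        f.write(content)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

print(type(text_mode_error).__name__)  # TypeError

# Option 1: binary mode writes the bytes unchanged.
with open(path, "wb") as f:
    f.write(content)

# Option 2: decode first, then write text with an explicit encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write(content.decode("utf-8"))
```

Binary mode is the simpler choice when the file should be a byte-for-byte copy of the download. Decoding assumes the document really is UTF-8.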

Hope this helps!

Answer 1: (score: 0)

import requests
import os
url_list = ["https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"]
#Create the path Desktop/10ks/
directory = os.path.expanduser("~/Desktop") + "\\10ks"
for url in url_list:
    #Get the content as string instead of getting it as bytes
    response = requests.get(url).text
    #Replace slash in filename with underscore
    filename = str(url)[40:-5].replace("/", "_")
    #print filename to check if it is correct
    print(filename)
    path = (directory + "\\" + filename +".txt")
    with open(path, "w") as f:
        f.write(response)

See the comments in the code. I guess backslashes are not allowed in the filename, because

filename = str(url)[40:-5].replace("/", "\\")

gives me

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user/Desktop\\10ks\\18651\\000119312509042636\\d10.txt'
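A likely reason for this error is that a backslash is a directory separator on Windows, and `open()` does not create missing parent folders. If the nested layout is wanted rather than a flat filename, a minimal sketch is to create the folders first with `os.makedirs`. The temporary `base_dir` below is an assumption standing in for the 10ks folder.

```python
import os
import tempfile

base_dir = tempfile.mkdtemp()  # stand-in for the 10ks folder
relative = "18651/000119312509042636/d10"  # what url[40:-5] yields

# Join the pieces with the separator of the current operating system.
path = os.path.join(base_dir, *relative.split("/")) + ".txt"

# open() does not create missing parent folders, hence the FileNotFoundError.
try:
    open(path, "w").close()
    created_without_makedirs = True
except FileNotFoundError:
    created_without_makedirs = False

# Create the parent folders first, then the write succeeds.
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w", encoding="utf-8") as f:
    f.write("placeholder text")

print(created_without_makedirs, os.path.isfile(path))  # False True
```

Replacing the slashes with underscores, as in the answer above, avoids the problem entirely. Creating the folders is only worth it when one directory per filing is useful later.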

See also:
https://docs.python.org/3/library/os.path.html#os.path.expanduser

Get request python as a string

https://docs.python.org/3/library/stdtypes.html#str.replace