Question

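I have a list of URLs that point directly to SEC filings (e.g. https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm).

My goal is to write a for loop that opens each URL, requests the document, and saves it to a folder. However, I need to be able to identify the documents later. That is why I want to use the "18651/000119312509042636" part of "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm" as the filename.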

directory = r"\Desktop\10ks"
for url in url_list:
    response = requests.get(url).content
    path = (directory + str(url)[40:-5] +".txt")
    with open(path, "w") as f:
        f.write(response)
    f.close()

But every time I get the following error message: FileNotFoundError: [Errno 2] No such file or directory:

I really hope you can help me! Thanks.

Answer 1

This works:

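for url in url_list:
    # .content is bytes; decode it so it can be written in text mode
    response = requests.get(url).content.decode('utf-8')
    # Turn the forward slashes that come from the URL into backslashes (Windows)
    path = (directory + str(url)[40:-5] +".txt").replace('/', '\\')
    with open(path, "w+") as f:
        f.write(response)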

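The path you built looks like \\Desktop\\10ks18651/000119312509042636/d10.txt. Judging by those backslashes, I assume you are on Windows; either way, you just need to replace the forward slashes that come from the URL with backslashes.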
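To see where that path comes from, here is a small sketch that reproduces the string handling from the question without making any network request. The variable names `prefix` and `fragment` are only illustrative. Note that the slice `[40:-5]` also cuts off the "k" of "d10k", and that there is no separator between `directory` and the URL fragment.

```python
url = "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"
directory = r"\Desktop\10ks"

# The first 40 characters are the fixed part of the URL.
prefix = "https://www.sec.gov/Archives/edgar/data/"
print(len(prefix))  # 40

# [40:-5] drops the prefix and the last five characters ("k.htm").
fragment = url[40:-5]
print(fragment)  # 18651/000119312509042636/d10

# No separator is added between the folder and the fragment,
# and the fragment still contains forward slashes.
path = directory + fragment + ".txt"
print(path)  # \Desktop\10ks18651/000119312509042636/d10.txt
```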

Another thing: write expects a string, so you need to decode the response, which comes back as bytes.

Hope this helps!
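A minimal sketch of the bytes versus string point, using a temporary folder and a made-up byte string in place of `requests.get(url).content` (both are assumptions for illustration only). It shows the two usual options: write the bytes in binary mode, or decode them and write text.

```python
import os
import tempfile

# Stand-in for requests.get(url).content, which is a bytes object.
content = b"<html>10-K</html>"
error = None

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "d10k.txt")

    # Text mode ("w") rejects bytes with a TypeError.
    try:
        with open(target, "w") as f:
            f.write(content)
    except TypeError as exc:
        error = exc
        print("text mode failed:", exc)

    # Option 1: keep the bytes and open the file in binary mode.
    with open(target, "wb") as f:
        f.write(content)

    # Option 2: decode to str and write in text mode.
    with open(target, "w", encoding="utf-8") as f:
        f.write(content.decode("utf-8"))

    with open(target, encoding="utf-8") as f:
        saved = f.read()
```

Decoding as UTF-8 assumes the document really is UTF-8 encoded; binary mode avoids having to make that assumption.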

Answer 2

import requests
import os
url_list = ["https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"]
#Create the path Desktop/10ks/
directory = os.path.expanduser("~/Desktop") + "\\10ks"
for url in url_list:
    #Get the content as string instead of getting it as bytes
    response = requests.get(url).text
    #Replace slash in filename with underscore
    filename = str(url)[40:-5].replace("/", "_")
    #print filename to check if it is correct
    print(filename)
    path = (directory + "\\" + filename +".txt")
    #The with block closes the file automatically, so no f.close() is needed
    with open(path, "w") as f:
        f.write(response)
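The fixed slice `[40:-5]` depends on the exact URL length and drops the final "k" of "d10k". As a hedged alternative, the filename could be derived by parsing the URL. The helper name `filename_from_url` and its `marker` parameter are hypothetical, not part of the original answer.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, marker="/edgar/data/"):
    """Build a flat filename from the part of the URL path after `marker`."""
    path = urlparse(url).path               # "/Archives/edgar/data/18651/.../d10k.htm"
    tail = path.split(marker, 1)[-1]        # "18651/000119312509042636/d10k.htm"
    stem, _ext = posixpath.splitext(tail)   # drop the ".htm" extension
    # Slashes would be read as folder separators, so replace them.
    return stem.replace("/", "_") + ".txt"


name = filename_from_url(
    "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"
)
print(name)  # 18651_000119312509042636_d10k.txt
```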

See the comments in the code. I guess backslashes are not allowed in the filename, because

filename = str(url)[40:-5].replace("/", "\\")

gives me

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user/Desktop\\10ks\\18651\\000119312509042636\\d10.txt'
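A likely explanation for this error is that on Windows a backslash is a path separator, so the name points into nested folders (10ks\18651\000119312509042636) that do not exist yet, and open() does not create missing folders. If the nested layout is wanted, the folders can be created first. The sketch below uses a temporary folder and a hypothetical helper `save_document` to show the idea.

```python
import os
import tempfile


def save_document(base_dir, relative_name, text):
    """Write text to base_dir/relative_name + '.txt', creating folders first."""
    # Split on "/" so the right separator is used on every platform.
    path = os.path.join(base_dir, *relative_name.split("/")) + ".txt"
    # open() does not create missing parent folders, so create them here.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


with tempfile.TemporaryDirectory() as base_dir:
    saved_path = save_document(
        base_dir, "18651/000119312509042636/d10k", "example text"
    )
    saved_ok = os.path.isfile(saved_path)
    print(saved_ok)
```

Replacing the slashes with underscores, as in the code above, avoids the nested folders entirely and only requires the 10ks folder itself to exist.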

See also:
https://docs.python.org/3/library/os.path.html#os.path.expanduser

Get request python as a string

https://docs.python.org/3/library/stdtypes.html#str.replace

Python: downloading multiple files in a for loop
