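I have a list of URLs that point directly to SEC filings (e.g. https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm).
My goal is to write a for loop that opens each URL, requests the document and saves it to a folder. However, I need to be able to identify the documents later. That is why I want to use the part 18651/000119312509042636 of the URL https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm as the file name: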
directory = r"\Desktop\10ks"
for url in url_list:
    response = requests.get(url).content
    path = (directory + str(url)[40:-5] + ".txt")
    with open(path, "w") as f:
        f.write(response)
        f.close()
But every time I get the following error message: FileNotFoundError: [Errno 2] No such file or directory:
I really hope you can help me! Thanks
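The fixed slice `str(url)[40:-5]` depends on the exact length of the `https://www.sec.gov/Archives/edgar/data/` prefix, and because `.htm` is only four characters long it also cuts the `k` off `d10k`. A minimal sketch of taking the identifier from the URL path instead, assuming every URL follows the `/Archives/edgar/data/<CIK>/<accession>/<file>.htm` layout; `filing_id` is a hypothetical helper name:

```python
from urllib.parse import urlsplit
import posixpath

# Assumed layout of every URL in url_list
PREFIX = "/Archives/edgar/data/"

def filing_id(url):
    """Return e.g. '18651/000119312509042636/d10k' for an EDGAR document URL."""
    path = urlsplit(url).path  # '/Archives/edgar/data/18651/.../d10k.htm'
    if not path.startswith(PREFIX):
        raise ValueError("not an EDGAR archive URL: " + url)
    rest = path[len(PREFIX):]  # '18651/000119312509042636/d10k.htm'
    stem, _ext = posixpath.splitext(rest)  # drop the extension, whatever its length
    return stem

url = "https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"
print(url[40:-5])      # the slice from the question: the 'k' of 'd10k' is lost
print(filing_id(url))  # keeps 'd10k'
```

The result still contains `/`, so it has to be turned into a valid file name (or folder structure) before it is passed to `open`.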
Answer 0 (score: 0)
This works:
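for url in url_list:
    response = requests.get(url).content.decode('utf-8')
    # Turn the forward slashes taken from the URL into Windows backslashes
    path = (directory + str(url)[40:-5] + ".txt").replace('/', '\\')
    # The with block closes the file automatically
    with open(path, "w+") as f:
        f.write(response)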
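The path you build looks like \\Desktop\\10ks18651/000119312509042636/d10.txt.
I assume you are on Windows, since you use backslashes; in that case you just need to replace the forward slashes in the URL with backslashes.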
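To see the path problem concretely, the sketch below rebuilds the question's path with Windows path rules through the standard-library `ntpath` module, so it behaves the same on any operating system. It shows the missing separator between `10ks` and `18651`, and that `/` still acts as a folder separator even after a proper join. Note also that `r"\Desktop\10ks"` is rooted at the current drive (for example `C:\Desktop\10ks`), not at the user's desktop, so that folder probably does not exist either.

```python
import ntpath  # Windows path rules; importable on any operating system

directory = r"\Desktop\10ks"
name = "18651/000119312509042636/d10"  # what str(url)[40:-5] returns

# Plain concatenation, as in the question: '10ks' and '18651' run together
concatenated = directory + name + ".txt"
print(concatenated)

# ntpath.join adds the missing separator, but each '/' is still a folder
# separator on Windows, so 10ks\18651\000119312509042636 must already exist
joined = ntpath.join(directory, name + ".txt")
print(joined)
print(ntpath.dirname(joined))
```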
Another thing: write
expects a string, so you need to decode the response, which comes back as bytes.
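Writing bytes to a file opened in text mode (`"w"`) raises a `TypeError`. Besides decoding, the other option is to open the file in binary mode (`"wb"`) and write the bytes unchanged. A minimal sketch of both; the HTML content is a made-up stand-in for `requests.get(url).content`, and `save_bytes` / `save_text` are hypothetical helper names:

```python
import os
import tempfile

def save_bytes(path, content):
    """Write raw bytes (such as response.content) exactly as received."""
    with open(path, "wb") as f:  # binary mode accepts bytes directly
        f.write(content)

def save_text(path, text):
    """Write an already decoded str (such as response.text) as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# Made-up stand-in for requests.get(url).content
content = "<html>Form 10-K</html>".encode("utf-8")
folder = tempfile.mkdtemp()

save_bytes(os.path.join(folder, "a.htm"), content)
save_text(os.path.join(folder, "b.txt"), content.decode("utf-8"))

# What the original loop does: bytes into a text-mode file
raised_type_error = False
try:
    with open(os.path.join(folder, "c.txt"), "w") as f:
        f.write(content)
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)
```

Binary mode avoids choosing an encoding at all; `decode('utf-8')` assumes the filing is UTF-8, and passing `encoding="utf-8"` to `open` avoids depending on the platform's default text encoding.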
Hope this helps!
Answer 1 (score: 0)
import os
import requests
url_list = ["https://www.sec.gov/Archives/edgar/data/18651/000119312509042636/d10k.htm"]
# Build the path Desktop/10ks in the user's home folder and create it if it is missing
directory = os.path.join(os.path.expanduser("~/Desktop"), "10ks")
os.makedirs(directory, exist_ok=True)
for url in url_list:
    # Get the content as a string instead of as bytes
    response = requests.get(url).text
    # Replace the slashes in the file name with underscores
    filename = str(url)[40:-5].replace("/", "_")
    # Print the file name to check that it is correct
    print(filename)
    path = os.path.join(directory, filename + ".txt")
    # The with block closes the file automatically
    with open(path, "w", encoding="utf-8") as f:
        f.write(response)
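The same flat-folder idea can be written with `pathlib`, taking the name from the URL path rather than from a fixed slice (which also keeps the `k` in `d10k`). A minimal sketch, assuming the `/Archives/edgar/data/<CIK>/<accession>/<file>.htm` layout; `flat_path` is a hypothetical helper that also creates the target folder if it does not exist yet:

```python
from pathlib import Path
from urllib.parse import urlsplit

def flat_path(base_dir, url):
    """Map an EDGAR URL to base_dir/<CIK>_<accession>_<name>.txt in one flat folder."""
    # Components after /Archives/edgar/data/, e.g. ['18651', '000119312509042636', 'd10k.htm']
    parts = urlsplit(url).path.split("/")[4:]
    parts[-1] = Path(parts[-1]).stem  # 'd10k.htm' -> 'd10k'
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    return base / ("_".join(parts) + ".txt")
```

In the loop this would be used as `target = flat_path(Path.home() / "Desktop" / "10ks", url)` followed by `target.write_text(response, encoding="utf-8")`.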
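See the comments in the code. Backslashes cannot be used inside a file name on Windows, because they are path separators: with
filename = str(url)[40:-5].replace("/", "\\")
Python looks for the subfolders 18651\000119312509042636 inside 10ks, which do not exist, so I get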
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\user/Desktop\\10ks\\18651\\000119312509042636\\d10.txt'
See also:
https://docs.python.org/3/library/os.path.html#os.path.expanduser
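If you would rather keep the CIK/accession hierarchy than flatten it with underscores, the FileNotFoundError above can also be avoided by creating the intermediate folders before opening the file. A minimal sketch; `nested_path` is a hypothetical helper, and a temporary directory stands in for Desktop\10ks:

```python
import os
import tempfile

def nested_path(base_dir, rel_name, ext=".txt"):
    """Return base_dir/<rel_name><ext>, creating the intermediate folders first.

    rel_name uses '/' as in the URL, e.g. '18651/000119312509042636/d10k'.
    """
    # os.path.join inserts the platform's own separator between the parts
    path = os.path.join(base_dir, *rel_name.split("/")) + ext
    # Create e.g. 18651/000119312509042636/ if it does not exist yet
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path

base = tempfile.mkdtemp()
target = nested_path(base, "18651/000119312509042636/d10k")
with open(target, "w", encoding="utf-8") as f:
    f.write("example")
print(target)
```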