I am trying to download all the PDFs from the website below, using the following code:
import mechanize
from time import sleep
br = mechanize.Browser()
br.open('http://www.nerc.com/comm/CCC/Pages/AgendasHighlightsandMinutes-.aspx')
f=open("source.html","w")
f.write(br.response().read())
filetypes=[".pdf"]
myfiles=[]
for l in br.links():
    for t in filetypes:
        if t in str(l):
            myfiles.append(l)
def downloadlink(l):
    f=open(l.text,"w")
    br.click_link(l)
    f.write(br.response().read())
    print l.text," has been downloaded"
for l in myfiles:
    sleep(1)
    downloadlink(l)
I keep getting the following error and cannot figure out why.
legal and privacy has been downloaded
Traceback (most recent call last):
  File "downloads-pdfs.py", line 29, in <module>
    downloadlink(l)
  File "downloads-pdfs.py", line 21, in downloadlink
    f=open(l.text,"w")
IOError: [Errno 13] Permission denied: u'/trademark policy'
Answer 0 (score: 1)
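The problem is that you are using the link text as the file name. Here the text is '/trademark policy'; because it begins with "/", open() treats it as an absolute path in the root directory, where you do not have write permission, hence the "Permission denied" error. More generally, "/" is a path separator and cannot be part of a file name. Try changing the downloadlink function to the following: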
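def downloadlink(l):
    # Keep only the part after the last "/" so the name has no path separator.
    filename = l.text.split('/')[-1]
    # PDFs are binary data, so write the file in binary mode.
    with open(filename, "wb") as f:
        # follow_link() actually opens the link; click_link() only builds a Request.
        br.follow_link(l)
        f.write(br.response().read())
    # Go back to the listing page so the next link can be followed.
    br.back()
    print l.text," has been downloaded"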
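
Note that splitting on "/" returns an empty string when the link text ends with a slash, and it leaves other awkward characters untouched. As a rough sketch of a more defensive approach, the helper below builds a file name from the link text. The name safe_filename, the set of replaced characters, and the "download" fallback are assumptions made for illustration; they are not part of mechanize or of the original code.

```python
import re


def safe_filename(link_text, extension=".pdf"):
    """Turn arbitrary link text into a name that is safe to pass to open().

    Hypothetical helper: the replaced characters and the fallback name
    are illustrative choices, not a standard.
    """
    # Replace path separators and characters that are invalid on common
    # file systems with an underscore.
    name = re.sub(r'[\\/:*?"<>|]+', "_", link_text)
    # Collapse runs of whitespace, then trim spaces, dots and underscores
    # from both ends.
    name = re.sub(r"\s+", " ", name).strip(" ._")
    # Fall back to a generic name if nothing usable is left.
    if not name:
        name = "download"
    # Append the extension unless the name already ends with it.
    if not name.lower().endswith(extension):
        name += extension
    return name
```

Inside downloadlink, the line that computes filename could then become filename = safe_filename(l.text). Distinct links can still map to the same name, so a real script may also want to check whether the file already exists before writing.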