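Downloading multiple files with urllib.urlretrieve

Time: 2015-07-25 13:21:10

Tags: python-2.7 urllib python-2.x

I am trying to download multiple files from a website. The URLs look like: foo.com/foo-1.pdf. Because I want the files stored in a directory of my choice, I wrote the following code: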

import os
from urllib import urlretrieve
ext = ".pdf"
for i in range(1,37):
    print "fetching file " + str(i)
    url = "http://foo.com/Lec-" + str(i) + ext
    myPath = "/dir/"
    filename = "Lec-"+str(i)+ext
    fullfilename = os.path.join(myPath, filename)
    x = urlretrieve(url, fullfilename)

Edit: the full error message.

Traceback (most recent call last):
  File "scraper.py", line 10, in <module>
    x = urlretrieve(url, fullfilename)
  File "/usr/lib/python2.7/urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 244, in retrieve
    tfp = open(filename, 'wb')
IOError: [Errno 2] No such file or directory: '/dir/Lec-1.pdf'

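I would be grateful if someone could point out where I went wrong.

Thanks in advance!

1 Answer:

Answer 0: (score: 0)

Your code works for me (Python 3.9). So make sure your script can access the directory you specified (it must exist and be writable). Also, it looks like you are trying to open a file that does not exist, so make sure the file has been downloaded before you open it: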

import os

fullfilename = os.path.abspath("d:/DownloadedFiles/Lec-1.pdf")
print(fullfilename)
if os.path.exists(fullfilename):  # open the file only if it exists
    with open(fullfilename, 'rb') as file:
        content = file.read()  # read the file's content as bytes
        print(content[:150])   # print only the first 150 bytes

The output is:

C:/Users/Administrator/PycharmProjects/Tests/dtest.py
d:\DownloadedFiles\Lec-1.pdf
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n2346 0 obj <</Linearized 1/L 1916277/O 2349/E 70472/N 160/T 1869308/H [ 536 3620]>>\rendobj\r       \r\nxref\r\n2346 12\r\n0000000016 00000 n\r'

Process finished with exit code 0
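
As a complement: the traceback in the question shows the IOError being raised by open(filename, 'wb'), which is what happens when the target directory (here /dir/) does not exist. Below is a minimal sketch of creating the directory before downloading, assuming Python 3 (where urlretrieve lives in urllib.request). The function name download_all and its fetch parameter are hypothetical, introduced only for illustration; they are not part of the original code.

```python
import os
from urllib.request import urlretrieve  # Python 3 location of urlretrieve


def download_all(base_url, dest_dir, indices, ext=".pdf", fetch=urlretrieve):
    """Download base_url + "Lec-<i>" + ext for each i into dest_dir.

    `fetch` is a hypothetical hook (defaults to urlretrieve) so the
    download step can be swapped out, e.g. for offline testing.
    """
    # Create the target directory (and any parents) if it is missing.
    # Without this, opening the output file for writing fails with
    # "No such file or directory".
    os.makedirs(dest_dir, exist_ok=True)

    saved = []
    for i in indices:
        filename = "Lec-{}{}".format(i, ext)
        url = base_url + filename
        target = os.path.join(dest_dir, filename)
        fetch(url, target)  # writes the response body to `target`
        saved.append(target)
    return saved


# Usage, with the placeholder host and directory from the question:
# download_all("http://foo.com/", "/dir/", range(1, 37))
```

On Python 2.7, as in the question, os.makedirs has no exist_ok argument, so the usual equivalent is to check os.path.isdir(dest_dir) first and call os.makedirs(dest_dir) only if it returns False.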
