Downloading an image using Python Mechanize

Date: 2013-03-24 00:54:55

Tags: python mechanize

I'm trying to write a Python script that downloads an image and sets it as my wallpaper. Unfortunately, the Mechanize documentation is pretty poor. My script follows the link correctly, but I'm having a hard time actually saving the image on my computer. From what I've researched, the .retrieve() method should do the job, but how do I specify the path the file should be downloaded to? Here is what I have...

def followLink(browser, fixedLink):
    browser.open(fixedLink)

    if browser.find_link(url_regex=r'1600x1200'):
        browser.follow_link(url_regex=r'1600x1200')
    elif browser.find_link(url_regex=r'1400x1050'):
        browser.follow_link(url_regex=r'1400x1050')
    elif browser.find_link(url_regex=r'1280x960'):
        browser.follow_link(url_regex=r'1280x960')
    return
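
For reference, a minimal sketch of the call being asked about, assuming the matched link points directly at the image file (the destination path below is only a placeholder):

# Sketch only: find_link() returns a mechanize.Link; retrieve() takes the
# local destination path as its second argument.
link = browser.find_link(url_regex=r'1600x1200')
browser.retrieve(link.absolute_url, '/home/user/wallpaper.jpg')  # placeholder path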

4 Answers:

Answer 0 (score: 9)

import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    # Build a flat local filename from the image URL
    # (str.lstrip strips characters, not a prefix, so drop the scheme explicitly).
    filename = image['src'].replace('http://', '')
    filename = os.path.join(dir, filename.replace('/', '_'))
    # Fetch the image bytes and write them to disk.
    data = browser.open(image['src']).read()
    browser.back()
    save = open(filename, 'wb')
    save.write(data)
    save.close()

This helps you download all the images from a web page. As for parsing the HTML, you are better off using BeautifulSoup or lxml. Downloading is just reading the data and then writing it to a local file. You should assign your own value to dir; that is where your images will be stored.
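
Since the answer mentions lxml as an equally good option but only shows BeautifulSoup, here is a minimal sketch of the same loop using lxml instead (url and dir are assumed to be defined by you, as above):

import os
import mechanize
import lxml.html

browser = mechanize.Browser()
html = browser.open(url).read()              # url assumed defined, as above
tree = lxml.html.fromstring(html)
for src in tree.xpath('//img/@src'):         # every <img> src attribute
    filename = os.path.join(dir, src.replace('http://', '').replace('/', '_'))
    data = browser.open(src).read()          # fetch the image bytes
    browser.back()
    with open(filename, 'wb') as out:
        out.write(data)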

Answer 1 (score: 5)

Not sure why this solution hasn't come up, but you can also use the mechanize.Browser.retrieve function. Perhaps it only works in newer versions of mechanize and therefore wasn't mentioned?

In any case, if you want to shorten the answer by zhangyangyu, you can do this:

import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    filename = image['src'].replace('http://', '')  # drop the scheme prefix
    filename = os.path.join(dir, filename.replace('/', '_'))
    browser.retrieve(image['src'], filename)
    browser.back()

Also note that you may want to put all of this inside a try/except block, like so:

import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    filename = image['src'].replace('http://', '')  # drop the scheme prefix
    filename = os.path.join(dir, filename.replace('/', '_'))
    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError,mechanize.URLError) as e:
        pass
        # Use e.code and e.read() with HTTPError
        # Use e.reason.args with URLError

Of course, you'll need to adjust this to your own needs. Maybe you want it to blow up if it runs into a problem; it all depends on what you're trying to achieve.
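
If you would rather have it blow up, a minimal variation of the loop body above (using the same browser, image, and filename variables) is to report the error and re-raise instead of passing:

    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError, mechanize.URLError) as e:
        # Stricter variant: report the failure and stop instead of skipping.
        print 'Failed to fetch %s: %s' % (image['src'], e)
        raise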

Answer 2 (score: 3)

You can get/download the image by opening the URL of the img src.

image_response = browser.open_novisit(img['src'])

Now, to save the file, just use open():

with open('image_out.png', 'wb') as f:
    f.write(image_response.read())

Answer 3 (score: 0)

It's really ugly, but it works fine for me. It's based on 0xc0000022l's answer.

import mechanize, os
from BeautifulSoup import BeautifulSoup
import urllib2

def DownloadIMGs(url): # IMPORTANT URL WITH HTTP OR HTTPS
    print "From", url
    dir = 'F:\Downloadss' #Dir for Downloads
    basicImgFileTypes = ['png','bmp','cur','ico','gif','jpg','jpeg','psd','raw','tif']

    browser = mechanize.Browser()
    html = browser.open(url)
    soup = BeautifulSoup(html)
    image_tags = soup.findAll('img')
    print "N Images:", len(image_tags)
    print
    #---------SAVE PATH
    #check if available
    if not os.path.exists(dir):
        os.makedirs(dir)
    #---------SAVE PATH
    for image in image_tags:

        #---------SAVE PATH + FILENAME (Where It is downloading)
        filename = image['src']
        fileExt = filename.split('.')[-1]
        fileExt = fileExt[0:3]

        if (fileExt in basicImgFileTypes):
            print 'File Extension:', fileExt
            filename = filename.replace('?', '_')
            filename = os.path.join(dir, filename.split('/')[-1])
            num = filename.find(fileExt) + len(fileExt)
            filename = filename[:num]
        else:
            filename = filename.replace('?', '_')
            filename = os.path.join(dir, filename.split('/')[-1]) + '.' + basicImgFileTypes[0]
        print 'File Saving:', filename
        #---------SAVE PATH + FILENAME (Where It is downloading)

        #--------- FULL URL PATH OF THE IMG
        imageUrl = image['src']
        print 'IMAGE SRC:', imageUrl

        if (imageUrl.find('http://') > -1 or imageUrl.find('https://') > -1):
            pass
        else:
            if (url.find('http://') > -1):
                imageUrl = url[len('http://'):]          # drop the scheme
                imageUrl = 'http://' + imageUrl.split('/')[0] + image['src']
            elif (url.find('https://') > -1):
                imageUrl = url[len('https://'):]         # drop the scheme
                imageUrl = 'https://' + imageUrl.split('/')[0] + image['src']
            else:
                imageUrl = image['src']

        print 'IMAGE URL:', imageUrl
        #--------- FULL URL PATH OF THE IMG

        #--------- TRY DOWNLOAD
        try:
            browser.retrieve(imageUrl, filename)
            print "Downloaded:", image['src'].split('/')[-1]
            print
        except (mechanize.HTTPError,mechanize.URLError) as e:
            print "Can't Download:", image['src'].split('/')[-1]
            print
            pass
        #--------- TRY DOWNLOAD
    browser.close()

DownloadIMGs('https://stackoverflow.com/questions/15593925/downloading-a-image-using-python-mechanize')