Question

I'm writing a small script with PyQt4 and BeautifulSoup. Basically, you specify a URL and the script should then download all the pictures from that web page.

When I give it http://yahoo.com, it downloads all the pictures except one. The output is:

...
Download Complete
Download Complete
File name is wrong 
Traceback (most recent call last):
  File "./picture_downloader.py", line 41, in loadComplete
    self.download_image()
  File "./picture_downloader.py", line 58, in download_image
    print 'File name is wrong ',image['src']
  File "/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.1.3-py2.7.egg/bs4/element.py", line 879, in __getitem__
    return self.attrs[key]
KeyError: 'src'

For http://stackoverflow.com, the output is:

Download Complete
File name is wrong  h
Download Complete

Finally, here is the relevant part of the code:

# SLOT for loadFinished
def loadComplete(self): 
    self.download_image()

def download_image(self):
    html = unicode(self.frame.toHtml()).encode('utf-8')
    soup = bs(html)

    for image in soup.findAll('img'):
        try:
            file_name = image['src'].split('/')[-1]
            cur_path = os.path.abspath(os.curdir)
            if not os.path.exists(os.path.join(cur_path, 'images/')):
                os.makedirs(os.path.join(cur_path, 'images/'))
            f_path = os.path.join(cur_path, 'images/%s' % file_name)
            urlretrieve(image['src'], f_path)
            print "Download Complete"
        except:
            print 'File name is wrong ',image['src']
    print "No more pictures on the page"

Answer 1

OK, here is what's happening. Inside your try block, file_name = image['src'].split('/')[-1] raises a KeyError because that tag has no src attribute.

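Then, inside the except clause, you try to access the very attribute that caused the error: print 'File name is wrong ',image['src']. That second lookup raises the same KeyError, and this time nothing catches it.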

Inspect the img tags that trigger the error and rework your logic to handle those cases.
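
A minimal sketch of the failure and one way around it, written for Python 3. Plain dicts stand in for the img tags here (an assumption, made so the snippet runs without BeautifulSoup); the traceback in the question shows that bs4 resolves image['src'] through self.attrs[key], so a missing attribute fails in the same way. The names images and describe are hypothetical.

```python
# Plain dicts stand in for <img> tags (assumption: only attribute lookup matters here).
images = [
    {"src": "http://example.com/a.png"},
    {"alt": "no source attribute"},
]


def describe(image):
    """Return a status line for one image without reading a missing key twice."""
    try:
        file_name = image["src"].split("/")[-1]
    except KeyError:
        # Do not read image["src"] again here: that lookup is what just failed.
        return "No src attribute on %r" % (image,)
    return "Would download %s" % file_name


for image in images:
    print(describe(image))
```

Catching KeyError specifically, rather than using a bare except, also keeps unrelated failures (a network error from urlretrieve, for instance) from being reported as a bad file name.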

Answer 2

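It means the image element has no "src" attribute, and you get the same error twice: once in file_name = image['src'].split('/')[-1], and then again in the except block, in print 'File name is wrong ',image['src'].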

The simplest way to avoid the problem is to replace soup.findAll('img') with soup.findAll('img',{"src":True}), so that only elements that have a src attribute are found.
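
A short sketch of that filter, assuming Python 3 and a current bs4, where find_all is the preferred spelling of findAll and the keyword form src=True is equivalent to passing {"src": True}. The HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second tag has no src attribute at all.
html = """
<img src="http://example.com/a.png">
<img alt="no src here">
<img src="/static/b.gif">
"""

soup = BeautifulSoup(html, "html.parser")

# src=True matches only <img> tags that actually carry a src attribute,
# so image["src"] below cannot raise KeyError.
sources = [image["src"] for image in soup.find_all("img", src=True)]
print(sources)
```

With this filter in place, tags without src are never visited, so no except branch is needed for the missing-attribute case.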

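If there are two possibilities, for example the URL may be stored under src or under some other attribute, look the attribute up in a way that does not raise, and skip the image when nothing is found.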
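
One way to read "two possibilities" is that the URL may sit under either of two attribute names. A sketch under that assumption, for Python 3 with bs4: Tag.get returns None for a missing attribute instead of raising, so the two lookups can be chained. The data-src attribute and the sample HTML are hypothetical stand-ins, not taken from the answer.

```python
from bs4 import BeautifulSoup

# Made-up markup covering the three cases: src, fallback attribute, neither.
html = """
<img src="http://example.com/a.png">
<img data-src="http://example.com/lazy.png">
<img alt="neither attribute">
"""

soup = BeautifulSoup(html, "html.parser")

urls = []
for image in soup.find_all("img"):
    # Tag.get returns None instead of raising when the attribute is absent.
    # "data-src" is a hypothetical fallback attribute name.
    url = image.get("src") or image.get("data-src")
    if url is None:
        continue  # nothing to download for this tag
    urls.append(url)

print(urls)
```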

What does this error in Beautiful Soup mean?
