AttributeError: XX instance has no attribute 'intitle'

Asked: 2012-12-03 09:09:40

Tags: python html-parsing

The code comes from: Python if-statement based on content of HTML title tag

from HTMLParser import HTMLParser

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

>>> print titleFinder('<html><head><title>Test</title></head>'
...                   '<body><h1>Parse me!</h1></body></html>')
Test

However, when I run the code below, I get the following error message:

AttributeError: MyHTMLParser instance has no attribute 'intitle'

How can I fix this error? Any ideas?

Code:

from HTMLParser import HTMLParser
import urllib2

def titleFinder(html):
    intitle = False
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

response=urllib2.urlopen("https://stackoverflow.com/questions/13680074/attributeerror-xx-instance-has-no-attribute-intitle")
html= response.read()
print titleFinder(html)

The traceback is:

Traceback (most recent call last):
  File "D:\labs\test.py", line 19, in <module>
    print titleFinder(html)
  File "D:\labs\test.py", line 14, in titleFinder
    parser.feed(html)
  File "C:\Python27\lib\HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python27\lib\HTMLParser.py", line 142, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "D:\labs\test.py", line 10, in handle_data
    if self.intitle:
AttributeError: MyHTMLParser instance has no attribute 'intitle'
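
The traceback shows handle_data being reached from goahead before any start tag has been handled. The same failure can be reproduced offline, without urllib2. The following is a minimal sketch, not part of the original post: `reproduces_error` is a hypothetical helper, and the import falls back so the snippet runs on Python 2 (as in the question) or Python 3 (where the message reads "'MyHTMLParser' object has no attribute 'intitle'").

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as in the question


class MyHTMLParser(HTMLParser):
    # Same handlers as in the question: intitle is only ever set
    # inside handle_starttag.
    def handle_starttag(self, tag, attrs):
        self.intitle = tag == "title"

    def handle_data(self, data):
        if self.intitle:
            self.title = data


def reproduces_error(html):
    """Hypothetical helper: True if feeding html raises AttributeError."""
    parser = MyHTMLParser()
    try:
        parser.feed(html)
        parser.close()
    except AttributeError:
        return True
    return False


# Text before the first tag (here a newline) reaches handle_data first.
print(reproduces_error("\n<html><head><title>Test</title></head></html>"))
# Input that starts directly with a tag sets the flag before any text.
print(reproduces_error("<html><head><title>Test</title></head></html>"))
```

The only difference between the two inputs is the leading newline, which is enough to trigger the error.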

[UPDATE]

I finally solved the problem! Thank you, Martijn Pieters!

from HTMLParser import HTMLParser
import urllib2

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.title = ''
            self.intitle = False  # set the flag up front so handle_data can always read it
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = self.title + data  # accumulate: the text may arrive in several chunks

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

response=urllib2.urlopen("https://stackoverflow.com/questions/13680074/attributeerror-xx-instance-has-no-attribute-intitle")

html= response.read()
print titleFinder(html)

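1 answer:

Answer 0 (score: 1)

The handle_data method was called before handle_starttag had been called, so at that point the intitle attribute had not been set yet.

Simply add intitle = False to your class: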

class MyHTMLParser(HTMLParser):
    intitle = False

    # your methods

handle_data is called for every text node in the document, including whitespace, so it is not unusual for it to be called before handle_starttag.
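
Put together, the class-level default might look like the sketch below. It is an illustration under stated assumptions rather than the answerer's exact code: the names `TitleParser` and `title_finder` are made up here, the `handle_endtag` reset is an extra precaution that the answer does not mention, and the import falls back so the snippet runs on Python 2 or Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as in the question


class TitleParser(HTMLParser):
    # Class-level defaults, as the answer suggests: every instance can
    # read these even if handle_data runs before handle_starttag.
    intitle = False
    title = ""

    def handle_starttag(self, tag, attrs):
        self.intitle = tag == "title"

    def handle_endtag(self, tag):
        # Assumption beyond the answer: clear the flag at </title> so
        # whitespace between </title> and the next tag is not collected.
        if tag == "title":
            self.intitle = False

    def handle_data(self, data):
        if self.intitle:
            self.title += data


def title_finder(html):
    """Hypothetical helper: return the text of the <title> element."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


sample = ("<!DOCTYPE html>\n<html><head><title>Test</title>\n</head>"
          "<body><h1>Parse me!</h1></body></html>")
print(title_finder(sample))
```

The newline after the doctype reaches handle_data before any start tag, and the class attribute keeps that call from failing.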