The code comes from: Python if-statement based on content of HTML title tag
from HTMLParser import HTMLParser

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

>>> print titleFinder('<html><head><title>Test</title></head>'
...                   '<body><h1>Parse me!</h1></body></html>')
Test
However, when I run the code below, I get the following error message:
AttributeError: MyHTMLParser instance has no attribute 'intitle'
How can I fix this error? Any ideas?
Code:
from HTMLParser import HTMLParser
import urllib2

def titleFinder(html):
    intitle = False
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = data

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

response = urllib2.urlopen("https://stackoverflow.com/questions/13680074/attributeerror-xx-instance-has-no-attribute-intitle")
html = response.read()
print titleFinder(html)
The traceback is:
Traceback (most recent call last):
  File "D:\labs\test.py", line 19, in <module>
    print titleFinder(html)
  File "D:\labs\test.py", line 14, in titleFinder
    parser.feed(html)
  File "C:\Python27\lib\HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "C:\Python27\lib\HTMLParser.py", line 142, in goahead
    if i < j: self.handle_data(rawdata[i:j])
  File "D:\labs\test.py", line 10, in handle_data
    if self.intitle:
AttributeError: MyHTMLParser instance has no attribute 'intitle'
[UPDATE]
I finally solved the problem! Thank you, Martijn Pieters!
from HTMLParser import HTMLParser
import urllib2

def titleFinder(html):
    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.title = ''
            self.intitle = False #!!!
        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"
        def handle_data(self, data):
            if self.intitle:
                self.title = self.title + data #!!!

    parser = MyHTMLParser()
    parser.feed(html)
    return parser.title

response = urllib2.urlopen("https://stackoverflow.com/questions/13680074/attributeerror-xx-instance-has-no-attribute-intitle")
html = response.read()
print titleFinder(html)
Answer 0 (score: 1)
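The handle_data method is being called before handle_starttag has ever been called, and at that point the intitle attribute has not been set on the instance. (The intitle = False inside titleFinder is only a local variable of that function; it does not create self.intitle.)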
Simply add intitle = False to your class:
class MyHTMLParser(HTMLParser):
    intitle = False
    # your methods
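Filled in, the class could look roughly like the sketch below. It is written for Python 3, where the module is html.parser rather than HTMLParser, and it is only one way to do it. The names title_finder and TitleParser, the empty-string default for title, and the handle_endtag method are additions for illustration and are not part of the code in the question.

```python
# Python 3 port; in Python 2 the import is `from HTMLParser import HTMLParser`.
from html.parser import HTMLParser


def title_finder(html):
    class TitleParser(HTMLParser):
        # Class-level defaults: every instance can read these before
        # any callback has had a chance to assign them.
        intitle = False
        title = ""  # assumption: return "" when the page has no <title>

        def handle_starttag(self, tag, attrs):
            self.intitle = tag == "title"

        def handle_endtag(self, tag):
            # Addition not in the original code: stop collecting at </title>,
            # so whitespace after the closing tag is not appended to the title.
            if tag == "title":
                self.intitle = False

        def handle_data(self, data):
            if self.intitle:
                self.title += data

    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(title_finder("<!DOCTYPE html>\n<html><head><title>Test</title>\n</head></html>"))
# prints: Test
```

Because self.title += data first reads the class-level default and then assigns to the instance, each parser still ends up with its own title.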
handle_data is called for every text node in the document, including whitespace, so it is not unusual for it to be called before handle_starttag.
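To see that ordering directly, a small logging parser can record which callbacks fire. This is a rough sketch for Python 3 (html.parser); EventLogger and its events list are hypothetical names made up for the demonstration, and the sample document is invented, with a newline after the doctype standing in for the whitespace a real page typically contains.

```python
# Python 3 port; in Python 2 the import is `from HTMLParser import HTMLParser`.
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper that records the order of parser callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


logger = EventLogger()
# The newline after the doctype is a text node, so it reaches handle_data
# before the parser has seen any start tag.
logger.feed("<!DOCTYPE html>\n<html><head><title>Test</title></head></html>")
logger.close()

print(logger.events[0])
# prints: ('data', '\n')
```

The first recorded event is a data event carrying only the newline, which is the situation that raised the AttributeError in the question.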