Python 2.6 + html5lib 0.99 + bs4
Running the following code raises an exception:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import html5lib
html = '''
<html>
<head>
<title> test
</title>
</head>
<body>
<div id="tcp">hello</div>
</body>
</html>
'''
cs = BeautifulSoup(html,"html5lib")
print cs.contents[0].contents[2].contents[1]['id']
main_tag = cs.find('div', id='tcp')
print main_tag.text
####result####
#tcp
#Traceback (most recent call last):
# File "C:\Users\XXXXXXXX\Desktop\test.py", line 21, in <
# print main_tag.text
#AttributeError: 'NoneType' object has no attribute 'text'
After removing the space between "<title>" and "test", the program runs successfully.
Answer 0 (score: 0)
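The observation above suggests a stopgap: trim the whitespace inside <title> before passing the markup to BeautifulSoup. Below is a minimal sketch that uses only the standard library re module. It assumes that whitespace inside <title> is what triggers the problem, as reported in the question. It works around the symptom and does not fix the underlying bs4 bug. The names strip_title_whitespace and _TITLE_RE are hypothetical. The sketch compiles the pattern instead of passing flags= to re.sub, because Python 2.6's re.sub does not accept that argument.

```python
import re

# Hypothetical helper pattern: captures the opening <title ...> tag, the title
# text, and the closing tag. Assumes the title contains no nested markup.
_TITLE_RE = re.compile(r'(<title[^>]*>)(.*?)(</title\s*>)',
                       re.IGNORECASE | re.DOTALL)


def strip_title_whitespace(html):
    """Return html with leading/trailing whitespace removed inside <title>.

    Hypothetical workaround sketch, not a fix for the bs4 bug itself.
    """
    def _trim(match):
        # Keep both tags untouched and strip only the text between them.
        return match.group(1) + match.group(2).strip() + match.group(3)
    return _TITLE_RE.sub(_trim, html)

# Usage (requires bs4 and html5lib to be installed):
#   cs = BeautifulSoup(strip_title_whitespace(html), "html5lib")
#   main_tag = cs.find('div', id='tcp')
```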
This is a known bug in bs4. See:
https://bugs.launchpad.net/beautifulsoup/+bug/1430633
In some cases bs4 builds a malformed tree. find() then runs off the end of the tree and returns None.
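The following toy model shows what "running off the end of the tree" means. It is not bs4's code. bs4's descendants generator, which find() searches, walks the tree by following each element's next_element link in document order, and the model imitates that kind of linear walk. The Node class and the helper functions are hypothetical. When a single link is wrong, the linear walk reaches the end of the chain and returns None, even though the target node is still reachable through the parent/child lists.

```python
class Node(object):
    """Toy document node (hypothetical): name, optional id, children, link."""
    def __init__(self, name, node_id=None):
        self.name = name
        self.node_id = node_id
        self.children = []
        self.next_element = None  # next node in document order


def link_in_document_order(root):
    """Set next_element links by a pre-order walk of the children lists."""
    order = []

    def visit(node):
        order.append(node)
        for child in node.children:
            visit(child)
    visit(root)
    for current, following in zip(order, order[1:]):
        current.next_element = following
    order[-1].next_element = None
    return order


def find_by_linked_walk(start, node_id):
    """Linear search that follows next_element links, like a descendants walk."""
    node = start
    while node is not None:
        if node.node_id == node_id:
            return node
        node = node.next_element
    return None  # ran off the end of the chain


def find_by_children(node, node_id):
    """Recursive search over the children lists, ignoring next_element."""
    if node.node_id == node_id:
        return node
    for child in node.children:
        found = find_by_children(child, node_id)
        if found is not None:
            return found
    return None


def build_sample():
    """Build html > (head > title > text, body > div#tcp) and link it."""
    html = Node('html')
    head, body = Node('head'), Node('body')
    title, text = Node('title'), Node('#text')
    div = Node('div', node_id='tcp')
    html.children = [head, body]
    head.children = [title]
    title.children = [text]
    body.children = [div]
    link_in_document_order(html)
    return html, text, div


if __name__ == '__main__':
    root, text, div = build_sample()
    print(find_by_linked_walk(root, 'tcp') is div)  # True
    text.next_element = None  # simulate one broken link after the <title> text
    print(find_by_linked_walk(root, 'tcp'))         # None
    print(find_by_children(root, 'tcp') is div)     # True
```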