Consider the following BeautifulSoup snippet:
>>> ================================ RESTART ================================
>>> import BeautifulSoup
>>> with open(r"d:\temp\ICE.htm") as ice:
    soup = BeautifulSoup.BeautifulSoup(ice.read())
    trs = soup.findAll('tr')
    for tr in trs:
        labels = tr.findAll('label', {'class':'PSEDITBOXLABEL'})
        for label in labels:
            print label.contents[0]
Running it from IDLE raises a RuntimeError with a strange traceback that ends in an odd recursion depth error:
Traceback (most recent call last):
  File "<pyshell#37>", line 4, in <module>
    print label.contents[0]
  File "C:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "C:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "C:\Python27\lib\copy_reg.py", line 71, in _reduce_ex
    state = base(self)
  File "C:\Python27\lib\site-packages\BeautifulSoup.py", line 476, in __unicode__
    return str(self).decode(DEFAULT_OUTPUT_ENCODING)
RuntimeError: maximum recursion depth exceeded while getting the str of an object
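The frames above end in pickle.dumps, called from IDLE's rpc.py, so one way to narrow this down might be to try pickling the printed value directly, outside IDLE. This is only a minimal sketch under that assumption; pickles_cleanly is a hypothetical helper name, not part of BeautifulSoup or idlelib:

```python
import pickle


def pickles_cleanly(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises.

    Hypothetical helper: it only mimics the pickle.dumps call that shows
    up in the idlelib/rpc.py frames of the traceback.
    """
    try:
        pickle.dumps(obj)
    except (RuntimeError, TypeError, AttributeError, pickle.PicklingError):
        # RuntimeError also covers "maximum recursion depth exceeded".
        return False
    return True
```

Calling pickles_cleanly(label.contents[0]) from a plain python session could then show whether the value itself fails to pickle, or whether the failure only appears under IDLE.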
Similar code runs fine when run from the command prompt:
D:\temp>cat ICE.py
import BeautifulSoup
import os
import sys
with open(sys.argv[1]) as ice:
    soup = BeautifulSoup.BeautifulSoup(ice.read())
    trs = soup.findAll('tr')
    for tr in trs:
        labels = tr.findAll('label', {'class':'PSEDITBOXLABEL'})
        for label in labels:
            print label.contents[0]
D:\temp>python ICE.py ICE.htm
Report ID
*Subject
Date Created
Duplicate ID
*Reported By
Reporting Org
Incident ID
Traceback (most recent call last):
  File "ICE.py", line 8, in <module>
    labels = tr.findAll('label', {'class':'PSEDITBOXLABEL'})
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 849, in findAll
    return self._findAll(name, attrs, text, limit, generator, **kwargs)
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 377, in _findAll
    found = strainer.search(i)
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 970, in search
    if self._matches(markup, self.text):
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 989, in _matches
    if markup and not isinstance(markup, basestring):
KeyboardInterrupt
By the way, the version of BeautifulSoup I am using is:
>>> BeautifulSoup.__version__
'3.2.1'
I haven't upgraded to bs4 yet because I found it somewhat buggy.
Note: I may not be able to share the HTML content.