BeautifulSoup fails with a recursion depth error when run from IDLE

Asked: 2013-02-14 09:00:54

Tags: python python-2.7 beautifulsoup

Consider the following BeautifulSoup snippet:

>>> ================================ RESTART ================================
>>> import BeautifulSoup
>>> with open(r"d:\temp\ICE.htm") as ice:
    soup = BeautifulSoup.BeautifulSoup(ice.read())
    trs = soup.findAll('tr')
    for tr in trs:
        labels = tr.findAll('label', {'class':'PSEDITBOXLABEL'})
        for label in labels:
            print label.contents[0]

Running it from IDLE raises a RuntimeError with a strange traceback and an odd recursion depth error:

Traceback (most recent call last):
  File "<pyshell#37>", line 4, in <module>
    print label.contents[0]
  File "C:\Python27\lib\idlelib\rpc.py", line 595, in __call__
    value = self.sockio.remotecall(self.oid, self.name, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 210, in remotecall
    seq = self.asynccall(oid, methodname, args, kwargs)
  File "C:\Python27\lib\idlelib\rpc.py", line 225, in asynccall
    self.putmessage((seq, request))
  File "C:\Python27\lib\idlelib\rpc.py", line 324, in putmessage
    s = pickle.dumps(message)
  File "C:\Python27\lib\copy_reg.py", line 71, in _reduce_ex
    state = base(self)
  File "C:\Python27\lib\site-packages\BeautifulSoup.py", line 476, in __unicode__
    return str(self).decode(DEFAULT_OUTPUT_ENCODING)
RuntimeError: maximum recursion depth exceeded while getting the str of an object
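
The traceback above passes through `pickle.dumps` in idlelib's `rpc.py`, which suggests the printed object itself is being pickled on its way to the IDLE shell window. One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the string node holds references to the rest of the parse tree, so pickling it walks a very deep object graph. The sketch below reproduces that mechanism with the standard library only. `LinkedText`, `build_chain`, and `can_pickle` are hypothetical names for illustration and are not part of BeautifulSoup or IDLE; the sketch is written for Python 3.

```python
import pickle


class LinkedText(str):
    """Hypothetical string node that also points at the next node."""
    next_node = None


def build_chain(length):
    """Return the head of a chain of `length` linked string nodes."""
    head = LinkedText("node 0")
    node = head
    for i in range(1, length):
        node.next_node = LinkedText("node %d" % i)
        node = node.next_node
    return head


def can_pickle(obj):
    """Return True if obj pickles, False if pickling hits the recursion limit."""
    try:
        pickle.dumps(obj)
        return True
    except RuntimeError:  # RecursionError is a RuntimeError subclass on Python 3
        return False


head = build_chain(20000)
plain = str(head)  # plain str copy, without the link to the rest of the chain

node_pickles = can_pickle(head)    # pickling follows next_node through the chain
plain_pickles = can_pickle(plain)  # only the text itself is pickled
print(node_pickles, plain_pickles)
```

If the same thing is happening in the question's code, converting the node to a plain string before printing, for example `print unicode(label.contents[0])`, might avoid pickling the tree. That workaround is untested against the file in the question.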

When run from the command prompt, similar code runs fine:

D:\temp>cat ICE.py
import BeautifulSoup
import os
import sys
with open(sys.argv[1]) as ice:
    soup = BeautifulSoup.BeautifulSoup(ice.read())
    trs = soup.findAll('tr')
    for tr in trs:
        labels = tr.findAll('label', {'class':'PSEDITBOXLABEL'})
        for label in labels:
            print label.contents[0]

D:\temp>python ICE.py ICE.htm
Report ID
*Subject
Date Created
Duplicate ID
*Reported By
Reporting Org
Incident ID
Traceback (most recent call last):
  File "ICE.py", line 8, in <module>
    labels = tr.findAll('label', {'class':'PSEDITBOXLABEL'})
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 849, in findAll
    return self._findAll(name, attrs, text, limit, generator, **kwargs)
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 377, in _findAll
    found = strainer.search(i)
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 970, in search
    if self._matches(markup, self.text):
  File "c:\Python27\lib\site-packages\BeautifulSoup.py", line 989, in _matches
    if markup and not isinstance(markup, basestring):
KeyboardInterrupt

By the way, the BeautifulSoup version I am using is:

>>> BeautifulSoup.__version__
'3.2.1'

I have not upgraded to bs4 yet because I found it somewhat buggy.

Note: I may not be able to share the HTML content.

0 answers:

No answers yet.