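I'm getting a maximum recursion depth error in the code below. It happens on the last line, the pickle dump call. Please excuse the subject matter; I'm just practicing my Python skills. =)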
from urllib.request import urlopen
from bs4 import BeautifulSoup
from pprint import pprint
from pickle import dump

moves = dict()
moves0 = set()
url = 'http://www.marriland.com/pokedex/1-bulbasaur'
print(url)
# Open url
with urlopen(url) as usock:
    # Get url data source
    data = usock.read().decode("latin-1")
    # Soupify
    soup = BeautifulSoup(data)
    # Find move tables
    for div_class1 in soup.find_all('div', {'class': 'listing-container listing-container-table'}):
        div_class2 = div_class1.find_all('div', {'class': 'listing-header'})
        if len(div_class2) > 1:
            header = div_class2[0].find_all(text=True)[1]
            # Take only moves from Level Up, TM / HM, and Tutor
            if header in ['Level Up', 'TM / HM', 'Tutor']:
                # Get rows
                for row in div_class1.find_all('tbody')[0].find_all('tr'):
                    # Get cells
                    cells = row.find_all('td')
                    # Get move name
                    move = cells[1].find_all(text=True)[0]
                    # If move is new
                    if move not in moves:
                        # Get type
                        typ = cells[2].find_all(text=True)[0]
                        # Get category
                        cat = cells[3].find_all(text=True)[0]
                        # Get power if not Status or Support
                        power = '--'
                        if cat != 'Status or Support':
                            try:
                                # not STAB
                                power = int(cells[4].find_all(text=True)[1].strip(' \t\r\n'))
                            except ValueError:
                                try:
                                    # STAB
                                    power = int(cells[4].find_all(text=True)[-2])
                                except ValueError:
                                    # Moves like Return, Frustration, etc.
                                    power = cells[4].find_all(text=True)[-2]
                        # Get accuracy
                        acc = cells[5].find_all(text=True)[0]
                        # Get pp
                        pp = cells[6].find_all(text=True)[0]
                        # Add move to dict
                        moves[move] = {'type': typ,
                                       'cat': cat,
                                       'power': power,
                                       'acc': acc,
                                       'pp': pp}
                    # Add move to pokemon's move set
                    moves0.add(move)
pprint(moves)
dump(moves, open('pkmn_moves.dump', 'wb'))
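I've cut the code down as far as I could while still reproducing the error. The mistake is probably something simple, but I can't find it. In the meantime, I've worked around the problem by setting the recursion limit to 10000.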
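A minimal sketch of the recursion-limit workaround, scoped with a context manager so the higher limit only applies while the dump runs. `raised_recursion_limit` and `dump_with_limit` are hypothetical helper names, and 10000 is simply the value from the question. The Python documentation warns that a too-high limit can crash the interpreter, so this hides the underlying problem rather than fixing it.

```python
import pickle
import sys
from contextlib import contextmanager


@contextmanager
def raised_recursion_limit(limit):
    """Temporarily raise the interpreter's recursion limit (hypothetical helper)."""
    old_limit = sys.getrecursionlimit()
    # Only ever raise the limit; never lower it below its current value.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        yield
    finally:
        # Restore the original limit even if pickling raised an exception.
        sys.setrecursionlimit(old_limit)


def dump_with_limit(obj, path, limit=10000):
    """Pickle obj to path while the recursion limit is temporarily raised."""
    with raised_recursion_limit(limit), open(path, 'wb') as f:
        pickle.dump(obj, f)
```

In the question's script this would be called as `dump_with_limit(moves, 'pkmn_moves.dump')`. The `with open(...)` also guarantees the file is closed, which the original one-liner leaves to garbage collection.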
Answer (score: 10)
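Just wanted to post an answer for anyone else who runs into this problem. In my case, I was caching BeautifulSoup objects fetched from a remote API in a Django session.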
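The short answer is that pickling BeautifulSoup nodes isn't supported. Instead, I chose to store the raw string data in my object and added an accessor method that parses it on the fly, so that only the raw string data gets pickled.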
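In the question's code, the values stored in `moves` come from `find_all(text=True)`, which returns `NavigableString` objects. These are `str` subclasses that still reference the parse tree (for example through `.parent`), so pickling the dict can drag the whole tree along; on a large page that is a plausible cause of the recursion error. Below is a minimal sketch of the answer's idea, assuming bs4 is installed and using its built-in `'html.parser'` backend: convert text nodes to plain `str` before storing them, and, for cases like the Django session, keep only the raw HTML and re-parse it on access. `to_plain`, `parse_moves`, `CachedPage` and `SAMPLE_HTML` are hypothetical names, and the sample table only imitates the real page's layout.

```python
import pickle

from bs4 import BeautifulSoup, NavigableString

# Hypothetical stand-in for one move table on the scraped page.
SAMPLE_HTML = """
<table><tbody>
  <tr><td>1</td><td>Tackle</td><td>Normal</td><td>Physical</td></tr>
  <tr><td>3</td><td>Growl</td><td>Normal</td><td>Status</td></tr>
</tbody></table>
"""


def to_plain(value):
    """Turn a bs4 text node into a plain str; leave other values (e.g. ints) alone."""
    return str(value) if isinstance(value, NavigableString) else value


def parse_moves(html):
    """Build a {move: {...}} dict that contains only plain, picklable values."""
    soup = BeautifulSoup(html, 'html.parser')
    moves = {}
    for row in soup.find('tbody').find_all('tr'):
        cells = row.find_all('td')
        # find_all(string=True) (the newer name for text=True) yields
        # NavigableStrings tied to the tree, so convert them right away.
        move = to_plain(cells[1].find_all(string=True)[0])
        moves[move] = {
            'type': to_plain(cells[2].find_all(string=True)[0]),
            'cat': to_plain(cells[3].find_all(string=True)[0]),
        }
    return moves


class CachedPage:
    """Hypothetical holder that pickles only raw HTML and re-parses on access."""

    def __init__(self, raw_html):
        self.raw_html = raw_html  # plain str: the only state that gets pickled

    @property
    def soup(self):
        # Parse on demand instead of keeping a BeautifulSoup object around.
        return BeautifulSoup(self.raw_html, 'html.parser')


moves = parse_moves(SAMPLE_HTML)
data = pickle.dumps(moves)  # just strings, no parse tree
restored = pickle.loads(pickle.dumps(CachedPage(SAMPLE_HTML)))
print(restored.soup.find_all('td')[1].get_text())  # prints Tackle
```

Calling `str()` on a `NavigableString` is the conversion the Beautiful Soup documentation suggests when the text is used outside the tree. The cost of the `CachedPage` approach is a fresh parse on every `.soup` access; if that becomes expensive, the parsed tree could be cached in memory as long as it is kept out of the pickled state.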