Question

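I have a script that I have been using for several years. One particular page on the site loads and returns soup, but all of my finds return no results. This is old code that used to work on this site. Instead of searching for a specific <div>, I simplified it to look for any table, tr, or td using find or findAll. I have tried various ways of opening the page, including lxml, all with no results.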

I am interested in the player_basic and player_records divs.

from BeautifulSoup import BeautifulSoup, NavigableString, Tag
import urllib2

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Basic.aspx?playerId=60456"

#html = urllib2.urlopen(url).read()
html = urllib2.urlopen(url,"lxml")
soup = BeautifulSoup(html)

#div = soup.find('div', {"class":"player_basic"})  
#div = soup.find('div', {"class":"player_records"})  
item = soup.findAll('td')  
print item

Answer 1

You are not reading the response. Try this:

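import urllib2

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Basic.aspx?playerId=60456'
# Plain GET request. urlopen's second argument is POST data, not a parser name.
response = urllib2.urlopen(url)
# urlopen returns a file-like response object; read() gives the HTML as a string.
html = response.read()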

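You can then use it with BeautifulSoup. If it still does not work, there is good reason to believe the page contains malformed HTML (missing closing tags and so on), because the parsers BeautifulSoup uses (html.parser in particular) are not very forgiving of that.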
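
A side note on the question's code: in urllib2.urlopen(url, "lxml") the second positional argument is the request body (data), so "lxml" is sent to the server as POST data rather than selecting a parser. The sketch below illustrates this without touching the network. It assumes Python 3, where urllib.request replaces urllib2; the variable names are only for illustration.

```python
# Python 3 sketch: urllib.request is the successor of Python 2's urllib2.
from urllib.request import Request

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Basic.aspx?playerId=60456"

# No data argument: the request is a plain GET.
get_request = Request(url)

# With a data argument: the request becomes a POST whose body is b"lxml".
post_request = Request(url, data=b"lxml")

# Building a Request does not contact the server, so this is safe to run offline.
print(get_request.get_method())
print(post_request.get_method())
```

The parser name belongs in the BeautifulSoup constructor, not in the call that opens the URL.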

Update: try using the lxml parser:

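# Choosing a parser by name and find_all both need BeautifulSoup 4 (the bs4 package).
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
tds = soup.find_all('td')
print len(tds)
# Output: 142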
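
Once the td cells are found, the two divs from the question can be looked up the same way. Below is a minimal sketch, assuming Python 3 and BeautifulSoup 4. The sample_html string is a made-up stand-in for the downloaded page (the real markup may well differ), and it uses the built-in html.parser only so it runs without extra installs; for the real page, pass 'lxml' as suggested above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
sample_html = """
<div class="player_basic"><ul><li>Name: Example Player</li></ul></div>
<div class="player_records">
  <table>
    <tr><td>AVG</td><td>0.300</td></tr>
    <tr><td>HR</td><td>10</td></tr>
  </table>
</div>
"""

# html.parser ships with Python; swap in "lxml" for the real, possibly malformed page.
soup = BeautifulSoup(sample_html, "html.parser")

# class_ (with a trailing underscore) filters on the CSS class attribute.
basic = soup.find("div", class_="player_basic")
records = soup.find("div", class_="player_records")

# Collect the text of every table cell inside the records div.
cells = [td.get_text(strip=True) for td in records.find_all("td")]

print(basic.get_text(strip=True))
print(cells)
```

On the real page, find can return None if a div is missing, so it is worth checking the result before calling find_all on it.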
