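This code does not print the list of companies as required. Execution never gets inside the first for loop: if I put print 'text' inside the first loop, it is not printed. BeautifulSoup works for me with other code on a different website. Any suggestions as to why it is not working here?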
from bs4 import BeautifulSoup
import urllib
request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BeautifulSoup(html)
for tags in soup.find_all('div', {'class':'mainContent'}):
    for row in tags.find_all('tr'):
        for column in row.find_all('td'):
            print column.text
Answer 0 (score: 0)
I have BeautifulSoup 3, and this seems to work fine:
import BeautifulSoup as BS
import urllib
request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BS.BeautifulSoup(html)
try:
    tags = soup.findAll('div', attrs={'class':'mainContent'})
    print '# tags = ' + str(len(tags))
    for tag in tags:
        try:
            tables = tag.findAll('table')
            print '# tables = ' + str(len(tables))
            for table in tables:
                try:
                    # Search the current table, not the enclosing div
                    rows = table.findAll('tr')
                    for row in rows:
                        try:
                            columns = row.findAll('td')
                            for column in columns:
                                print column.text
                        except:
                            pass
                            # print 'Caught error getting td tag under ' + str(row)
                            # This is okay since some rows have <th>, not <td>
                except:
                    print 'Caught error getting tr tag under ' + str(table)
        except:
            print 'Caught error getting table tag under ' + str(tag)
except:
    print 'Caught error getting div tag'
I believe that with BeautifulSoup 4 you would need to replace 'findAll' with 'find_all'.
The output of the BeautifulSoup 3 script is as follows: