BeautifulSoup not working inside a tag

Time: 2014-02-18 12:38:06

Tags: python beautifulsoup urllib

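This code does not print the list of companies as required. It never gets inside the first tag: if I put "print 'text'" inside the first loop, nothing is printed. BeautifulSoup seems to need different code for different websites. Any suggestions as to why it is not working?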

from bs4 import BeautifulSoup
import urllib
request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BeautifulSoup(html)
for tags in soup.find_all('div', {'class':'mainContent'}):
    for row in tags.find_all('tr'):
        for column in row.find_all('td'):
            print column.text

1 Answer:

Answer 0: (score: 0)

I have BeautifulSoup 3, and this seems to work fine:

import BeautifulSoup as BS
import urllib
request = urllib.urlopen('http://www.stockmarketsreview.com/companies_sp500/')
html = request.read()
request.close()
soup = BS.BeautifulSoup(html)

try:
   tags = soup.findAll('div', attrs={'class':'mainContent'})
   print '# tags = ' + str(len(tags))
   for tag in tags:
      try:         
         tables = tag.findAll('table')
         print '# tables = ' + str(len(tables))
         for table in tables:            
            try:
               rows = table.findAll('tr')
               for row in rows:
                  try:
                     columns = row.findAll('td')
                     for column in columns:
                        print column.text
                  except:
                     # This is okay since some rows have <th>, not <td>
                     #   print 'Caught error getting td tag under ' + str(row)
                     pass
            except:
               print 'Caught error getting tr tag under ' + str(table)
      except:
         print 'Caught error getting table tag under ' + str(tag)
except:
   print 'Caught error getting div tag'

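I believe that with bs4 you should replace 'findAll' with 'find_all', which is the name bs4 uses (the old spelling is still accepted there as a legacy alias).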
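For reference, here is a rough sketch of the same loop written for bs4 on Python 3. It is only an illustration: the inline SAMPLE_HTML string and the extract_cells helper are made up for this example, and the real page's markup may well differ. Printing the number of matching divs first is a quick way to tell whether the outer loop is ever entered.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<div class="mainContent">
  <table>
    <tr><th>Company</th><th>Ticker</th></tr>
    <tr><td>Example Corp A</td><td>EXA</td></tr>
    <tr><td>Example Corp B</td><td>EXB</td></tr>
  </table>
</div>
</body></html>
"""


def extract_cells(html):
    """Return the <td> texts found under div.mainContent, one list per row."""
    # Naming the parser explicitly keeps results the same across machines.
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", {"class": "mainContent"})
    print("# divs =", len(divs))  # 0 means the outer loop never runs
    rows_out = []
    for div in divs:
        for row in div.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if cells:  # header rows hold only <th>, so they give no cells
                rows_out.append(cells)
    return rows_out


rows = extract_cells(SAMPLE_HTML)
for cells in rows:
    print(cells)
```

If the div count comes out as 0 on the real page, the problem is probably in how the page was parsed or in the class name, not in the inner loops.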
