Unable to extract a table from HTML code

Time: 2014-03-27 17:37:08

Tags: python beautifulsoup html-table html-parser

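I am parsing the HTML table given below (it is a section of the full HTML code), but the code does not work. Can anyone help me? There is an error saying "table has no attribute findall". The code is: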

import re
import HTMLParser
from urllib2 import urlopen
import urllib2
from bs4 import BeautifulSoup

url = 'http://164.100.47.132/LssNew/Members/Biography.aspx?mpsno=4064'


url_data = urlopen(url).read()
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)
title = soup.title
final_tit = title.string

table = soup.find('table',id = "ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1")

tr = table.findall('tr')
for tr in table:
  cols = tr.findAll('td')
  for td in cols:
      text = ''.join(td.find(text=True))
      print text+"|",
  print



<table style="WIDTH: 565px">
        <tr>
            <td vAlign="top" align="left"><img id="ctl00_ContPlaceHolderMain_Bioprofile1_Image1" src="http://164.100.47.132/mpimage/photo/4064.jpg" style="height:140px;border-width:0px;" /></td>
            <td vAlign="top"><table cellspacing="0" rules="all" border="2" id="ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1" style="border-color:#FAE3C3;border-width:2px;border-style:Solid;width:433px;border-collapse:collapse;">
        <tr>
            <td>
                                <table align="center" height="30px">
                                    <tr valign="top">
                                        <td align="center" valign="top" class="gridheader1">Aaroon Rasheed,Shri J.M.</td>
                                    </tr>
                                </table>
                                <table height="110px">
                                    <tr>
                                        <td align="left" class="darkerb" width="133px" valign="top">Constituency&nbsp;&nbsp;&nbsp;:</td>
                                        <td align="left" valign="top" class="griditem2" width="300px">Theni      (Tamil Nadu                                                  )</td>
                                    </tr>
                                    <tr>
                                        <td align="left" width="133px" class="darkerb" valign="top">
                                            Party Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;:</td>
                                        <td align="left" width="300px" valign="top" class="griditem2">Indian National Congress(INC)</td>
                                    </tr>
                                    <tr>
                                        <td align="left" class="darkerb" valign="top" width="133px">
                                            Email Address :
                                        </td>
                                        <td align="left" valign="top" class="griditem2" width="300px">jm.aaronrasheed@sansad.nic.in</td>
                                    </tr>
                                </table>
                            </td>
        </tr>
    </table></td>
        </tr>
    </table>

1 Answer:

Answer 0: (score: 0)

The method is called find_all(), not findall:

tr = table.find_all('tr')
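
For context, here is a minimal sketch of how the whole extraction might look once find_all() is used. It is written for Python 3 with bs4 and the built-in "html.parser" backend, which is an assumption (the question uses Python 2 and urllib2). It parses a trimmed copy of the markup from the question instead of fetching the URL, so it assumes the live page has the same structure. The helper name extract_rows is hypothetical. Note that it loops over the list returned by find_all() rather than over the table itself, and it skips the wrapper row whose only cell holds the nested tables.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live page
# has the same structure). Only the attributes needed here are kept.
HTML = """
<table style="WIDTH: 565px">
  <tr>
    <td><table id="ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1">
      <tr>
        <td>
          <table>
            <tr><td class="gridheader1">Aaroon Rasheed,Shri J.M.</td></tr>
          </table>
          <table>
            <tr>
              <td class="darkerb">Constituency&nbsp;&nbsp;&nbsp;:</td>
              <td class="griditem2">Theni      (Tamil Nadu      )</td>
            </tr>
            <tr>
              <td class="darkerb">Party Name&nbsp;:</td>
              <td class="griditem2">Indian National Congress(INC)</td>
            </tr>
            <tr>
              <td class="darkerb">Email Address :</td>
              <td class="griditem2">jm.aaronrasheed@sansad.nic.in</td>
            </tr>
          </table>
        </td>
      </tr>
    </table></td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return the text of each leaf row of the biography table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="ctl00_ContPlaceHolderMain_Bioprofile1_Datagrid1")
    if table is None:
        # The id was not found, e.g. because the page layout changed.
        return []

    rows = []
    # find_all() searches recursively, so rows of nested tables are included.
    for tr in table.find_all("tr"):
        # recursive=False keeps only this row's own cells.
        cells = tr.find_all("td", recursive=False)
        # Skip wrapper rows whose cells merely contain nested tables.
        if any(td.find("table") for td in cells):
            continue
        # Collapse runs of whitespace, including non-breaking spaces.
        rows.append([" ".join(td.get_text().split()) for td in cells])
    return rows


for row in extract_rows(HTML):
    print(" | ".join(row))
```

To run this against the real page, the HTML string could be replaced with the response body fetched from the URL in the question. That part is left out here because the page may no longer be reachable.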