Difficulty retrieving information with BeautifulSoup

Time: 2018-04-28 15:02:17

Tags: python beautifulsoup

I am new to BeautifulSoup. I want to retrieve a specific element inside a tag, but the tag has nothing that identifies it uniquely.

Here is the HTML element:

<div class="tbl_racing_head" >
		<table class="tblgrey">
			<thead>
			<tr>
			  <th width="65%" class="aln_left"><a name="1"></a>Race 1. THE ZILLAH CUP.<span class="pull-right"><a href="/index.php/en/racing/results?view=full#top" id="back-top">Back to Top</a></span></th>
			  <th width="10%">1365 m</th>
			  <th width="15%">Rating 25-0</th>
			  <th width="10%">12h45</th>
			</tr>
			</thead>
		</table>
		</div>

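I want to retrieve the th whose value is 1365, but I can't find a way to get it. I guess I have to use next_sibling or some parent method, but I am stuck. Here is the code I tried: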

url = 'http://www.mauritiusturfclub.com/index.php/en/racing/results?meeting='+str(race_played)+'-'+str(2012)+'&view=full'
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text,'html.parser')
print('Track '+soup.findAll('th',{'width':'10%'})[3])

The error I get doesn't seem helpful. Can someone explain what is happening? Thanks.

		<table class="tblgrey">
			<thead>
			<tr>
				<th class="txt_center">Rank</th>
				<th class="txt_center">#</th>
				<th class="txt_center">Horse</th>
				<th class="txt_center">Stable</th>
				<th class="txt_center">Jockey</th>
				<th class="txt_center">Time</th>
								<th class="txt_center">Prize</th>
			</tr>
			</thead>
			<tbody>
							<tr>
					<td class="txt_center">1</td>
					<td class="txt_center">9</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >POLE OF COLD</a></td>
					<td class="txt_left">GUJADHUR</td>
					<td class="txt_left">V.Sola</td>
					<td class="txt_center">1m23.80</td>
										<td class="txt_center">115000</td>
				</tr>
							<tr>
					<td class="txt_center">2</td>
					<td class="txt_center">8</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >ROMAN SPLENDOUR</a></td>
					<td class="txt_left">R.GUJADHUR</td>
					<td class="txt_left">J.Bardottier</td>
					<td class="txt_center">1m23.94</td>
										<td class="txt_center">38000</td>
				</tr>
							<tr>
					<td class="txt_center">3</td>
					<td class="txt_center">6</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >ADDITION</a></td>
					<td class="txt_left">MAIGROT</td>
					<td class="txt_left">R.Hoolash</td>
					<td class="txt_center">1m24.18</td>
										<td class="txt_center">20000</td>
				</tr>
							<tr>
					<td class="txt_center">4</td>
					<td class="txt_center">5</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >TANGERINE</a></td>
					<td class="txt_left">S.RAMDIN</td>
					<td class="txt_left">N.Marday</td>
					<td class="txt_center">1m24.68</td>
										<td class="txt_center">14000</td>
				</tr>
							<tr>
					<td class="txt_center">5</td>
					<td class="txt_center">3</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >JUST OPPOSITE</a></td>
					<td class="txt_left">ALLET</td>
					<td class="txt_left">S.Bhundoo</td>
					<td class="txt_center">1m24.82</td>
										<td class="txt_center">8000</td>
				</tr>
							<tr>
					<td class="txt_center">6</td>
					<td class="txt_center">10</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >PORT ALBERT</a></td>
					<td class="txt_left">C.RAMDIN</td>
					<td class="txt_left">S.Bussunt</td>
					<td class="txt_center">1m24.87</td>
										<td class="txt_center">0</td>
				</tr>
							<tr>
					<td class="txt_center">7</td>
					<td class="txt_center">4</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >PACMAN</a></td>
					<td class="txt_left">S.HENRY</td>
					<td class="txt_left">B.Bhaugeerothee</td>
					<td class="txt_center">1m25.01</td>
										<td class="txt_center">0</td>
				</tr>
							<tr>
					<td class="txt_center">8</td>
					<td class="txt_center">2</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >JUST MODERN</a></td>
					<td class="txt_left">G.ROUSSET</td>
					<td class="txt_left">N.Teeha</td>
					<td class="txt_center">1m25.38</td>
										<td class="txt_center">0</td>
				</tr>
							<tr>
					<td class="txt_center">9</td>
					<td class="txt_center">1</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >DREAMS COME TRUE</a></td>
					<td class="txt_left">R.MAINGARD</td>
					<td class="txt_left">K.Ghunowa</td>
					<td class="txt_center">1m25.52</td>
										<td class="txt_center">0</td>
				</tr>
							<tr>
					<td class="txt_center">-</td>
					<td class="txt_center">7</td>
					<td class="txt_left"><a  href="/index.php/en/component/mtc_horse_rating_list/?view=horse" >CARAMEL KING</a></td>
					<td class="txt_left">P.MERVEN</td>
					<td class="txt_left">S.Rama</td>
					<td class="txt_center">-</td>
										<td class="txt_center">0</td>
				</tr>
						</tbody>
		</table>

1 Answer:

Answer 0: (score: 0)

Try:

s = """<div class="tbl_racing_head" >
        <table class="tblgrey">
            <thead>
            <tr>
              <th width="65%" class="aln_left"><a name="1"></a>Race 1. THE ZILLAH CUP.<span class="pull-right"><a href="/index.php/en/racing/results?view=full#top" id="back-top">Back to Top</a></span></th>
              <th width="10%">1365 m</th>
              <th width="15%">Rating 25-0</th>
              <th width="10%">12h45</th>
            </tr>
            </thead>
        </table>
        </div>"""
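from bs4 import BeautifulSoup
soup = BeautifulSoup(s, 'html.parser')
# find() returns the first matching th in each row, which here is the distance cell
for tr in soup.find_all("tr"):
    th = tr.find("th", {'width': '10%'})
    if th is not None:  # rows without such a cell (e.g. the results table) are skipped
        print(th.text)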
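A note on the error in the question: the exact message is not shown, but a likely cause is that `soup.findAll(...)[3]` is a `Tag` object, and `'Track ' + tag` raises a `TypeError` because a `Tag` is not a `str`. Using `.text` or `.get_text()` on the tag avoids that. The sketch below goes one step further and returns the distance as an integer. It is only an illustration under some assumptions: the function name `race_distances` is hypothetical, the sample markup is trimmed from the question, and it assumes every race title cell carries the class `aln_left` and that the distance text ends in "m".

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the race header from the question (sample data only).
html = """<div class="tbl_racing_head">
<table class="tblgrey"><thead><tr>
  <th width="65%" class="aln_left"><a name="1"></a>Race 1. THE ZILLAH CUP.</th>
  <th width="10%">1365 m</th>
  <th width="15%">Rating 25-0</th>
  <th width="10%">12h45</th>
</tr></thead></table>
</div>"""


def race_distances(markup):
    """Return the distance in metres for each race header found in markup."""
    soup = BeautifulSoup(markup, "html.parser")
    distances = []
    # Only look inside the race header blocks, not the results tables.
    for row in soup.select("div.tbl_racing_head tr"):
        title = row.find("th", class_="aln_left")
        if title is None:
            continue
        # The distance is the first th sibling after the race title cell.
        cell = title.find_next_sibling("th")
        match = re.search(r"(\d+)\s*m", cell.get_text()) if cell else None
        if match:
            distances.append(int(match.group(1)))
    return distances


print(race_distances(html))  # [1365]
```

Scoping the search to `div.tbl_racing_head` means the same function can be run on the full page text without the results tables getting in the way, and relying on the sibling relationship avoids depending on the position of the cell in a page-wide list.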