BeautifulSoup - extracting a number

Time: 2012-11-02 19:31:44

Tags: python beautifulsoup

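I'm trying to learn how to scrape the web, but I'm having some trouble getting my code to work. The number I want to extract from the HTML below is 77.80. My problem is finding something unique enough to locate that piece of information. Can you help me with the right code? Thanks in advance!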

    </td>

            <td class="small">&nbsp;&nbsp;</td>

                    <td align="center" nowrap  valign="center" class="small">
                    <a alt="Utvald" class="small" href="javascript:QT('/se/skandia/funds/chosen.aspx?tab=5&cid=0P0000T35O&lang=SV&curiso=SEK&country=SE&clientattributes=8&lastpage=Sök fond&LastPageURL=/se/skandia/quickrank/index.aspx?tab=RSLTS|lang=SV|univ=SE1|country=SE|curiso=SEK|mec=|cat=-1|search=|sortby=Custom_4|sortorder=ASC|PageNo=1|Firstletter=','0P0000T35O','600')"  onmouseout="status=''; return true"><img src="../read/im/sigillsvartsmall_FFFFFF.gif" border="0" alt="Utvald av Skandia" height="12" width="9"/></a>
                </td>

            <td class="small">&nbsp;&nbsp;</td>

                <td align="right" nowrap  valign="top" class="small">
                    77.80                           
                </td>

            <td class="small">&nbsp;&nbsp;</td>

                <td align="right" nowrap  valign="top" class="small">
                    <!--<img src="../read/im/valueSEK.gif" align="texttop" height="10" width="22">-->
                    SEK
                </td>

            <td class="small">&nbsp;&nbsp;</td>

                <td align="right" nowrap  valign="top" class="small">
                    1.4
                </td>

            <td class="small">&nbsp;&nbsp;</td>

                <td align="right" nowrap  valign="top" class="small">
                    0.5
                </td>

            <td class="small">&nbsp;&nbsp;</td>

                <td align="right" nowrap  valign="top" class="small">
                    2.7
                </td>

            <td class="small">&nbsp;&nbsp;</td>

                <td align="right" nowrap  valign="top" class="small">
                    6.6
                </td>

1 answer:

Answer 0 (score: 2)

Here is how to find the text you need. It simply looks for the first td that has class='small' and valign='top'.

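    from bs4 import BeautifulSoup

    # s holds the HTML shown in the question
    soup = BeautifulSoup(s, "html.parser")
    tds = soup.find_all('td', attrs={'class': 'small', 'valign': 'top'})
    the_td = tds[0].text.strip()  # first matching cell, '77.80' for the snippet above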
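
For anyone who wants to try this without the original page, here is a minimal self-contained sketch of the same lookup. `SAMPLE_HTML` is a trimmed-down stand-in for the row in the question, and the helper name `extract_values` is made up for illustration. It also converts the first match to a float, which assumes the cell always holds a plain decimal number.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the table row in the question (an assumption;
# the real page has more markup around it).
SAMPLE_HTML = """
<table><tr>
  <td class="small">&nbsp;&nbsp;</td>
  <td align="center" nowrap valign="center" class="small"><a class="small" href="#">x</a></td>
  <td class="small">&nbsp;&nbsp;</td>
  <td align="right" nowrap valign="top" class="small">
      77.80
  </td>
  <td class="small">&nbsp;&nbsp;</td>
  <td align="right" nowrap valign="top" class="small">
      SEK
  </td>
  <td class="small">&nbsp;&nbsp;</td>
  <td align="right" nowrap valign="top" class="small">
      1.4
  </td>
</tr></table>
"""


def extract_values(html):
    """Return the stripped text of every td with class 'small' and valign 'top'."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", attrs={"class": "small", "valign": "top"})
    return [cell.get_text(strip=True) for cell in cells]


values = extract_values(SAMPLE_HTML)
price = float(values[0])  # the first matching cell is the number we want

print(values)  # ['77.80', 'SEK', '1.4']
print(price)   # 77.8
```

Keep in mind that this relies on position: if the full page has other cells with class='small' and valign='top' before this one, `values[0]` will be something else, so you may need to narrow the search to the right table row first.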