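Python, Beautiful Soup: how to get the desired element

Date: 2015-04-22 15:33:22

Tags: python beautifulsoup

I'm trying to reach a certain element while parsing a website's source code. This is a snippet of the part I'm trying to parse (here only up to Friday), but it is the same for every day of the week: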

<div id="intForecast">
    <h2>Forecast for Rome</h2>
    <table cellspacing="0" cellpadding="0" id="nonCA">
        <tr>
            <td onclick="showDetails('1');return false" id="day1" class="on">
                <span>Thursday</span>
                <div class="intIcon"><img src="http://icons.wunderground.com/graphics/conds/2005/sunny.gif" alt="sunny" /></div>
                <div>Clear</div>
                <div><span class="hi">H <span>22</span>&deg;</span> / <span class="lo">L <span>11</span>&deg;</span></div>
            </td>
            <td onclick="showDetails('2');return false" id="day2" class="off">
                <span>Friday</span>
                <div class="intIcon"><img src="http://icons.wunderground.com/graphics/conds/2005/partlycloudy.gif" alt="partlycloudy" /></div>
                <div>Partly Cloudy</div>
                <div><span class="hi">H <span>21</span>&deg;</span> / <span class="lo">L <span>15</span>&deg;</span></div>
            </td>
        </tr>
    </table>
</div>

...and so on for all the days.

I do get my result, but in a way I think is ugly:

forecastFriday = soup.find('span', text='Friday').findNext('div').findNext('div').string

As you can see, I go deep into the elements by repeating .findNext('div') and finally reaching .string.

I want to get the "Partly Cloudy" info for Friday.

So, is there a more Pythonic way? Thanks!
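Rather than counting `findNext` hops from the day label, one option is to climb from the label to its enclosing `<td>` and index that cell's own `<div>` children. A minimal sketch, assuming bs4 4.4 or later (for the `string=` argument) and the markup shown above (day name in the first `<span>`, condition text in the second direct `<div>` of each `<td>`). The helper name `condition_for` is hypothetical, and the embedded HTML is a trimmed copy of the question's snippet.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's snippet (onclick attributes and icon URLs shortened).
HTML = """
<div id="intForecast">
  <table cellspacing="0" cellpadding="0" id="nonCA">
    <tr>
      <td id="day1" class="on">
        <span>Thursday</span>
        <div class="intIcon"><img src="sunny.gif" alt="sunny" /></div>
        <div>Clear</div>
        <div><span class="hi">H <span>22</span>&deg;</span> / <span class="lo">L <span>11</span>&deg;</span></div>
      </td>
      <td id="day2" class="off">
        <span>Friday</span>
        <div class="intIcon"><img src="partlycloudy.gif" alt="partlycloudy" /></div>
        <div>Partly Cloudy</div>
        <div><span class="hi">H <span>21</span>&deg;</span> / <span class="lo">L <span>15</span>&deg;</span></div>
      </td>
    </tr>
  </table>
</div>
"""


def condition_for(soup, day):
    """Return the condition text for `day` (hypothetical helper), or None if absent."""
    # Locate the <span> whose text is exactly the day name.
    label = soup.find('span', string=day)
    if label is None:
        return None
    # Climb to the <td> that holds everything for that day.
    cell = label.find_parent('td')
    # Direct <div> children: [icon, condition, hi/lo]; take the condition.
    divs = cell.find_all('div', recursive=False)
    return divs[1].get_text(strip=True)


soup = BeautifulSoup(HTML, 'html.parser')
print(condition_for(soup, 'Friday'))  # Partly Cloudy
```

Anchoring on the `<td>` keeps the lookup local to one day's cell, so an extra element elsewhere on the page cannot shift which `<div>` is reached.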

1 answer:

Answer 0 (score: 0)

Just find all the <td> elements and iterate over them:

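from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, 'html.parser')
# The forecast table lives inside <div id="intForecast">
div = soup.find('div', id='intForecast')
tds = div.find('table').find_all('td')

for td in tds:
    day = td('span')[0].text        # first <span>: the day name
    forecast = td('div')[1].text    # second <div>: the condition text
    print(day, forecast)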
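The same per-`<td>` loop can be extended to collect every field into a dict keyed by day name. A sketch using CSS selectors, assuming bs4 4.7 or later (where `select`/`select_one` are backed by soupsieve) and that the high/low values are always plain integers. `parse_forecast` is a hypothetical name, and the HTML is trimmed from the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's snippet (onclick attributes and icon URLs shortened).
HTML = """
<div id="intForecast">
  <table cellspacing="0" cellpadding="0" id="nonCA">
    <tr>
      <td id="day1" class="on">
        <span>Thursday</span>
        <div class="intIcon"><img src="sunny.gif" alt="sunny" /></div>
        <div>Clear</div>
        <div><span class="hi">H <span>22</span>&deg;</span> / <span class="lo">L <span>11</span>&deg;</span></div>
      </td>
      <td id="day2" class="off">
        <span>Friday</span>
        <div class="intIcon"><img src="partlycloudy.gif" alt="partlycloudy" /></div>
        <div>Partly Cloudy</div>
        <div><span class="hi">H <span>21</span>&deg;</span> / <span class="lo">L <span>15</span>&deg;</span></div>
      </td>
    </tr>
  </table>
</div>
"""


def parse_forecast(html):
    """Map each day name to (condition, high, low) (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    forecast = {}
    # Each <td> under #intForecast is one day.
    for td in soup.select('#intForecast td'):
        day = td.find('span').get_text(strip=True)
        # Direct <div> children: [icon, condition, hi/lo].
        condition = td.find_all('div', recursive=False)[1].get_text(strip=True)
        # The numbers sit in the inner <span> of span.hi / span.lo.
        high = int(td.select_one('span.hi > span').get_text())
        low = int(td.select_one('span.lo > span').get_text())
        forecast[day] = (condition, high, low)
    return forecast


forecast = parse_forecast(HTML)
print(forecast['Friday'])  # ('Partly Cloudy', 21, 15)
```

With the data keyed by day, Friday's condition is simply `forecast['Friday'][0]`, with no walking through sibling elements.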