Python BeautifulSoup: finding next_sibling

Date: 2018-02-14 05:55:21

Tags: python web-scraping beautifulsoup

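I'm having trouble using Beautiful Soup on some HTML. I can't figure out how to go through the whole HTML document to find the other items I'm looking for.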

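I have this code, which finds and prints the word "Totem" in the HTML below. I would like to be able to loop through the HTML and also find the remaining "One, Two, Three" and "Rent".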

Code that finds the first tag and its text:

print(html.find('td', {'class': 'play'}).next_sibling.next_sibling.text)

Here is the sample HTML to scrape:

<tr>
    <td class="play">

      <a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
        <audio class="as_audio_preview" src="https://shopify.audiosalad.com/"  >foo</audio>

    </td>
    <td>Totem</td>
    <!--<td>$0.99</td>-->
    <td class="buy">


  <tr>
    <td class="play">

      <a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
        <audio class="as_audio_preview" src="https://shopify.audiosalad.com/"  >foo</audio>

    </td>
    <td>One, Two, Three</td>
    <!--<td>$0.99</td>-->
    <td class="buy">


  <tr>
    <td class="play">

      <a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
        <audio class="as_audio_preview" src="https://shopify.audiosalad.com/"  >foo</audio>

    </td>
    <td>Rent</td>
    <!--<td>$0.99</td>-->
    <td class="buy">

2 Answers:

Answer 0 (score: 1)

Try this. It should fetch the content you're after:

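from bs4 import BeautifulSoup

# content is assumed to hold the sample HTML from the question as a string
soup = BeautifulSoup(content, "lxml")
# Each element with class "play" is followed by the <td> that holds the title
for items in soup.find_all(class_="play"):
    # find_next_sibling() returns the next sibling tag, skipping whitespace text
    data = items.find_next_sibling().text
    print(data)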

Alternatively, you can do it like this:

for items in soup.find_all(class_="play"):
    data = items.find_next("td").text
    print(data)

Output:

Totem
One, Two, Three
Rent
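
For a self-contained check of the two loops above, here is a minimal sketch. It assumes a trimmed-down copy of the sample HTML (the `sample_html` string is made up for illustration, with a wrapping `<table>` and closing tags added), and it uses the built-in `html.parser` instead of `lxml` so that nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down version of the sample HTML from the question.
sample_html = """
<table>
  <tr>
    <td class="play"><a href="#" class="audio-preview"></a></td>
    <td>Totem</td>
    <!--<td>$0.99</td>-->
    <td class="buy"></td>
  </tr>
  <tr>
    <td class="play"><a href="#" class="audio-preview"></a></td>
    <td>One, Two, Three</td>
    <!--<td>$0.99</td>-->
    <td class="buy"></td>
  </tr>
  <tr>
    <td class="play"><a href="#" class="audio-preview"></a></td>
    <td>Rent</td>
    <!--<td>$0.99</td>-->
    <td class="buy"></td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_next_sibling() skips text nodes and comments and returns the next tag
# at the same level, so a single call is enough.
via_sibling = [td.find_next_sibling("td").get_text(strip=True)
               for td in soup.find_all("td", class_="play")]

# find_next("td") returns the next <td> in document order, sibling or not.
via_next = [td.find_next("td").get_text(strip=True)
            for td in soup.find_all("td", class_="play")]

print(via_sibling)
print(via_next)
```

Both lists should hold the same three titles here, because the `play` cell itself contains no nested `<td>`. If it did, `find_next("td")` would return that nested cell first, while `find_next_sibling("td")` would still return the neighbouring cell.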

Answer 1 (score: 0)

You have to iterate over the elements, like this:

for td in html.find_all('td', {'class': 'play'}):
    print(td.next_sibling.next_sibling.text)
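
The doubled `.next_sibling` is needed because the first sibling of each `play` cell is a whitespace text node, not a tag. A small sketch of that behaviour, assuming a made-up one-row snippet (`row_html`) and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical one-row snippet; the newline between the two cells matters.
row_html = '<table><tr><td class="play"></td>\n<td>Totem</td></tr></table>'

soup = BeautifulSoup(row_html, "html.parser")
play_cell = soup.find("td", {"class": "play"})

first = play_cell.next_sibling   # the "\n" text node between the cells
second = first.next_sibling      # the <td>Totem</td> tag

print(type(first).__name__, repr(str(first)))
print(type(second).__name__, second.text)

# Without whitespace between the cells, the second hop would land on a
# different node (or None), so find_next_sibling("td") is the safer choice.
print(play_cell.find_next_sibling("td").text)
```

Because the number of hops depends on the whitespace in the markup, counting `.next_sibling` calls tends to break when the page is minified or reformatted.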