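I'm having trouble parsing some HTML with Beautiful Soup. I can't figure out how to walk through the whole HTML document to find the other items I'm looking for.
I have code that finds and prints the word "Totem" in the HTML below. I'd like to iterate over the HTML and also find the remaining "One, Two, Three" and "Rent".
Code that finds the first tag and its text: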
print(html.find('td', {'class': 'play'}).next_sibling.next_sibling.text)
Sample HTML to scrape:
<tr>
<td class="play">
<a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
<audio class="as_audio_preview" src="https://shopify.audiosalad.com/" >foo</audio>
</td>
**<td>Totem</td>**
<!--<td>$0.99</td>-->
<td class="buy">
<tr>
<td class="play">
<a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
<audio class="as_audio_preview" src="https://shopify.audiosalad.com/" >foo</audio>
</td>
**<td>One, Two, Three</td>**
<!--<td>$0.99</td>-->
<td class="buy">
<tr>
<td class="play">
<a href="#" class="audio-preview"><span class="play-button as_audio-button"></span></a>
<audio class="as_audio_preview" src="https://shopify.audiosalad.com/" >foo</audio>
</td>
**<td>Rent</td>**
<!--<td>$0.99</td>-->
<td class="buy">
Answer 0 (score: 1)
Try this. It should pull out the content you're after:
from bs4 import BeautifulSoup
soup = BeautifulSoup(content, "lxml")
for items in soup.find_all(class_="play"):
    data = items.find_next_sibling().text
    print(data)
Alternatively, you could also try it like this:
for items in soup.find_all(class_="play"):
    data = items.find_next("td").text
    print(data)
Output:
Totem
One, Two, Three
Rent
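For anyone who wants to run this approach end to end, here is a minimal self-contained sketch. The `SAMPLE_HTML` string and the `track_titles` helper are illustrative names that are not part of the original post. The markup is a trimmed copy of the question's sample with the rows closed properly, and the built-in `html.parser` is used in place of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumption: rows are closed here).
SAMPLE_HTML = """
<table>
<tr>
<td class="play"><a href="#" class="audio-preview"></a></td>
<td>Totem</td>
<!--<td>$0.99</td>-->
<td class="buy"></td>
</tr>
<tr>
<td class="play"><a href="#" class="audio-preview"></a></td>
<td>One, Two, Three</td>
<!--<td>$0.99</td>-->
<td class="buy"></td>
</tr>
<tr>
<td class="play"><a href="#" class="audio-preview"></a></td>
<td>Rent</td>
<!--<td>$0.99</td>-->
<td class="buy"></td>
</tr>
</table>
"""


def track_titles(markup):
    """Return the text of the <td> that follows each <td class="play">."""
    soup = BeautifulSoup(markup, "html.parser")
    titles = []
    for cell in soup.find_all("td", class_="play"):
        # find_next_sibling("td") only returns tags, so whitespace text
        # nodes and comments between the cells are skipped.
        title_cell = cell.find_next_sibling("td")
        if title_cell is not None:
            titles.append(title_cell.get_text(strip=True))
    return titles


print(track_titles(SAMPLE_HTML))
```

Collecting the titles into a list instead of printing inside the loop is a design choice made here so the result can be reused; the lookup logic is the same as in the answer.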
Answer 1 (score: 0)
You have to iterate over the elements, like this:
for td in html.find_all('td', {'class': 'play'}):
    print(td.next_sibling.next_sibling.text)
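One caveat with chaining `.next_sibling` twice: the first sibling of the `play` cell in the question's markup is the whitespace between the tags, not the next `<td>`, so the chain depends on how the HTML is formatted. The sketch below illustrates this with two hypothetical snippets (`spaced` and `compact`, neither taken from the original post) and shows that `find_next_sibling("td")` gives the same result for both. The helper names are likewise illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical snippets: same cells, with and without whitespace between tags.
spaced = '<tr>\n<td class="play"></td>\n<td>Totem</td>\n</tr>'
compact = '<tr><td class="play"></td><td>Totem</td></tr>'


def first_sibling_is_text(markup):
    """True if the node right after the play cell is a text node."""
    soup = BeautifulSoup(markup, "html.parser")
    play = soup.find("td", {"class": "play"})
    return isinstance(play.next_sibling, NavigableString)


def title_after_play(markup):
    """Get the next <td> sibling regardless of whitespace between tags."""
    soup = BeautifulSoup(markup, "html.parser")
    play = soup.find("td", {"class": "play"})
    return play.find_next_sibling("td").text


for markup in (spaced, compact):
    print(first_sibling_is_text(markup), title_after_play(markup))
```

With the `compact` markup, `.next_sibling` is already the title cell, so `.next_sibling.next_sibling` would be `None` and accessing `.text` on it would raise an `AttributeError`.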