Question

I have an HTML snippet like the following:

<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>

I tried to extract the text from all the spans in the div using:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
spans = soup.select('div.single_baby_name_description>span')

But spans[0].text only gets the text of the first span, and spans[1].text raises IndexError: list index out of range.

Any help would be appreciated.

Answer 1

I found that only 'lxml' does the job. For some reason, 'html.parser' doesn't.

This works:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
spans = soup.select('div.single_baby_name_description span')
spans = [span.text for span in spans]
print(spans)

Output:

['the meaning of this name is universal whole.', 'Girl', 'Christianity', 'German,French,Swedish']
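If installing lxml isn't an option, here is a minimal sketch that stays with the built-in 'html.parser'. It rests on an assumption not confirmed in this thread: that the invalid `</br>` end tags in the snippet are what confuse html.parser in some Beautiful Soup versions. So it rewrites them to `<br/>` before parsing. The names `html_snippet` and `extract_span_texts` are hypothetical.

```python
from bs4 import BeautifulSoup

# The snippet from the question, including its invalid </br> end tags.
html_snippet = """
<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>
"""


def extract_span_texts(html):
    """Return the text of every <span> that is a direct child of div.single_baby_name_description.

    Assumption: replacing the stray </br> end tags with well-formed <br/>
    keeps html.parser from closing the <div> early.
    """
    cleaned = html.replace("</br>", "<br/>")
    soup = BeautifulSoup(cleaned, "html.parser")
    spans = soup.select("div.single_baby_name_description > span")
    return [span.get_text(strip=True) for span in spans]


print(extract_span_texts(html_snippet))
# ['the meaning of this name is universal whole.', 'Girl', 'Christianity', 'German,French,Swedish']
```

Cleaning the markup keeps the original child selector usable, so the query itself does not have to change.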

Answer 2

Take a look at the Beautiful Soup documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#beautifulsoup

Using a tag name as an attribute only returns the first matching tag, which is what you describe. Have you tried:

soup.find_all('span')
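To illustrate the difference this answer refers to, the sketch below contrasts attribute-style access (`soup.span`, which returns only the first match) with `find_all`, called on the div so the search stays inside it. The shortened two-span snippet is a hypothetical, well-formed stand-in for the question's HTML.

```python
from bs4 import BeautifulSoup

# A shortened, well-formed stand-in for the question's HTML (hypothetical).
html = """
<div class="single_baby_name_description">
    <label>Gender :</label> <span class="28816-gender">Girl</span> <br/>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> <br/>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns only the first matching tag.
first_span = soup.span
print(first_span.text)  # Girl

# find_all returns every match; calling it on the div limits the search to that div.
div = soup.find("div", class_="single_baby_name_description")
all_spans = [span.text for span in div.find_all("span")]
print(all_spans)  # ['Girl', 'Christianity']
```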
