<a id="filepos10190"></a>
<a id="filepos10190">
<font size="6" color="#002984"><b>abashed </b></font> <div width="9"><i>
<font color="green"> adj.</font></i></div> <div width="18"><font
color="chocolate"><b>VERBS </b></font></div> <div width="27"><font
color="gray">▪</font> <font color="darkslateblue"><b>be</b></font>, <font
color="darkslateblue"><b>look</b></font></div> <div width="18"><font
color="chocolate"><b>ADVERB </b></font></div> <div width="27"><font
color="gray">▪</font> <font color="darkslateblue"><b>a little</b></font>,
<font color="darkslateblue"><b>slightly</b></font>, <font
color="darkslateblue"><b>etc.</b></font></div> <div width="27"><font
color="gray">▪</font> <font color="darkslateblue"><b>suitably</b></font>
</div> <div width="36"><font color="lightgray">▪</font> <span><font
color="#595959">He glanced at Juliet accusingly and she looked suitably
<u>~</u>.</font></span></div>
</a>
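There are two anchor tags here: one has no inner markup at all, while the other contains many child tags. If I only want the anchor that has tags inside it, how can I tell the two apart when scraping?
Answer 0 (score: 1)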
from bs4 import BeautifulSoup
content="""
<a id="filepos10190"></a>
<a id="filepos10190">
<font size="6" color="#002984"><b>abashed </b></font> <div width="9"><i>
<font color="green"> adj.</font></i></div> <div width="18"><font
color="chocolate"><b>VERBS </b></font></div> <div width="27"><font
color="gray">▪</font> <font color="darkslateblue"><b>be</b></font>, <font
color="darkslateblue"><b>look</b></font></div> <div width="18"><font
color="chocolate"><b>ADVERB </b></font></div> <div width="27"><font
color="gray">▪</font> <font color="darkslateblue"><b>a little</b></font>,
<font color="darkslateblue"><b>slightly</b></font>, <font
color="darkslateblue"><b>etc.</b></font></div> <div width="27"><font
color="gray">▪</font> <font color="darkslateblue"><b>suitably</b></font>
</div> <div width="36"><font color="lightgray">▪</font> <span><font
color="#595959">He glanced at Juliet accusingly and she looked suitably
<u>~</u>.</font></span></div>
</a>"""
soup = BeautifulSoup(content, 'html.parser')
tags = soup.find_all('a')  # collect every anchor tag in the document
filtered_tag = [i for i in tags if list(i.children)]  # keep only the anchors that have at least one child node; the empty anchor is dropped
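One caveat with the children check: .children yields text nodes as well as tags, so an anchor that holds only text would also be kept. The sketch below illustrates the difference on a made-up three-anchor snippet (the ids and contents are hypothetical, not from the question). It uses find(True) for the stricter check, since the Beautiful Soup documentation describes True as matching every tag but no text strings.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one empty anchor, one with only text, one with a child tag.
sample = (
    '<a id="empty"></a>'
    '<a id="text">plain text</a>'
    '<a id="nested"><b>abashed</b></a>'
)
soup = BeautifulSoup(sample, 'html.parser')
anchors = soup.find_all('a')

# list(a.children) is non-empty for text nodes too, so "text" is kept here.
with_any_child = [a['id'] for a in anchors if list(a.children)]

# a.find(True) returns the first descendant tag, or None if there is none,
# so only the anchor that really contains a tag is kept.
with_child_tag = [a['id'] for a in anchors if a.find(True) is not None]

print(with_any_child)
print(with_child_tag)
```

For the markup in the question both checks give the same result, because the first anchor is completely empty.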
Answer 1 (score: 1)
You can actually do this in a single pass:
soup.find_all(lambda tag: tag.name == 'a' and tag.find())
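tag.find() tries to find any element inside tag, and only one of the two anchors contains one. For the empty anchor it returns None, so the lambda filters that anchor out.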
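As a runnable sketch of the one-pass approach, the snippet below applies the same lambda to a shortened stand-in for the markup in the question (the trimmed content string is an assumption made for brevity; the full markup should behave the same way).

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (assumed for brevity).
content = """
<a id="filepos10190"></a>
<a id="filepos10190">
<font size="6" color="#002984"><b>abashed </b></font>
<div width="9"><i><font color="green"> adj.</font></i></div>
</a>
"""

soup = BeautifulSoup(content, 'html.parser')

# find_all calls the function once per tag and keeps the tag when the
# result is truthy: the tag must be an anchor and must contain something
# that tag.find() can locate.
anchors_with_tags = soup.find_all(lambda tag: tag.name == 'a' and tag.find())

print(len(anchors_with_tags))
print(anchors_with_tags[0].b.get_text(strip=True))
```

Compared with the list comprehension in the first answer, this folds the tag-name test and the has-children test into one filter, so no intermediate list of all anchors is built.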