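I want to use Python's Beautiful Soup to scrape information from an HTML page, but all the information I need sits inside tags with the same name. How can I tell the individual pieces of information apart?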
Answer 0 (score: 1)
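The results are ordered. `find_all` returns matches in the same order they appear in the HTML, so you can tell the pieces apart by position: the first item in the result list is the first block on the page, the second item is the second block, and so on.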
from bs4 import BeautifulSoup
html = """
<div class = "hAyfc">
<div class = "BgcNfc">pro </div>
<span class = "htlgb">
<div>
<span class = "htlgb">
codeA
</span>
</div>
</span>
</div>
<div class = "hAyfc">
<div class = "BgcNfc">pro </div>
<span class = "htlgb">
<div>
<span class = "htlgb">
codeB
</span>
</div>
</span>
</div>
"""
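# html.parser ships with Python; "lxml" also works if it is installed
bs = BeautifulSoup(html, "html.parser")
# find_all returns the blocks in document order; from each block take the
# label (div.BgcNfc) and the value (the first span.htlgb), whitespace stripped
result = [
    (e.find("div", class_="BgcNfc").get_text(strip=True),
     e.find("span", class_="htlgb").get_text(strip=True))
    for e in bs.find_all("div", class_="hAyfc")
]
print(result)  # [('pro', 'codeA'), ('pro', 'codeB')]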
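If you only need the innermost values, a CSS selector can skip the outer `span.htlgb` that shares the same class. The sketch below is a minimal alternative under a few assumptions: it reuses the answer's sample HTML in condensed form, and it relies on `soup.select()`, which needs the `soupsieve` package (installed automatically with Beautiful Soup 4.7 and later). Like `find_all`, `select()` returns matches in document order, so each value can still be identified by its position.

```python
from bs4 import BeautifulSoup

# Condensed copy of the sample HTML from the answer above (assumed structure).
html = """
<div class="hAyfc"><div class="BgcNfc">pro</div>
  <span class="htlgb"><div><span class="htlgb">codeA</span></div></span></div>
<div class="hAyfc"><div class="BgcNfc">pro</div>
  <span class="htlgb"><div><span class="htlgb">codeB</span></div></span></div>
"""

# html.parser keeps the <div> nested inside the outer <span> exactly as written.
soup = BeautifulSoup(html, "html.parser")

# "div > span.htlgb" only matches a span.htlgb whose parent is a plain <div>
# inside div.hAyfc, i.e. the innermost span; the outer span (whose parent is
# div.hAyfc itself) is not matched.
values = [s.get_text(strip=True) for s in soup.select("div.hAyfc div > span.htlgb")]

# Matches come back in document order, so unpacking by position is safe
# as long as the page always has exactly two blocks.
first, second = values
print(first, second)
```

If the number of blocks varies from page to page, iterate over `values` (or use `enumerate`) instead of unpacking into a fixed number of names.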