Parsing an HTML fragment in Python with BeautifulSoup

Date: 2018-11-28 12:47:36

Tags: beautifulsoup html-parsing

I need to parse this HTML string with BeautifulSoup. The string is

<address><span rel="v:address"><span dir="ltr"><span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, <span class="locality"><span property="v:locality">Baltimore</span>, <span property="v:region">MD</span> <span property="v:postal-code">21236</span></span> </span></span></address>

What I actually want is the value Baltimore inside the tag <span property="v:locality">.

But somehow, when I run the code below, I can only get as far as <span class="street-address" property="v:street-address">. How do I get the value inside the tag <span property="v:locality">?

Here is my code.

from bs4 import BeautifulSoup
str = '<address><span rel="v:address"><span dir="ltr"><span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, <span class="locality"><span property="v:locality">Baltimore</span>, <span property="v:region">MD</span> <span property="v:postal-code">21236</span></span> </span></span></address>'
soup = BeautifulSoup(str)
print(soup.address.span.span.find_all('property'))

The output is

[]

2 answers:

Answer 0 (score: 1)

https://codepen.io/zoom/pen/NEObQB

Answer 1 (score: 0)

>>> from bs4 import BeautifulSoup
>>> html = '''<address><span rel="v:address"><span dir="ltr"><span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, <span class="locality"><span property="v:locality">Baltimore</span>, <span property="v:region">MD</span> <span property="v:postal-code">21236</span></span> </span></span></address>'''
>>> soup = BeautifulSoup(html, "lxml")
>>> target = soup.find_all('span', attrs={'property': 'v:locality'})
>>> for value in target:
...     print(value.text)
...
Baltimore
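
For context on why the code in the question printed []: the first positional argument of find_all is a tag name, so find_all('property') searches for <property> elements rather than for tags carrying a property attribute. The sketch below contrasts that call with two attribute-based lookups. It is only an illustration under a few assumptions: it uses the standard-library "html.parser" backend instead of "lxml" to avoid an extra install, and the variable names (no_match, by_attrs, by_css, locality) are made up for this example.

```python
from bs4 import BeautifulSoup

html = (
    '<address><span rel="v:address"><span dir="ltr">'
    '<span class="street-address" property="v:street-address">5015 Campbell Blvd</span>, '
    '<span class="locality"><span property="v:locality">Baltimore</span>, '
    '<span property="v:region">MD</span> '
    '<span property="v:postal-code">21236</span></span> </span></span></address>'
)

# Assumption: "html.parser" (standard library) stands in for "lxml" here.
soup = BeautifulSoup(html, "html.parser")

# A bare string is treated as a tag name, so this looks for <property> tags
# and finds none.
no_match = soup.find_all("property")

# Filter on the attribute value instead; find() returns the first match.
by_attrs = soup.find("span", attrs={"property": "v:locality"})

# The same lookup written as a CSS attribute selector.
by_css = soup.select_one('span[property="v:locality"]')

locality = by_attrs.get_text()
print(no_match)           # []
print(locality)           # Baltimore
print(by_css.get_text())  # Baltimore
```

Both lookups search every descendant of soup, so there is no need to walk down through soup.address.span.span first. If the attribute might be absent, find and select_one return None, so check for that before calling get_text.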