Given the following HTML snippet:
<div class="mapCopy">
<b>
<a href="someurl.com">
URL Text
</a>
</b>
<br/>
Address Line 1
<br/>
Address Line 2
<br/>
City, State, Zip
<p>
Phone: (123) 456-7890
<br/>
Fax: (123) 456-7890
</p>
</div>
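How can I extract only Address Line 1, Address Line 2, and City, State, Zip? I believe I should be able to iterate over the div and exclude any element wrapped in a tag such as <b>, but I'm not sure of the necessary syntax.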
Answer 0 (score: 0)
You can extract all direct children of the <div> that are plain text rather than tags:
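>>> from bs4 import BeautifulSoup
>>> S = BeautifulSoup("<div...", "html.parser")
>>> # Tags render with "<" in str(child); the second test drops whitespace-only strings
>>> [child.strip() for child in S.find('div').children
... if "<" not in str(child)
... and child.strip()
... ]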
['Address Line 1', 'Address Line 2', 'City, State, Zip']
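As a possibly more robust variant of the same idea, the sketch below checks each child's type instead of searching its string form for "<". It assumes the bs4 package is installed and uses the standard-library "html.parser" backend; the helper name extract_address_lines is hypothetical and chosen only for illustration. Note that HTML comments are also NavigableString instances in bs4, so a page containing comments would need an extra check.

```python
from bs4 import BeautifulSoup, NavigableString

# The snippet from the question, reproduced so the example is self-contained.
HTML = """<div class="mapCopy">
<b>
<a href="someurl.com">
URL Text
</a>
</b>
<br/>
Address Line 1
<br/>
Address Line 2
<br/>
City, State, Zip
<p>
Phone: (123) 456-7890
<br/>
Fax: (123) 456-7890
</p>
</div>"""


def extract_address_lines(html):
    """Return the non-empty text nodes that are direct children of div.mapCopy."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="mapCopy")
    lines = []
    for child in div.children:
        # Direct text nodes are NavigableString; <b>, <br/> and <p> are Tag objects.
        if isinstance(child, NavigableString):
            text = child.strip()
            if text:  # skip the whitespace-only strings between tags
                lines.append(text)
    return lines


address_lines = extract_address_lines(HTML)
print(address_lines)
```

Because only direct children of the div are inspected, the link text inside <b> and the phone and fax lines inside <p> are skipped without any explicit exclusion by tag name.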