I am parsing an HTML page with BeautifulSoup, and the part I'm interested in looks like this:
<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>
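I want to extract the port name, TEKIRDAG, but the page has many port-name tags with the same markup. My question is: can I extract the port name only when it occurs after the string 'Departure for'?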
Answer (score: 1)
You can find the text node and get its next sibling:
In [1]: from bs4 import BeautifulSoup
In [2]: data = """<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>"""
In [3]: soup = BeautifulSoup(data, "html.parser")
In [4]: soup.find(string="Departure for ").next_sibling.get_text()
Out[4]: 'TEKIRDAG'
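The lookup above only matches a text node that is exactly "Departure for " (one trailing space), and .next_sibling only works when the <a> comes immediately after it. For a page with many such rows, a minimal sketch could match the label with a regular expression and use find_next_sibling("a") instead. It assumes every row has the same layout as the snippet in the question. The sample markup and the departure_ports helper are hypothetical, for illustration only.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup: rows laid out like the question's snippet,
# plus an "Arrival at" row that should be ignored.
html = """
<table>
<tr><td><i class="fa fa-circle"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG" title="View details for: TEKIRDAG">TEKIRDAG</a> </td></tr>
<tr><td><i class="fa fa-circle"></i>Arrival at <a href="/en/ais/details/ports/1/port_name:PIRAEUS" title="View details for: PIRAEUS">PIRAEUS</a> </td></tr>
<tr><td><i class="fa fa-circle"></i>Departure   for
<a href="/en/ais/details/ports/2/port_name:ISTANBUL" title="View details for: ISTANBUL">ISTANBUL</a> </td></tr>
</table>
"""

# A text node consisting only of "Departure for", tolerating extra whitespace.
DEPARTURE_RE = re.compile(r"^\s*Departure\s+for\s*$")


def departure_ports(markup):
    """Hypothetical helper: link text of each <a> that follows a 'Departure for' text node."""
    soup = BeautifulSoup(markup, "html.parser")
    ports = []
    for label in soup.find_all(string=DEPARTURE_RE):
        # find_next_sibling("a") returns the first later sibling that is an <a> tag,
        # skipping text nodes in between; .next_sibling returns whatever node comes next.
        link = label.find_next_sibling("a")
        if link is not None:
            ports.append(link.get_text(strip=True))
    return ports


print(departure_ports(html))  # ['TEKIRDAG', 'ISTANBUL']
```

Because the text node is the anchor, the "Arrival at PIRAEUS" row is skipped even though its <a> tag looks identical to the others.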