How to extract a string with BeautifulSoup only when it is adjacent to certain text

Date: 2016-09-28 12:50:38

Tags: python html beautifulsoup

I am parsing an HTML page with BeautifulSoup, and the part I am interested in looks like this:

<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>

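I want to extract the port name, TEKIRDAG, but many port names on the page are marked up the same way. Is it possible to extract the port name only when it comes right after the string 'Departure for'?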

1 Answer:

Answer 0: (score: 1)

You can find the "Departure for " text node and take its next sibling, which is the <a> element holding the port name:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: soup.find(text="Departure for ").next_sibling.get_text()
Out[4]: u'TEKIRDAG'
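
If the page has many such cells, the same idea can be applied to every match rather than only the first. The following is a minimal sketch, not a tested solution for the real page: the helper name `departure_ports`, the surrounding table, and the second "Arrival at" row with the port EXAMPLE are hypothetical and exist only to show that non-departure rows are skipped. It uses a regular expression so that small differences in whitespace around "Departure for" do not break the match.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup: the first row is taken from the question,
# the second row is a made-up "Arrival at" entry for contrast.
html = """
<table>
<tr><td><i class="fa fa-circle align-middle font-80"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td></tr>
<tr><td><i class="fa fa-circle align-middle font-80"></i>Arrival at <a href="#" title="View details for: EXAMPLE">EXAMPLE</a> </td></tr>
</table>
"""


def departure_ports(markup):
    """Return the link text that directly follows each 'Departure for' text node."""
    soup = BeautifulSoup(markup, "html.parser")
    ports = []
    # Match text nodes consisting only of "Departure for", ignoring surrounding whitespace.
    for node in soup.find_all(string=re.compile(r"^\s*Departure for\s*$")):
        # Look for the next <a> sibling of the text node; skip the node if there is none.
        link = node.find_next_sibling("a")
        if link is not None:
            ports.append(link.get_text(strip=True))
    return ports


print(departure_ports(html))
```

Here `find_next_sibling("a")` is used instead of `next_sibling` so that a stray whitespace node between the text and the link would not cause an error.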